Aniforka opened this issue 6 months ago
I got the same error with transformers 4.40.1
My notebook: Windows 11 Pro 23H2, Intel i7-8750H, GeForce GTX 1050 Ti (Mobile), 32 GB RAM (2666 MHz)
After I removed the references to flash_attn in gemma.py, I got the following errors:
TypeError: GemmaModel.forward() got an unexpected keyword argument 'cache_position'
(This happens with other models as well.) After adding *args and **kwargs to all the forward methods, another error appeared:
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3
Traceback (most recent call last):
  File "d:\Programming\Python\MyGemma2B\1.py", line 42, in <module>
    generated_text = generate(
  File "d:\Programming\Python\MyGemma2B\1.py", line 17, in generate
    outputs = model(input_ids=input_segment.to(model.device), memory=memory, norm_term=norm_term)
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 960, in forward
    outputs = self.model(
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 783, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 617, in forward
    _attended = self.self_attn(
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 532, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3
All errors occurred after Loading checkpoint shards
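For anyone else hitting the TypeError: the snippet below is a minimal, self-contained illustration of why it happens and why adding **kwargs to the forward signatures makes it go away. It uses a toy module, not the repo's code.

import torch
import torch.nn as nn

class OldStyleForward(nn.Module):
    # Mirrors the repo's older signature: no cache_position parameter.
    def forward(self, input_ids: torch.Tensor, attention_mask=None):
        return input_ids

class PatchedForward(nn.Module):
    # Accepting **kwargs absorbs extra arguments that newer transformers versions pass.
    def forward(self, input_ids: torch.Tensor, attention_mask=None, **kwargs):
        return input_ids

x = torch.ones(1, 4, dtype=torch.long)
try:
    OldStyleForward()(x, cache_position=torch.arange(4))
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'cache_position'
print(PatchedForward()(x, cache_position=torch.arange(4)).shape)  # torch.Size([1, 4])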
Using a 3090, I left flash attention enabled and was getting the same 'cache_position' error. Like you, adding *args and **kwargs results in the same tensor size mismatch error.
Kinda hard to believe that the solution under ./src was tested before release. Even the import in main.py has a typo in it: from .gemma import GemmaForCausalLM
I can confirm all of the above: after fixing the parameter issues, the tensor size mismatch error appeared. The parameter issues seem to be explained by a change in the transformers API.
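A quick way to see the API drift (this is not from the repo; it just inspects the upstream GemmaModel that ships with the installed transformers):

import inspect
from transformers.models.gemma import modeling_gemma

sig = inspect.signature(modeling_gemma.GemmaModel.forward)
print("cache_position" in sig.parameters)  # True on 4.40.1, the version mentioned above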
It feels like the code was either generated by a neural network or wasn't tested at all before being uploaded to GitHub.
In fact, it can't run. A lot of errors happen when running the code; the parameter and data dimensions don't match.
Might be a scam project to get some attention, either for a grant or investors' money ... Have a look at another project where this guy is being called out for using research and work from others: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/23 -- the llama3-V project is stealing a lot of academic work from MiniCPM-Llama3-V 2.5!
I think this model was originally trained and built with MLX (an ML framework for Apple silicon), and the PyTorch code was generated by an LLM and never tested at all. I suppose it would be better if someone wrote a proper PyTorch port instead.
I added some code with the help of Cursor, and it runs now, but with bad results: it only generates meaningless text. I don't know whether that is caused by faulty AI-generated code or by the Infini-Transformer code itself. I left # comments where I made changes.
class GemmaModel(GemmaPreTrainedModel):
def __init__(self, config: GemmaConfig):
super().__init__(config)
self.padding_idx = config.pad_token_id
self.vocab_size = config.vocab_size
self.embed_tokens = nn.Embedding(
config.vocab_size, config.hidden_size, self.padding_idx
)
self.layers = nn.ModuleList(
[
GemmaDecoderLayer(config, layer_idx)
for layer_idx in range(config.num_hidden_layers)
]
)
self.norm = GemmaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.gradient_checkpointing = False
self.post_init()
def get_input_embeddings(self):
return self.embed_tokens
def set_input_embeddings(self, value):
self.embed_tokens = value
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
memory: Optional[torch.Tensor] = None,
norm_term: Optional[torch.Tensor] = None,
no_memory_update: Optional[bool] = False,
cache_position: Optional[torch.LongTensor] = None, # Add this line
) -> Union[Tuple, InfiniBaseModelOutputWithPast]:
if (input_ids is None) ^ (inputs_embeds is not None):
raise ValueError("You must specify either input_ids or inputs_embeds")
if inputs_embeds is None:
inputs_embeds = self.embed_tokens(input_ids)
past_seen_tokens = 0
if use_cache and isinstance(past_key_values, StaticCache):
past_seen_tokens = past_key_values.get_seq_length()
cache_position = torch.arange(past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device)
position_ids = cache_position.unsqueeze(0) if position_ids is None else position_ids
causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position, past_seen_tokens + inputs_embeds.shape[1])
hidden_states = inputs_embeds * torch.tensor(self.config.hidden_size**0.5, dtype=inputs_embeds.dtype)
all_hidden_states = () if output_hidden_states else None
all_self_attns = () if output_attentions else None
next_decoder_cache = None # Initialize next_decoder_cache
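        # ... (the rest of GemmaModel.forward is not shown in this snippet)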
class GemmaInfiniAttention(GemmaAttention):
def __init__(
self,
config: GemmaConfig,
layer_idx: Optional[int] = None,
):
super().__init__(config, layer_idx)
self.gate = nn.Parameter(torch.full((1, self.num_heads, 1, 1), -100.0))
def forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
memory: Optional[torch.Tensor] = None,
norm_term: Optional[torch.Tensor] = None,
no_memory_update: bool = False,
past_key_value: Optional[Cache] = None, # Add this line
output_attentions: Optional[bool] = False, # Add this line
use_cache: Optional[bool] = False, # Add this line
cache_position: Optional[torch.LongTensor] = None, # Add this line
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]:  # five return values, matching the return below
bsz, seq_len, _ = hidden_states.size()
query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)
query_states = query_states.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
key_states = key_states.view(bsz, seq_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
value_states = value_states.view(bsz, seq_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
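        # NOTE: unlike the stock GemmaAttention.forward, no rotary position embeddings
        # are applied to the query/key states here; that alone could explain degenerate
        # output, but I have not verified it.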
        # Repeat the key/value states across the KV groups for SDPA.
        # (These two lines can also be removed and it still works.)
key_states = repeat_kv(key_states, self.num_key_value_groups)
value_states = repeat_kv(value_states, self.num_key_value_groups)
        # Adjust the attention_mask shape if necessary (works around the size mismatch at dim 3)
if attention_mask is not None and attention_mask.shape[-1] != key_states.shape[-2]:
attention_mask = attention_mask[:, :, :, :key_states.shape[-2]]
# Debugging: Print shapes
print(f"query_states shape: {query_states.shape}")
print(f"key_states shape: {key_states.shape}")
print(f"value_states shape: {value_states.shape}")
if attention_mask is not None:
print(f"attention_mask shape: {attention_mask.shape}")
if no_memory_update:
memory_output = None
else:
memory_output = self._retrieve_from_memory(query_states, memory, norm_term)
if not no_memory_update:
updated_memory, updated_norm_term = self._update_memory(key_states, value_states, memory, norm_term)
memory = updated_memory.detach()
norm_term = updated_norm_term.detach()
attn_output = torch.nn.functional.scaled_dot_product_attention(
query_states,
key_states,
value_states,
attn_mask=attention_mask,
dropout_p=self.attention_dropout if self.training else 0.0,
)
if memory_output is None:
combined_output = attn_output
else:
combined_output = F.sigmoid(self.gate) * memory_output + (1 - F.sigmoid(self.gate)) * attn_output
combined_output = combined_output.transpose(1, 2).contiguous()
combined_output = combined_output.view(bsz, seq_len, self.hidden_size)
final_output = self.o_proj(combined_output)
if no_memory_update:
memory = None
norm_term = None
# Ensure the return statement provides five values
return final_output, None, None, memory, norm_term
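For readers following along: _retrieve_from_memory and _update_memory are not shown above. Here is a rough, unverified sketch of what they look like under the Infini-attention formulation (linear compressive memory with an ELU+1 feature map); the function names, argument order, and shapes are my assumptions, not the repo's actual code.

import torch
import torch.nn.functional as F

def elu_feature_map(x: torch.Tensor) -> torch.Tensor:
    # sigma(x) = ELU(x) + 1, the non-negative feature map from the Infini-attention paper
    return F.elu(x) + 1.0

def retrieve_from_memory(query_states, memory, norm_term):
    # A_mem = (sigma(Q) @ M) / (sigma(Q) @ z)
    # assumed shapes: query (b, h, s, d), memory (b, h, d, d), norm_term (b, h, d)
    if memory is None or norm_term is None:
        return None
    sigma_q = elu_feature_map(query_states)
    retrieved = torch.matmul(sigma_q, memory)                               # (b, h, s, d)
    denom = torch.matmul(sigma_q, norm_term.unsqueeze(-1)).clamp(min=1e-6)  # (b, h, s, 1)
    return retrieved / denom

def update_memory(key_states, value_states, memory, norm_term):
    # M <- M + sigma(K)^T V ; z <- z + sum over the segment of sigma(K)
    sigma_k = elu_feature_map(key_states)
    delta_memory = torch.matmul(sigma_k.transpose(-2, -1), value_states)  # (b, h, d, d)
    delta_norm = sigma_k.sum(dim=-2)                                      # (b, h, d)
    if memory is None or norm_term is None:
        return delta_memory, delta_norm
    return memory + delta_memory, norm_term + delta_norm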
My outcomes were the same errors already reported at the top of this thread (the 'cache_position' TypeError and the tensor size mismatch), all appearing after loading the checkpoint shards.
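For reference, the traceback at the top of the thread comes from a script (1.py) that calls the model once per input segment and threads memory and norm_term between calls. A hypothetical sketch of that pattern follows; the output field names are guesses based on InfiniBaseModelOutputWithPast, not taken from the repo.

import torch

def run_segments(model, input_ids: torch.Tensor, segment_len: int = 512):
    memory, norm_term = None, None
    outputs = None
    for start in range(0, input_ids.shape[1], segment_len):
        segment = input_ids[:, start:start + segment_len].to(model.device)
        outputs = model(input_ids=segment, memory=memory, norm_term=norm_term)
        # Carry the compressive memory state forward into the next segment.
        memory, norm_term = outputs.memory, outputs.norm_term
    return outputs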