mustafaaljadery / gemma-2B-10M

Gemma 2B with 10M context length using Infini-attention.

Some Errors... #4

Open · Aniforka opened this issue 4 months ago

Aniforka commented 4 months ago

My notebook: Windows 11 Pro 23H2, Intel i7-8750H, GeForce GTX 1050 Ti (Mobile), 32GB RAM (2666MHz)

After I removed the references to flash_attn in gemma.py, I got the following error: TypeError: GemmaModel.forward() got an unexpected keyword argument 'cache_position' (and the same with other models)
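
To illustrate why that keyword trips things up (toy module below, not the repo's actual GemmaModel): newer transformers code passes an extra cache_position keyword argument down to forward(), and a signature without a **kwargs catch-all rejects it, while one with a catch-all simply ignores it:

    import torch
    import torch.nn as nn

    class ToyModel(nn.Module):
        # Toy stand-in, not the repo's GemmaModel: **kwargs absorbs keyword
        # arguments that newer transformers code passes along (e.g. cache_position),
        # and this toy simply ignores them.
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(8, 8)

        def forward(self, input_ids, memory=None, norm_term=None, **kwargs):
            return self.proj(input_ids)

    model = ToyModel()
    # Without **kwargs in the signature, this call raises:
    # TypeError: forward() got an unexpected keyword argument 'cache_position'
    out = model(torch.randn(2, 8), cache_position=torch.arange(2))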

After adding *args and **kwargs to all the forward() methods, another error appeared: RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3

Traceback (most recent call last):
   File "d:\Programming\Python\MyGemma2B\1.py", line 42, in <module>
     generated_text = generate(
   File "d:\Programming\Python\MyGemma2B\1.py", line 17, in generate
     outputs = model(input_ids=input_segment.to(model.device), memory=memory, norm_term=norm_term)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 960, in forward
     outputs = self.model(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 783, in forward
     layer_outputs = decoder_layer(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 617, in forward
     _attended = self.self_attn(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 532, in forward
     attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3

All errors occurred after "Loading checkpoint shards".
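
The size mismatch itself is easy to reproduce outside the repo: scaled_dot_product_attention broadcasts the attention mask onto the (batch, heads, q_len, k_len) score matrix, so a mask built for a different number of key positions than the keys actually have fails at dimension 3. The shapes below are made up only to mirror the 5-vs-6 numbers in the traceback:

    import torch
    import torch.nn.functional as F

    q = torch.randn(1, 8, 5, 64)        # (batch, heads, q_len, head_dim)
    k = torch.randn(1, 8, 6, 64)        # 6 key positions
    v = torch.randn(1, 8, 6, 64)
    bad_mask = torch.zeros(1, 1, 5, 5)  # additive mask sized for only 5 key positions

    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=bad_mask)
    except RuntimeError as e:
        print(e)  # size mismatch at non-singleton dimension 3 (exact wording varies by backend)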

drdsgvo commented 4 months ago

I got the same error with transformers 4.40.1

mindkrypted commented 4 months ago

(Quoting @Aniforka's report above.)

Using a 3090, I left flash attention enabled and was getting the same 'cache_position' error. As you described, adding *args and **kwargs results in the same error message.

Kinda hard to believe that the solution under ./src was tested before release. Even the import in main.py has a typo in it: from .gemma import GemmaForCausalLM
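
If the problem with that import is the package-relative form being used in a file that gets run directly as a script (a guess on my part, the typo might be something else entirely), it reproduces and works around like this:

    # Guess: running "python main.py" with a package-relative import fails with
    # "ImportError: attempted relative import with no known parent package".
    # A common workaround is falling back to an absolute import of the sibling module:
    try:
        from .gemma import GemmaForCausalLM  # works only when main.py is imported as part of a package
    except ImportError:
        from gemma import GemmaForCausalLM   # works when main.py is run directly next to gemma.py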

drdsgvo commented 4 months ago

I can confirm all of the above: after fixing the parameter issues, the tensor size mismatch error appeared. The parameter issues seem to be explained by a change in the transformers API.
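
The API change is easy to confirm against the installed library: on transformers 4.40.1, the upstream GemmaModel.forward already takes a cache_position argument that this repo's copied forward methods don't. The snippet below only introspects the signature, it doesn't fix anything:

    import inspect
    from transformers.models.gemma.modeling_gemma import GemmaModel

    # Prints True on recent transformers releases (e.g. 4.40.1); the repo's forked
    # forward() methods predate this argument, hence the unexpected-keyword TypeError.
    print("cache_position" in inspect.signature(GemmaModel.forward).parameters)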

Aniforka commented 4 months ago

(Quoting @mindkrypted's reply above.)

It feels like the code was either generated by a neural network or wasn't tested at all before being uploaded to GitHub.

web199195 commented 3 months ago

In fact, it can't run. A lot of errors happen when running the code; the parameters and data dimensions don't match.

mindkrypted commented 3 months ago

Might be a scam project to get some attention, either for a grant or for investors' money. Have a look at another project where this author is being called out for using research and work from others: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/23 -- "llama3-V project is stealing a lot of academic work from MiniCPM-Llama3-V 2.5!"

D-Pear commented 3 weeks ago

I think this model was originally trained and built with MLX (an ML framework for Apple silicon), and the PyTorch code was generated by an LLM and never tested at all. I suppose it would be better if someone wrote a PyTorch version from scratch instead.