modelscope / swift

ms-swift: Use PEFT or full-parameter training to finetune 300+ LLMs and 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/
Apache License 2.0

CogVLM2 Video #1339


josephzpng commented 2 weeks ago

Describe the bug

File "/home/hadoop-vacv/.cache/huggingface/modules/transformers_modules/cogvlm2-video-llama3-chat/visual.py", line 78, in forward output = self.dense(out.view(B, L, -1)) RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

[Screenshot: 2024-07-09 16:57:03]
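For reference, this `RuntimeError` is raised whenever `.view()` is called on a non-contiguous tensor whose strides cannot express the requested shape without a copy. Below is a minimal sketch reproducing the failure and the fix the error message itself suggests; the shapes and the transpose are illustrative assumptions, not the actual `visual.py` code:

```python
import torch

B, L = 2, 8  # illustrative batch and sequence sizes
# A transpose (common after attention heads are permuted) yields a
# non-contiguous tensor: same data, but strides no longer row-major.
out = torch.randn(B, 4, L, 16).transpose(1, 2)  # shape (B, L, 4, 16)

try:
    out.view(B, L, -1)  # fails: no zero-copy view exists for these strides
except RuntimeError as e:
    print(e)  # "view size is not compatible with input tensor's size and stride ..."

# Either form works; .reshape() only copies when it has to:
flat = out.reshape(B, L, -1)
flat = out.contiguous().view(B, L, -1)
```

Applied to the traceback above, the corresponding one-line patch would be `output = self.dense(out.reshape(B, L, -1))`, though as the replies below note, upgrading torch also made the error go away in this thread.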
Jintao-Huang commented 2 weeks ago

The error is in the model file. Try upgrading torch to see if it resolves the issue.

josephzpng commented 2 weeks ago

Thank you for your reply. The old problem has been solved, but a new problem has arisen.

[Screenshot: 2024-07-10 15:47:24]

I located the problem in the `_sample` function in `transformers/generation/utils.py`:

[Screenshot: 2024-07-10 15:50:12]

But I don't know how to deal with it. `llava-next-video-7b-instruct` runs smoothly on transformers==4.42.0, but `cogvlm2_video_13b_chat`, using 4.41.0 as required, hits this problem. Looking forward to your reply!

Jintao-Huang commented 2 weeks ago

For `cogvlm2_video_13b_chat`, please use `transformers==4.41.*`.
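When chasing version mismatches like this, a small runtime check can confirm that the pinned release is the one actually being imported. A hedged sketch: the `4.41` bounds come from the advice above, and `packaging` should be available since transformers depends on it.

```python
# pip install "transformers==4.41.*"
import transformers
from packaging import version

v = version.parse(transformers.__version__)
# __file__ reveals which site-packages install is active (useful with multiple envs).
print(transformers.__version__, transformers.__file__)
assert version.parse("4.41.0") <= v < version.parse("4.42.0"), (
    f"expected transformers==4.41.*, got {transformers.__version__}"
)
```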

josephzpng commented 2 weeks ago

> For `cogvlm2_video_13b_chat`, please use `transformers==4.41.*`.

Yes, I am using 4.41.0, but it still doesn't work. The torch version is 2.3.0+cu118.