mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

training process: mat1 and mat2 must have the same dtype #21

Closed wang9danzuishuai closed 11 months ago

wang9danzuishuai commented 1 year ago

Hi, sorry to bother you again! I just followed the instructions in train_videochatgpt.md and used the command to start training. My devices are 2x RTX 8000, which are not Ampere GPUs (a bit out of date). These cards don't support bf16 and tf32, so I set both of those params to False. Then a RuntimeError occurred:

mat1 and mat2 must have the same dtype.

I'm sure we used exactly the same environment as yours. So is this problem caused by the GPU difference? If so, is there any way to solve it? We would really appreciate it if you could help us! Thank you!
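
For anyone landing here with the same error, a minimal sketch of what it means in isolation (plain PyTorch, not the repo's code; the layer sizes are illustrative and the exact message wording can differ between PyTorch versions):

```python
import torch

# The matmul inside nn.Linear needs both operands in the same dtype,
# but here the weights are float32 and the features are float16.
proj = torch.nn.Linear(1024, 4096)            # weights created in float32
features = torch.randn(1, 256, 1024).half()   # e.g. visual features in fp16

try:
    proj(features)                            # mat1 (Half) vs. mat2 (Float)
except RuntimeError as err:
    print(err)                                # "mat1 and mat2 must have the same dtype"

# Casting either side to a common dtype makes the call succeed:
out = proj(features.float())
print(out.dtype)                              # torch.float32
```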

mmaaz60 commented 12 months ago

Hi @wang9danzuishuai,

Thank you for your interest in our work, and please accept my apologies for the late reply, as I was traveling. The issue occurs because you set bf16 to False.

One possible solution is to use FP16 mode by passing --fp16 True to the training command. Please let me know if it works on your GPUs. Thank you.
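
To illustrate why FP16 mode sidesteps the mismatch, here is a small sketch using plain torch.autocast rather than the repo's training loop (assumes a CUDA device; the HF Trainer's --fp16 True enables a roughly equivalent autocast/GradScaler setup):

```python
import torch

# Under autocast, the matmul inside nn.Linear casts both operands to fp16,
# so fp32 projection weights and fp16 features no longer clash.
proj = torch.nn.Linear(1024, 4096).cuda()                   # fp32 weights
features = torch.randn(1, 256, 1024, device="cuda").half()  # fp16 features

with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = proj(features)                                    # dtypes unified by autocast

print(out.dtype)  # torch.float16
```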

wang9danzuishuai commented 12 months ago

Thank you for your reply. I tried FP16 mode, but then it reported that FlashAttention backward for head dim > 64 requires A100 or H100 GPUs, since the implementation needs a large amount of shared memory. I think it's time for us to buy a new set of GPUs, LOL.
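
For completeness, a quick way to check what a given card supports (a sketch assuming CUDA is available; the A100/H100 requirement comes from the FlashAttention error above, and the bf16/TF32 cutoff from NVIDIA's compute-capability tables):

```python
import torch

# The RTX 8000 is Turing (compute capability 7.5), which predates bf16 and
# TF32; both need Ampere (8.0) or newer, and the FlashAttention backward
# pass for head dim > 64 wants A100/H100-class cards per the error above.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
print("bf16 supported: ", torch.cuda.is_bf16_supported())
print("Ampere or newer:", (major, minor) >= (8, 0))
```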