Closed wang9danzuishuai closed 11 months ago
Hi @wang9danzuishuai,
Thank you for your interest in our work, and please accept my apologies for the late reply as I was traveling. The issue is that you set `bf16` to `False`. One possible solution is to use FP16 mode by passing `--fp16 True` to the training command. Please let me know if it works for your GPU. Thank you.
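For reference, the flags would be passed along these lines. This is only a sketch: the launcher, script name, and remaining arguments are placeholders standing in for the project's actual training command; the relevant change is disabling bf16 and enabling fp16.

```shell
# Hypothetical invocation -- script name and other flags are placeholders.
# The key change for pre-Ampere GPUs is --bf16 False together with --fp16 True.
torchrun train.py \
    --bf16 False \
    --tf32 False \
    --fp16 True
```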
Thank you for your reply. I tried FP16 mode, but it raised the error:
`FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.`
I think it's time for us to buy a new set of GPUs, LOL.
Hi, sorry to bother you again! I just followed the instructions in _train_videochatgpt.md and used the command to start training. My devices are 2 RTX 8000s, which are not Ampere GPUs (a little out of date). These devices don't support bf16 or tf32, so I set both params to False. Then a RuntimeError occurred:
`mat1 and mat2 must have the same dtype`
I'm sure we used exactly the same environment as yours. So is this problem caused by the GPU difference? If so, is there any way to solve it? We would really appreciate your help! Thank you!
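For context, this error typically comes from mixing parameter and input dtypes: with bf16 disabled and no fp16 fallback, some tensors end up in half precision while others stay float32, and PyTorch's matmul refuses to multiply them. A minimal reproduction (not the actual training code, just an illustration of the mismatch):

```python
import torch

# A half-precision layer fed a float32 input reproduces the error:
# "mat1 and mat2 must have the same dtype"
layer = torch.nn.Linear(4, 3).half()   # weights are float16
x = torch.randn(2, 4)                  # input is float32

try:
    layer(x)
except RuntimeError as e:
    print(f"RuntimeError: {e}")

# Making both sides the same dtype fixes the mismatch (shown in float32):
out = layer.float()(x)
print(out.dtype)  # torch.float32
```

Enabling a mixed-precision mode consistently (e.g. `--fp16 True`, as suggested above) keeps the model and inputs in matching dtypes, which is why it resolves this error.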