Closed wenjiajia123 closed 5 months ago
Hi @wenjiajia123! Thanks for reporting this problem. It's strange, as it did not happen in our environment. Could you please double-check that the code is running on a GPU instead of the CPU? If the problem persists, switching back to FP32 (by setting amp=None here) should work.
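To make the suggested fallback explicit: the idea is to use FP16 autocast only when the model is actually on CUDA, and otherwise run in full FP32 (equivalent to setting amp=None). A minimal sketch, where pick_amp is a hypothetical helper and not part of R2-Tuning:

```python
from typing import Optional

def pick_amp(device: str, requested: Optional[str] = "fp16") -> Optional[str]:
    """Return the mixed-precision mode to use, or None for full FP32.

    FP16 LayerNorm kernels are only guaranteed on CUDA back-ends, so on
    CPU we fall back to FP32 (the same effect as setting amp=None).
    """
    if requested is None or not device.startswith("cuda"):
        return None
    return requested
```

For example, pick_amp("cpu") returns None, while pick_amp("cuda:0") keeps the requested "fp16" mode.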
Thanks a lot for your reply, it worked!
I just checked the log file, and I only modified the amp parameter; it's a little weird. 😂
Is it possibly caused by slow I/O? You may compare the data_time (time spent on data loading & pre-processing, in seconds) at each step with our log.
This is my log file; both times seem much longer.
TBH, I currently have no idea about this. It looks more like an environment issue than a code issue (though you have confirmed that the dependencies are correct...). We'll see whether other users report this problem.
Again, please make sure the code is running on GPU. 😂
Thanks for your reply. I wonder what the total running time is without FP16; could that be the main reason? 🤔️
We've tested FP32 training and it takes about three and a half hours.
Hi, I have fixed the problem by reinstalling torch, since my CUDA version is 11.4. But data_time still takes much longer; do you know how to solve it? 😊
This only happens for the first few steps. It gets faster after the warm-up.
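The CUDA mismatch mentioned above can be sanity-checked by comparing the CUDA tag of the installed torch wheel against the driver's CUDA version. cuda_wheel_matches is a hypothetical helper, assuming the usual wheel version-tag format such as 1.12.1+cu113:

```python
def cuda_wheel_matches(torch_version: str, driver_cuda: str) -> bool:
    """Check whether a torch wheel tag like '1.12.1+cu113' was built for
    a CUDA runtime no newer than the driver's CUDA version (e.g. '11.4').
    """
    if "+cu" not in torch_version:
        return False  # CPU-only wheel: no CUDA support at all
    tag = torch_version.split("+cu")[1]    # e.g. '113'
    wheel = (int(tag[:-1]), int(tag[-1]))  # -> (11, 3)
    major, minor = (int(p) for p in driver_cuda.split("."))
    return (major, minor) >= wheel
```

For example, a cu113 wheel is compatible with a CUDA 11.4 driver, while a cu118 wheel is not.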
Thanks for your project! When I tried to reproduce it, I ran into the following problem:

```
File "/mnt/bn/experience0313/gengwenjia/R2-Tuning/models/adapter.py", line 72, in forward
    v_emb = self.video_map(video_emb[i])  # B * T * P * C
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
```

I checked all dependencies' versions; do you know what the possible reasons are?
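As a side note, when a back-end lacks an FP16 LayerNorm kernel, a common workaround is a LayerNorm variant that computes in FP32 and casts the result back to the input dtype. This is a generic sketch of that pattern, not the fix adopted in R2-Tuning (the thread above resolves the issue via amp=None / a matching torch install instead):

```python
import torch
from torch import nn
from torch.nn import functional as F

class FP32LayerNorm(nn.LayerNorm):
    """LayerNorm that always computes in FP32, then casts back.

    Avoids '"LayerNormKernelImpl" not implemented for Half' on back-ends
    without an FP16 LayerNorm kernel, at the cost of an extra cast.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.layer_norm(
            x.float(),                 # compute in FP32
            self.normalized_shape,
            self.weight.float() if self.weight is not None else None,
            self.bias.float() if self.bias is not None else None,
            self.eps,
        )
        return out.to(x.dtype)         # cast back to the input dtype
```

Dropping this in for nn.LayerNorm keeps the module's parameters and output dtype unchanged while sidestepping the missing half-precision kernel.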