Closed wenjiajia123 closed 5 months ago
Hi @wenjiajia123! Thanks for reporting this problem. It's strange, as it did not happen in our environment. Could you please double-check that the code is running on a GPU instead of the CPU? If the problem persists, switching back to FP32 (by setting amp=None here) should work.
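To make the suggested fallback explicit: the idea is to use FP16 autocast only when the model is actually on CUDA, and otherwise run in full FP32 (equivalent to setting amp=None). A minimal sketch, where pick_amp is a hypothetical helper and not part of R2-Tuning:

```python
from typing import Optional

def pick_amp(device: str, requested: Optional[str] = "fp16") -> Optional[str]:
    """Return the mixed-precision mode to use, or None for full FP32.

    FP16 LayerNorm kernels are only guaranteed on CUDA back-ends, so on
    CPU we fall back to FP32 (the same effect as setting amp=None).
    """
    if requested is None or not device.startswith("cuda"):
        return None
    return requested
```

For example, pick_amp("cpu") returns None, while pick_amp("cuda:0") keeps the requested "fp16" mode.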
Thanks a lot for your reply, it worked!
I just checked the log file, and I only modified the amp parameter; it's a little weird. 😂
Is it possibly caused by slow I/O? You may compare the data_time (time spent on data loading & pre-processing, in seconds) at each step with our log.
This is my log file; both times seem much longer.
TBH, I currently have no idea about this. It looks more like an environment issue than a code issue (though you have confirmed that the dependencies are correct...). We'll see whether other users report this problem.
Again, please make sure the code is running on GPU. 😂
Thanks for your reply. I wonder what the total running time is without FP16; could that be the main reason? 🤔️
We've tested FP32 training and it takes about three and a half hours.
Hi, I have fixed the problem by reinstalling torch, since my CUDA version is 11.4. But data_time still takes much longer; do you know how to solve it? 😊
This only happens for the first few steps. It gets faster after the warm-up.
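The CUDA mismatch mentioned above can be sanity-checked by comparing the CUDA tag of the installed torch wheel against the driver's CUDA version. cuda_wheel_matches is a hypothetical helper, assuming the usual wheel version-tag format such as 1.12.1+cu113:

```python
def cuda_wheel_matches(torch_version: str, driver_cuda: str) -> bool:
    """Check whether a torch wheel tag like '1.12.1+cu113' was built for
    a CUDA runtime no newer than the driver's CUDA version (e.g. '11.4').
    """
    if "+cu" not in torch_version:
        return False  # CPU-only wheel: no CUDA support at all
    tag = torch_version.split("+cu")[1]    # e.g. '113'
    wheel = (int(tag[:-1]), int(tag[-1]))  # -> (11, 3)
    major, minor = (int(p) for p in driver_cuda.split("."))
    return (major, minor) >= wheel
```

For example, a cu113 wheel is compatible with a CUDA 11.4 driver, while a cu118 wheel is not.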
Thanks for your project! When I tried to reproduce it, I ran into the following problem:

```
File "/mnt/bn/experience0313/gengwenjia/R2-Tuning/models/adapter.py", line 72, in forward
    v_emb = self.video_map(video_emb[i])  # B * T * P * C
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
```

I checked all dependencies' versions; do you know what the possible reasons are?
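As a side note, when a back-end lacks an FP16 LayerNorm kernel, a common workaround is a LayerNorm variant that computes in FP32 and casts the result back to the input dtype. This is a generic sketch of that pattern, not the fix adopted in R2-Tuning (the thread above resolves the issue via amp=None / a matching torch install instead):

```python
import torch
from torch import nn
from torch.nn import functional as F

class FP32LayerNorm(nn.LayerNorm):
    """LayerNorm that always computes in FP32, then casts back.

    Avoids '"LayerNormKernelImpl" not implemented for Half' on back-ends
    without an FP16 LayerNorm kernel, at the cost of an extra cast.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.layer_norm(
            x.float(),                 # compute in FP32
            self.normalized_shape,
            self.weight.float() if self.weight is not None else None,
            self.bias.float() if self.bias is not None else None,
            self.eps,
        )
        return out.to(x.dtype)         # cast back to the input dtype
```

Dropping this in for nn.LayerNorm keeps the module's parameters and output dtype unchanged while sidestepping the missing half-precision kernel.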