Open mao1207 opened 8 months ago
Hi @mao1207, have you managed to solve this problem? I am having the same issue with 48GB memory cards.
Edit: No OOM with more than 2 cards.
Hi @mao1207, I have the same issue: a "CUDA out of memory" error is raised at `model = model.to(device)` inside `Trainer._move_model_to_device()` in `/anaconda3/envs/llava-med/lib/python3.10/site-packages/transformers/trainer.py`. Have you solved this issue, and if so, how? Thanks a lot!
I encountered an out-of-memory issue during model training, even though I had already reduced the instruction-following dataset to a single element and removed the contents of its conversations:
Surprisingly, I still ran out of memory when training on a 48GB memory card. I suspect there might be an infinite loop or some other anomaly in the middle of the process. Can you shed some light on this?
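For context, dataset size alone would not explain this: the model weights, gradients, and optimizer states are allocated regardless of how many training examples there are. A rough back-of-envelope sketch (assuming a ~7B-parameter backbone and full fp32 Adam training, which are assumptions, not confirmed details of this setup):

```python
# Hypothetical memory estimate for fp32 Adam fine-tuning.
# Assumes a ~7B-parameter model; actual sizes depend on the checkpoint used.
def training_memory_gb(n_params: float, bytes_per_param: int = 4) -> float:
    weights = n_params * bytes_per_param        # fp32 model weights
    grads = n_params * bytes_per_param          # fp32 gradients
    optimizer = n_params * bytes_per_param * 2  # Adam first/second moments
    return (weights + grads + optimizer) / 1024**3

print(round(training_memory_gb(7e9)))  # ~104 GB, before activations
```

Under these assumptions the static allocations alone exceed 100 GB, so a single 48GB card would OOM even with one training example; this is why mixed precision, gradient checkpointing, or sharding across multiple cards (as noted above, more than 2 cards avoided the OOM) is typically required.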
The training command I used was: