Open Noblezhong opened 1 month ago
The problem is "CUDA out of memory". The A40 GPU has 48GB of memory, while I use an A100, which has 80GB. The likely reason is that the model cannot fit in 48GB.
Although you used 8 A40 GPUs, I just reviewed my code and it seems I did not add multi-GPU inference support to reduce per-GPU memory usage, since the model already fits in memory on my setup. You will probably need to add multi-GPU support to the existing code, or use a GPU with more memory.
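For what it's worth, a common way to get basic multi-GPU inference with Hugging Face models is to let accelerate shard the layers across all visible GPUs via `device_map="auto"`. This is only a minimal sketch under that assumption; the model name below is a placeholder, not the repo's actual checkpoint:

```python
# Minimal sketch: shard a causal LM across all visible GPUs for inference.
# Assumes transformers + accelerate are installed; the model name is a
# placeholder, not this repo's actual model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision roughly halves weight memory
    device_map="auto",          # accelerate splits layers across available GPUs
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With 8x48GB this should be more than enough room for a model that fits on a single 80GB A100, since the layers are spread across devices.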
I will try to modify it to support multi-GPU tuning and inference. Thanks for your answer!
Hi! When I try to run your demo in the PiA part, I get an error in the 'instruction tuning' step:
Maybe it's a really stupid issue, but I am a freshman in the LLM research field. :( My training device is a server with 8 A40 GPUs. I have modified train.sh to decrease batch_size and increase gradient_accumulation_steps, but it doesn't work.
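In case it helps to compare, this is roughly what I tried. I'm assuming the instruction tuning goes through the Hugging Face Trainer; the values and the output path here are just illustrative, not the repo's actual settings:

```python
# Hypothetical sketch of memory-saving settings, assuming the tuning step
# uses Hugging Face TrainingArguments; all values are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./pia_tuning",       # placeholder path
    per_device_train_batch_size=1,   # smallest per-GPU batch
    gradient_accumulation_steps=16,  # keeps the effective batch size up
    gradient_checkpointing=True,     # trades compute for activation memory
    fp16=True,                       # half-precision training
)
```

Even with batch_size=1 and gradient checkpointing enabled, I still hit the OOM error.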