CurryKd7 opened 1 month ago
Hi,
Our bash scripts for pre-training and fine-tuning models natively support multi-GPU training.
However, during training I noticed via `nvidia-smi` that only GPU 0 was being used. I tried modifying the rank lookup to `gpu_id = int(os.environ.get("SLURM_LOCALID", 2))`, after which training used GPUs 0 and 1, but I still cannot run on all four GPUs at the same time. How can I solve this problem?
How should the multi-GPU distributed training code be launched?
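For reference, this is roughly how I understood the per-process device mapping should work. It is only a minimal sketch assuming PyTorch DDP launched with `torchrun` (the `torchrun` flags and the fallback chain are my assumptions, not the repo's actual code):

```python
import os

import torch
import torch.distributed as dist

# Sketch assuming one process per GPU. `torchrun --nproc_per_node=4 train.py`
# sets LOCAL_RANK (0..3) for each process; under Slurm's `srun` the
# equivalent variable is SLURM_LOCALID. A hard-coded default like 2 would
# pin every process without the variable to the same card.
local_rank = int(os.environ.get("LOCAL_RANK", os.environ.get("SLURM_LOCALID", 0)))
torch.cuda.set_device(local_rank)

# torchrun also sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE, which
# init_process_group reads from the environment.
dist.init_process_group(backend="nccl")
print(f"rank {dist.get_rank()} of {dist.get_world_size()} on cuda:{local_rank}")

dist.destroy_process_group()
```

If each process is mapped this way, `nvidia-smi` should show all four GPUs active; is that the intended launch pattern for these scripts?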