twke18 / CAST


Multi-card distributed operation #9

Open CurryKd7 opened 1 month ago

CurryKd7 commented 1 month ago

How do I use the multi-card distributed training code?

twke18 commented 1 month ago

Hi,

Our bash scripts for pre-training and fine-tuning models naturally support multi-GPU training.
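
For reference, a rough sketch (not CAST's actual code) of how each per-GPU process typically sets itself up when a script is launched with one process per GPU, e.g. `torchrun --nproc_per_node=4 train.py`; the script name and function below are illustrative assumptions:

```python
import os

import torch
import torch.distributed as dist


def init_distributed() -> int:
    # torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for every spawned process.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)          # bind this process to its own GPU
    dist.init_process_group(backend="nccl")    # reads rank/world size from the env
    return local_rank
```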

CurryKd7 commented 1 month ago

But during training I saw via `nvidia-smi` that only GPU 0 was used. I tried modifying `gpu_id = int(os.environ.get("SLURM_LOCALID", 2))`; the training then used GPUs 0 and 1, but I still couldn't run on all four cards at the same time. How can I solve this problem?
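
A minimal sketch of the SLURM-style binding the snippet above refers to, assuming the job is launched with one task per GPU (e.g. `srun --ntasks-per-node=4 --gpus-per-node=4 python train.py`) so that `SLURM_LOCALID` takes the values 0..3 across the four processes; changing only the hard-coded default moves a single process to another card rather than spreading work across all of them. Names and launch flags here are illustrative, not the repo's actual code:

```python
import os

import torch

# Each srun task reads its own node-local index; the default (0) only
# matters when the variable is missing, i.e. when just one process runs.
gpu_id = int(os.environ.get("SLURM_LOCALID", 0))
torch.cuda.set_device(gpu_id)
print(f"this process is using cuda:{gpu_id} "
      f"of {torch.cuda.device_count()} visible GPUs")
```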