Closed Yan1026 closed 1 year ago
Hi @Yan1026 ,
Could you please double check that you are working with DDP instead of DP?
If that is the case, please use the "--ddp" flag when you train the model with multi-cards.
Cheers, Yuyuan
I'm closing the issue but feel free to reopen it if my answer doesn't solve your issue.
Hi! Sorry to bother you again.
I tried to train the model; the GPU memory usage is as follows:

```
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    5   N/A  N/A     26598      C   ...da/envs/ps-mt/bin/python3     2885MiB |
|    5   N/A  N/A     26599      C   ...da/envs/ps-mt/bin/python3     1387MiB |
|    5   N/A  N/A     26600      C   ...da/envs/ps-mt/bin/python3     1387MiB |
|    5   N/A  N/A     26601      C   ...da/envs/ps-mt/bin/python3     1387MiB |
|    6   N/A  N/A     26599      C   ...da/envs/ps-mt/bin/python3     2889MiB |
|    7   N/A  N/A     26600      C   ...da/envs/ps-mt/bin/python3     2889MiB |
|    8   N/A  N/A     26601      C   ...da/envs/ps-mt/bin/python3     2869MiB |
+-----------------------------------------------------------------------------+
```
I noticed that the code uses DistributedDataParallel, but why does the first GPU (GPU 5) use more memory? How can I solve it?
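For what it's worth, the `nvidia-smi` output above shows the classic symptom: PIDs 26599-26601, whose main allocations live on GPUs 6-8, each also hold ~1.4 GB on GPU 5. That usually happens when every rank creates a CUDA context on device 0, most often by calling `torch.load` without a `map_location` (checkpoint tensors saved from rank 0 are tagged `cuda:0` and get restored there). A common fix, sketched below under the assumption that this is the cause (the helper name and `ckpt_path` are hypothetical, not from this repo), is to pin each process to its own device and remap the checkpoint:

```python
# Hypothetical helper: build a map_location for torch.load so each DDP rank
# restores checkpoint tensors onto its own GPU instead of cuda:0.
def map_location_for_rank(local_rank: int) -> dict:
    # Tensors in a checkpoint saved from rank 0 are tagged "cuda:0";
    # remap them to this rank's device.
    return {"cuda:0": f"cuda:{local_rank}"}

# In each worker process, before building the model (sketch):
#   torch.cuda.set_device(local_rank)
#   state = torch.load(ckpt_path,
#                      map_location=map_location_for_rank(local_rank))
#   model.load_state_dict(state)
```

Alternatively, `map_location="cpu"` followed by `model.to(local_rank)` avoids the stray allocation as well.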