Closed Yan1026 closed 1 year ago
Hi @Yan1026 ,
Could you please double check that you are working with DDP instead of DP?
If that is the case, please use the "--ddp" flag when you train the model with multi-cards.
Cheers, Yuyuan
I'm closing the issue but feel free to reopen it if my answer doesn't solve your issue.
Hi! Sorry to bother you again.
I tried to train the model; the GPU memory usage is as follows:

```
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    5   N/A  N/A     26598      C   ...da/envs/ps-mt/bin/python3     2885MiB |
|    5   N/A  N/A     26599      C   ...da/envs/ps-mt/bin/python3     1387MiB |
|    5   N/A  N/A     26600      C   ...da/envs/ps-mt/bin/python3     1387MiB |
|    5   N/A  N/A     26601      C   ...da/envs/ps-mt/bin/python3     1387MiB |
|    6   N/A  N/A     26599      C   ...da/envs/ps-mt/bin/python3     2889MiB |
|    7   N/A  N/A     26600      C   ...da/envs/ps-mt/bin/python3     2889MiB |
|    8   N/A  N/A     26601      C   ...da/envs/ps-mt/bin/python3     2869MiB |
+-----------------------------------------------------------------------------+
```
I noticed that the code uses DistributedDataParallel, but why does the first GPU (GPU 5) use more memory? How can I solve it?
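For what it's worth, the `nvidia-smi` output above shows the classic symptom: PIDs 26599-26601, whose main allocations live on GPUs 6-8, each also hold ~1.4 GB on GPU 5. That usually happens when every rank creates a CUDA context on device 0, most often by calling `torch.load` without a `map_location` (checkpoint tensors saved from rank 0 are tagged `cuda:0` and get restored there). A common fix, sketched below under the assumption that this is the cause (the helper name and `ckpt_path` are hypothetical, not from this repo), is to pin each process to its own device and remap the checkpoint:

```python
# Hypothetical helper: build a map_location for torch.load so each DDP rank
# restores checkpoint tensors onto its own GPU instead of cuda:0.
def map_location_for_rank(local_rank: int) -> dict:
    # Tensors in a checkpoint saved from rank 0 are tagged "cuda:0";
    # remap them to this rank's device.
    return {"cuda:0": f"cuda:{local_rank}"}

# In each worker process, before building the model (sketch):
#   torch.cuda.set_device(local_rank)
#   state = torch.load(ckpt_path,
#                      map_location=map_location_for_rank(local_rank))
#   model.load_state_dict(state)
```

Alternatively, `map_location="cpu"` followed by `model.to(local_rank)` avoids the stray allocation as well.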