wintercat1994 closed this issue 1 year ago
When you are using `torch.nn.DistributedDataParallel`, the actual batch size is `batch_size_per_gpu * num_gpus`.
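For illustration, a minimal sketch of the arithmetic (the numbers below are hypothetical, not from any mmpose config):

```python
# With DistributedDataParallel, each process/GPU loads its own mini-batch,
# so the effective (global) batch size is the per-GPU batch size times the
# number of GPUs taking part in the job.
batch_size_per_gpu = 32   # samples loaded by each process/GPU (example value)
num_gpus = 4              # number of DDP processes (example value)
effective_batch_size = batch_size_per_gpu * num_gpus
print(effective_batch_size)  # 128
```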
Thank you very much for your answer! However, I added a head for contrastive learning during model training. When I printed the shape of the input to the contrastive head while computing the loss, I found that only the data on a single card is used to compute the contrastive loss. This can hurt the performance of contrastive learning, because the contrastive loss may be computed per card before the gradients from the multiple cards are accumulated. How can I make the contrastive loss use the data from all cards?
I think you could try the `all_gather` function to gather the features from all GPUs.
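A minimal sketch of that idea (not mmpose's actual code; the helper name and usage are illustrative, and it assumes the process group has already been initialized by the DDP launcher):

```python
import torch
import torch.distributed as dist


def gather_features(features: torch.Tensor) -> torch.Tensor:
    """Collect the feature tensor from every rank along the batch dimension.

    Note: dist.all_gather does not propagate gradients through the gathered
    copies, so the local slice is re-inserted to keep its gradient path.
    """
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(features) for _ in range(world_size)]
    dist.all_gather(gathered, features)
    # Replace this rank's slice with the original tensor so gradients
    # still flow through the locally computed features.
    gathered[dist.get_rank()] = features
    return torch.cat(gathered, dim=0)


# Usage inside the contrastive head's loss computation (illustrative):
# all_feats = gather_features(local_feats)  # shape: (batch_per_gpu * num_gpus, dim)
# loss = contrastive_loss(all_feats, ...)
```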
Thank you! I will try it!
What is the feature?
Could you please tell me how to use torch.nn.DataParallel in mmpose single-machine multi-card training to get a bigger batch size? I have observed that currently single-machine multi-card training can only use torch.nn.DistributedDataParallel, which limits the batch size to what fits in a single card's video memory.
Any other context?
No response