effective batch size with multiple GPUs

luyug / GradCache

Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint

Apache License 2.0


Closed: shaileshj2803 closed this issue 2 years ago

shaileshj2803 commented 2 years ago

What is the effective batch size over which the contrastive loss is computed when training on multiple GPUs?

luyug commented 2 years ago

If the representation vectors are gathered on each GPU from all GPUs, the effective batch size equals the total number of examples across all GPUs.
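To make the arithmetic concrete, here is a minimal pure-Python sketch that simulates the gather step. It is not GradCache code: the helper names `simulate_all_gather` and `effective_batch_size`, the 4-GPU setup, and the per-GPU batch of 8 are all hypothetical, and strings stand in for representation vectors. In a real PyTorch job the gather would typically be done with something like `torch.distributed.all_gather`. The "no gather" case assumes each GPU computes the loss over its local batch only.

```python
# Pure-Python simulation; no real GPUs or torch.distributed are involved.

def simulate_all_gather(per_gpu_reps):
    """Return, for every rank, the concatenation of all ranks' representations."""
    gathered = [rep for reps in per_gpu_reps for rep in reps]
    return [list(gathered) for _ in per_gpu_reps]


def effective_batch_size(per_gpu_reps, gather):
    """Number of examples the contrastive loss sees on each rank."""
    if gather:
        views = simulate_all_gather(per_gpu_reps)
    else:
        # Assumption: without a gather, each rank only sees its local batch.
        views = [list(reps) for reps in per_gpu_reps]
    return [len(view) for view in views]


# Hypothetical setup: 4 GPUs with 8 examples each.
world_size, per_gpu_batch = 4, 8
per_gpu_reps = [
    [f"rank{r}_ex{i}" for i in range(per_gpu_batch)]
    for r in range(world_size)
]

with_gather = effective_batch_size(per_gpu_reps, gather=True)
without_gather = effective_batch_size(per_gpu_reps, gather=False)
print(with_gather)     # per-rank batch size when representations are gathered
print(without_gather)  # per-rank batch size when they are not
```

Under these assumptions, every rank sees `world_size * per_gpu_batch` examples when representations are gathered, and only `per_gpu_batch` otherwise.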

shaileshj2803 commented 2 years ago

Thanks a lot