luyug / GradCache

Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint
Apache License 2.0

effective batch size with multiple GPUs #9

Closed shaileshj2803 closed 2 years ago

shaileshj2803 commented 2 years ago

What is the effective batch size over which the contrastive loss is computed when training on multiple GPUs?

luyug commented 2 years ago

If each GPU gathers the representation vectors from all GPUs, the effective batch size equals the total number of examples across all GPUs.
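To make the arithmetic concrete, here is a minimal sketch that simulates the gather step in plain Python. It is not GradCache code: `all_gather_sim` and `effective_batch_size` are hypothetical helpers, and the 4-GPU, 8-examples-per-GPU setup is an assumed example. In real PyTorch training the gather would typically be done with a collective such as `torch.distributed.all_gather`.

```python
from typing import List, Tuple

Rep = Tuple[int, int]  # stand-in for a representation vector: (rank, local index)


def all_gather_sim(per_device_reps: List[List[Rep]]) -> List[List[Rep]]:
    """Simulate an all-gather: every device receives the representations of all devices."""
    gathered = [rep for device_reps in per_device_reps for rep in device_reps]
    # Each device gets its own copy of the full, concatenated list.
    return [list(gathered) for _ in per_device_reps]


def effective_batch_size(per_device_batch: int, world_size: int, gathered: bool = True) -> int:
    """Number of examples the contrastive loss sees on each device."""
    return per_device_batch * world_size if gathered else per_device_batch


# Assumed setup for illustration: 4 GPUs with 8 examples each.
world_size = 4
per_device_batch = 8
per_device_reps = [
    [(rank, i) for i in range(per_device_batch)] for rank in range(world_size)
]

views = all_gather_sim(per_device_reps)
print(len(views[0]))                                                         # 32
print(effective_batch_size(per_device_batch, world_size))                    # 32
print(effective_batch_size(per_device_batch, world_size, gathered=False))    # 8
```

Under these assumptions, gathering gives each GPU 32 candidates for the loss, while skipping the gather leaves each GPU with only its 8 local examples.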

shaileshj2803 commented 2 years ago

Thanks a lot