microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

GPU requirements for TinyCLIP #214

Closed Gumpest closed 5 months ago

Gumpest commented 5 months ago

I wonder: if I use 8 A100 80GB GPUs and increase the batch size to 4*1024, can the results be reproduced? Thanks.

wkcn commented 5 months ago

Hi @Gumpest , thanks for your attention to our work!

The performance may drop slightly, because the global batch size affects the quality of contrastive learning. In our experiment, the global batch size was 1024 per GPU * 32 GPUs = 32,768.

There are two ways to support a large batch size with limited resources:

  1. GradCache https://github.com/luyug/GradCache
  2. Accumulation https://github.com/mlfoundations/open_clip/blob/main/src/training/train.py#L114-L126

I recommend the latter. We may integrate it into TinyCLIP in the future.
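
For reference, here is a minimal PyTorch sketch of the feature-caching style of accumulation used in the open_clip training loop linked above. `model.encode_image` / `model.encode_text`, the chunk lists, and the fixed `logit_scale` are placeholder assumptions, not TinyCLIP's actual API; the point is only that the contrastive loss is recomputed per micro-batch against cached features of the whole accumulated batch, so negatives still span the full effective batch size.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(image_features, text_features, logit_scale):
    # Symmetric InfoNCE loss over the full (accumulated) batch.
    logits = logit_scale * image_features @ text_features.t()
    labels = torch.arange(logits.shape[0], device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


def accumulated_step(model, optimizer, image_chunks, text_chunks, logit_scale=100.0):
    """One optimizer step whose contrastive loss spans all micro-batches.

    image_chunks / text_chunks are lists of micro-batches that together form
    the desired large batch; logit_scale is treated as a constant here.
    """
    # 1) Cache features of every micro-batch without building a graph.
    with torch.no_grad():
        cached_img = [F.normalize(model.encode_image(x), dim=-1) for x in image_chunks]
        cached_txt = [F.normalize(model.encode_text(t), dim=-1) for t in text_chunks]

    optimizer.zero_grad()

    # 2) Re-forward each micro-batch WITH gradients, splicing its fresh
    #    features into the cached full-batch features, so that negatives
    #    come from the whole accumulated batch.
    for i in range(len(image_chunks)):
        img = F.normalize(model.encode_image(image_chunks[i]), dim=-1)
        txt = F.normalize(model.encode_text(text_chunks[i]), dim=-1)
        all_img = torch.cat(cached_img[:i] + [img] + cached_img[i + 1:])
        all_txt = torch.cat(cached_txt[:i] + [txt] + cached_txt[i + 1:])
        # Each backward contributes only the gradient flowing through
        # micro-batch i; summed over all i this equals the full-batch gradient.
        contrastive_loss(all_img, all_txt, logit_scale).backward()

    optimizer.step()
```

Note that naive gradient accumulation (summing per-micro-batch losses) would not reproduce the large-batch contrastive loss, since each micro-batch would only see its own negatives; the re-forward against cached features avoids this, at the cost of two forward passes per sample.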

Gumpest commented 5 months ago

Thanks @wkcn for the detailed reply!