mlfoundations / open_clip

An open source implementation of CLIP.

How much GPU memory can SigLIP save compared with CLIP? #825

Closed Pluto-Jin closed 2 months ago

Pluto-Jin commented 5 months ago

Hello,

The original SigLIP paper says they could fit a 2x larger batch size on TPU with the base-sized SigLIP model compared to CLIP.
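As I understand it, the claimed saving comes from the loss: the softmax contrastive loss materializes pairwise logits against the full global batch, while SigLIP's chunked sigmoid loss only needs one device-sized block at a time. A back-of-the-envelope sketch with my batch/device numbers (assuming fp32 logits and a CLIP setup that gathers features and computes the full matrix on each device; open_clip's `--local-loss` option would shrink the CLIP side to a (b/n) x b slice instead):

```python
# Back-of-the-envelope sketch (my numbers, not the paper's code): size of the
# pairwise logit matrix the loss has to hold at its peak.
b, n = 14400, 48          # global batch size, number of GPUs
bytes_per_logit = 4       # fp32

# Softmax (CLIP) loss with gathered features: a full b x b matrix per device.
clip_mib = b * b * bytes_per_logit / 2**20
# Chunked sigmoid (SigLIP) loss: one (b/n) x (b/n) block per device at a time,
# with text chunks rotated between devices.
siglip_mib = (b // n) ** 2 * bytes_per_logit / 2**20

print(f"CLIP logits:  {clip_mib:7.1f} MiB")   # ~791 MiB
print(f"SigLIP block: {siglip_mib:7.1f} MiB") # ~0.3 MiB
```

Even in the worst case that is well under 1 GB per device at this batch size, so maybe the loss alone shouldn't be expected to halve memory here?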

But in my experiment, I used a global batch size of 14400 on 48 A100-40GB GPUs for both runs, with SigLIP and CLIP both using the standard base-sized architecture. During training, SigLIP takes 33.5 GB per GPU while CLIP takes 37.0 GB. The two are close, and I couldn't scale up to a 2x batch size as the paper claims.
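For what it's worth, here is roughly how I read the per-GPU peaks. Note that nvidia-smi reports reserved (cached) memory, which can sit well above what PyTorch has actually allocated, so the gap may look different depending on which counter you watch:

```python
# Sketch of how I read per-GPU peak memory with PyTorch's own counters.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training step here ...
alloc = torch.cuda.max_memory_allocated() / 2**30
reserv = torch.cuda.max_memory_reserved() / 2**30
print(f"peak allocated: {alloc:.1f} GiB, peak reserved: {reserv:.1f} GiB")
```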

I am not using FSDP or DeepSpeed; could that be the reason? Or does the hardware type (GPU vs. TPU) matter a lot? I have no idea.
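For completeness, the parallelism is plain DDP, conceptually like this (a hypothetical sketch, not my exact launch code):

```python
# Hypothetical sketch of the setup: plain DistributedDataParallel, no
# parameter/optimizer sharding, so each GPU keeps a full model copy.
# Launch with: torchrun --nproc_per_node=<gpus> this_script.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)
model = torch.nn.Linear(512, 512).cuda()  # stand-in for the CLIP/SigLIP model
model = DDP(model, device_ids=[rank])
```

My understanding is that FSDP/DeepSpeed ZeRO mainly shard parameters, gradients, and optimizer state, which for a base-sized model is only a couple of GB in total, so I'm not sure it could explain a 2x batch-size gap. Please correct me if I'm wrong.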

Can anyone who has trained a SigLIP model share their experience?

Thanks!