Hello,
The original SigLIP paper reports that, compared with CLIP, a 2x larger batch size fits on TPU with the base SigLIP model.
But in my experiment, I used a batch size of 14400 on 48 A100-40GB GPUs for both models, where the SigLIP and CLIP models both use the standard base-sized architecture. During training, SigLIP takes 33.5 GB per GPU while CLIP takes 37.0 GB. The numbers are close, and I could not scale the batch size up 2x as the paper suggests.
I am not using any FSDP/DeepSpeed techniques; could that be the reason? Or does the GPU type (TPU vs. A100) matter a lot? I have no idea.
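For reference, my loss is the plain data-parallel version, roughly like the sketch below (NumPy, names illustrative, not my actual training code). Note that this naive form still materializes the full B x B logits matrix, the same shape the CLIP softmax loss needs, so maybe that is where my memory goes:

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Naive pairwise sigmoid loss: builds the full B x B logits matrix,
    just like the CLIP softmax loss would (no chunking across devices)."""
    # L2-normalize both towers' embeddings
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = temperature * img @ txt.T + bias    # B x B -- dominates activation memory
    labels = 2.0 * np.eye(len(img_emb)) - 1.0    # +1 for matched pairs, -1 otherwise
    # -log sigmoid(labels * logits), written stably as logaddexp(0, -z)
    return np.mean(np.logaddexp(0.0, -labels * logits))

rng = np.random.default_rng(0)
loss = siglip_loss(rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
```

If the paper's 2x figure depends on their chunked per-device loss computation rather than on the sigmoid loss alone, that might explain why I see little difference.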
Can anyone who ever trained a SigLIP model share your experience?
Thanks!