mbanani / probe3d

[CVPR 2024] Probing the 3D Awareness of Visual Foundation Models
MIT License

SigLIP forward pass question #9

Open stephanie-fu opened 3 weeks ago

stephanie-fu commented 3 weeks ago

Hi, thank you for the wonderful paper and codebase! I had one clarification question: it looks like there is an extra set of forward passes through the SigLIP ViT blocks - is this intentional for the SigLIP implementation?

https://github.com/mbanani/probe3d/blob/ea4c7be1f83928f6401045370d09b0914d354c6e/evals/models/siglip.py#L71-L82
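For context, a minimal sketch (hypothetical, not the actual probe3d code) of the kind of duplicated loop being asked about: if a leftover loop over the ViT blocks survives a refactor, every block runs twice and the extracted features change.

```python
# Hypothetical illustration of a duplicated forward pass over ViT blocks.
# DummyBlock stands in for a transformer block; here it just adds 1 so the
# effect of running the blocks twice is easy to see.

class DummyBlock:
    def __call__(self, x):
        return x + 1

blocks = [DummyBlock() for _ in range(4)]

def forward_buggy(x):
    # Older, leftover loop over the blocks...
    for blk in blocks:
        x = blk(x)
    # ...followed by the renamed loop: every block is applied a second time.
    for blk in blocks:
        x = blk(x)
    return x

def forward_fixed(x):
    # Single pass over the blocks, as intended.
    for blk in blocks:
        x = blk(x)
    return x

print(forward_buggy(0))  # 8: each of the 4 blocks applied twice
print(forward_fixed(0))  # 4: each block applied once
```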

mbanani commented 2 weeks ago

Thank you for catching this! This is definitely a bug. As far as I can tell, it was introduced while cleaning up the code for the code and paper release: there were naming inconsistencies across the different models, and when I made them consistent it seems I forgot to delete the older block loop in SigLIP.

I ran quick experiments on a single GPU (with smaller batch sizes) comparing SigLIP (with and without the bug), CLIP, and DINO on NAVI depth estimation. With the bug, SigLIP performed worse than CLIP; with the bug fixed, it performed better. This matches the paper's results and supports the conclusion that the bug was introduced during cleanup. I'll see if I can get access to compute and rerun the full analysis.

Thanks again for catching this mistake (I will commit a fix shortly), and my apologies for any issues it may have caused.