mbanani / probe3d

[CVPR 2024] Probing the 3D Awareness of Visual Foundation Models
MIT License

Layers to extract features from in DINOv2 #1

Closed — harshm121 closed this issue 4 months ago

harshm121 commented 4 months ago

Hi, thanks for such insightful work! Just a minor point of confusion: you extract features from DINOv2 ViT-L/14's [6th, 12th, 18th, 24th] layers (indices [5, 11, 17, 23]), whereas DINOv2's official codebase and paper (Sec. 7.4) use the [5th, 12th, 18th, 24th] layers (indices [4, 11, 17, 23]). Any idea whether that might impact performance significantly? And why would Meta use irregular layer numbers just for ViT-L?
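For concreteness, here is a minimal sketch of the two index choices being compared. The helper name `evenly_spaced_indices` is illustrative (not from either codebase); the commented `get_intermediate_layers` call at the end shows how DINOv2's released models accept a list of block indices.

```python
# Sketch: the two layer-index choices for DINOv2 ViT-L/14 (24 blocks, 0-indexed).
# `evenly_spaced_indices` is a hypothetical helper, not code from probe3d.

def evenly_spaced_indices(num_blocks: int, n: int = 4) -> list[int]:
    """Index of the last block in each of n equal chunks.

    For 24 blocks and n=4: blocks 6, 12, 18, 24 -> indices [5, 11, 17, 23].
    """
    step = num_blocks // n
    return [step * (k + 1) - 1 for k in range(n)]

# Regular spacing, as used in probe3d for ViT-L/14:
probe3d_vitl = evenly_spaced_indices(24)   # [5, 11, 17, 23]

# Irregular choice from the DINOv2 paper (Sec. 7.4): first entry differs.
dinov2_official_vitl = [4, 11, 17, 23]

# Either list can be passed to a DINOv2 model loaded via torch.hub, e.g.:
#   model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
#   feats = model.get_intermediate_layers(images, n=probe3d_vitl)
```

The two lists differ only in their first entry (block 5 vs. block 6), so any performance gap would come from that single early-layer swap.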

mbanani commented 4 months ago

Hi @harshm121! Great eye for detail; I hadn't noticed this irregularity before. Most papers that use such methods pick layers with some regularity (e.g., every n-th layer or block). Maybe the irregular choice provides a minor performance gain for that specific checkpoint, but I would be surprised if the difference is significant, or consistent across multiple checkpoints of the same model. I am curious about the choice, though. Let me know if you ever run the experiment; I might run it when I get some time and report the performance impact here.

harshm121 commented 4 months ago

Thanks! I will also report back here if I try it.