ywyue / FiT3D

[ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning
https://ywyue.github.io/FiT3D/
MIT License
239 stars 9 forks

Downstream task reproduce #7

Closed jo1jun closed 2 weeks ago

jo1jun commented 1 month ago

Hello, thank you for the excellent research and contributions.

I have some questions regarding downstream task training and evaluation.

Using Colab, I performed depth estimation on KITTI and NYU datasets with the DINOv2 pretrained ViT-S/14 and ViT-B/14 original models, as well as with the FiT3D fine-tuned models. I followed the repositories provided at DINOv2 and Monocular-Depth-Estimation-Toolbox.

I trained with the same configuration as in the paper for comparison.

[Screenshot 2024-10-28 at 5:13:29 PM]

The RMSE values from the original models closely match those in the paper, but the FiT3D fine-tuned models perform below what is reported.

Is there something I might have missed during reproduction? The weights appear to load correctly, and I used the Colab pre-load model section directly.


Additionally, could you share how you configured the ADE20k and Pascal VOC downstream tasks, and whether you plan to release the downstream task training/evaluation code?

I am considering using the mmsegmentation repository, similar to how I used the MDE toolbox.

Thank you for your time and help!

ywyue commented 1 month ago

Hi @jo1jun, thanks for your interest in our work.

In the Colab demo, and in all the feature-map and k-means clustering visualizations in the paper, we use only the fine-tuned features. For the linear probing experiments, however, we combine the fine-tuned features with the original features. Because the 2D models were fine-tuned only on a small-scale indoor dataset, ScanNet++, their generalization may degrade, which is one limitation of our work. We found that simply concatenating the original 2D features with the fine-tuned features preserves generalization while incorporating 3D awareness. Related discussion can be found in:

We conducted the ADE20k and Pascal VOC segmentation tasks using the mmsegmentation library. For setup, please refer to DINOv2's config:

I am cleaning up the linear probing evaluation code and will try to release it this week.
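For anyone reproducing this before the code is out, the feature concatenation described above can be sketched roughly as follows. This is only an illustration, not the released evaluation code: it uses NumPy with random stand-in arrays instead of real backbone outputs, and the helper name `concat_features`, the batch/patch-grid sizes, and the probe output size are all assumptions made for the example. The 384-channel width matches ViT-S/14.

```python
import numpy as np


def concat_features(original, finetuned):
    """Concatenate two patch-feature maps along the channel (last) axis.

    Both inputs are expected to have shape (batch, height, width, channels)
    and to agree on every dimension except, possibly, the channel count.
    """
    if original.shape[:-1] != finetuned.shape[:-1]:
        raise ValueError("feature maps must share batch and spatial dimensions")
    return np.concatenate([original, finetuned], axis=-1)


rng = np.random.default_rng(0)

# Stand-in features (assumption): 2 images, a 16x16 patch grid, 384 channels.
# In practice these would come from the original and the fine-tuned backbone.
original = rng.standard_normal((2, 16, 16, 384)).astype(np.float32)
finetuned = rng.standard_normal((2, 16, 16, 384)).astype(np.float32)

combined = concat_features(original, finetuned)

# A linear probe is a single linear map applied per patch. Its input width
# must be the concatenated width (768 here), not the single-model width.
num_outputs = 5  # hypothetical number of classes or depth bins
weight = rng.standard_normal((combined.shape[-1], num_outputs)).astype(np.float32)
bias = np.zeros(num_outputs, dtype=np.float32)
logits = combined @ weight + bias

print(combined.shape, logits.shape)
```

The practical consequence for reproduction is that the linear head's input dimension doubles relative to a single backbone, so a head configured for the original model's width will not match the combined features.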

jo1jun commented 2 weeks ago

I realized I missed the implementation details in the paper! Thank you for your kind and helpful response! I’m looking forward to the evaluation code. :)

ywyue commented 2 weeks ago

Hi @jo1jun, the linear probing evaluation code has been released. Sorry for the delay.