reproducing dinov2 backbone results

w1oves / Rein

[CVPR 2024] Official implement of <Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation>

https://zxwei.site/rein

GNU General Public License v3.0

reproducing dinov2 backbone results #48

Closed ysj9909 closed 1 month ago

ysj9909 commented 1 month ago

Hello! This is amazing work!

I am trying to reproduce the results of the paper, but my numbers differ noticeably from the reported ones, and I would like your opinion on whether something is wrong.

According to the paper, the frozen DINOv2 backbone setting yields 61.1 mIoU on the GTA2CBM benchmarks. My runs, however, produce around 63 mIoU (with a maximum of 63.44). When training Rein, my results were similar to those reported in the paper (64.3). For full fine-tuning, contrary to the paper, the performance was lower than in the frozen setting, at 61.5 mIoU.

Could you help me identify what might be wrong, or confirm that there is no issue at all?

w1oves commented 1 month ago

Could you provide the training log? Generally speaking, full fine-tuning performs better than the frozen backbone.

ysj9909 commented 1 month ago

Training log for the frozen setting: frozen.log

w1oves commented 1 month ago

In my extensive experiments, fine-tuning all backbone parameters has given better results than freezing all backbone parameters. Curiously, your results for the frozen backbone appear to be valid, with performance exceeding full fine-tuning. This discrepancy could potentially be attributed to instability in model performance across runs. If you're willing, I would welcome additional training logs, as they would help us gain a more accurate understanding of the impact of freezing the backbone parameters. Thank you very much!
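
One way to check whether a gap like this is within run-to-run instability is to repeat each setting with several seeds and compare the spread against the gap. The snippet below is a minimal sketch of that idea using only the Python standard library. The function names and the "gap larger than the combined standard deviations" rule are illustrative assumptions, not something taken from the Rein repository or the paper, and the rule is a crude heuristic rather than a statistical test.

```python
from statistics import mean, stdev


def summarize_runs(miou_per_seed):
    """Return (mean, sample standard deviation) of mIoU over repeated runs."""
    if len(miou_per_seed) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(miou_per_seed), stdev(miou_per_seed)


def gap_exceeds_noise(runs_a, runs_b):
    """Crude heuristic (an assumption of this sketch): report True when the
    difference between the two means is larger than the sum of the two
    sample standard deviations."""
    mean_a, std_a = summarize_runs(runs_a)
    mean_b, std_b = summarize_runs(runs_b)
    return abs(mean_a - mean_b) > (std_a + std_b)
```

Each argument would be a list of final mIoU values, one per seed, for a single setting (for example frozen versus full fine-tuning). A single run per setting, as in the numbers quoted in this thread, is not enough to use it.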

ysj9909 commented 1 month ago

Thank you for your kind response! In the same experimental setting, when I set the backbone learning rate lower than the value suggested in the paper, I was able to achieve a higher mIoU than with Rein. So while an adapter-based approach like Rein seems promising, studying the weight space also appears important!
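
For readers who want to try a lower backbone learning rate, here is a hedged sketch of how it is commonly expressed in an MMEngine-style config, where `paramwise_cfg.custom_keys` applies an `lr_mult` to parameters whose names contain a given key. The base learning rate, weight decay, and the 0.1 multiplier are placeholders chosen for illustration; the actual values used in this thread were not stated. `effective_lr` is a hypothetical helper that only mimics the key matching so the effect can be inspected without a training run.

```python
# Placeholder value, NOT taken from the paper or from this thread.
BASE_LR = 1e-4

# MMEngine-style optimizer config: parameters whose names contain
# "backbone" get their learning rate scaled by lr_mult.
optim_wrapper = dict(
    type="OptimWrapper",
    optimizer=dict(type="AdamW", lr=BASE_LR, weight_decay=0.05),
    paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1)}),
)


def effective_lr(param_name, cfg):
    """Hypothetical helper: return the learning rate a parameter would get.

    Keys are matched as substrings of the parameter name, longest key
    first, and the first match wins.
    """
    base_lr = cfg["optimizer"]["lr"]
    custom_keys = cfg.get("paramwise_cfg", {}).get("custom_keys", {})
    for key in sorted(sorted(custom_keys), key=len, reverse=True):
        if key in param_name:
            return base_lr * custom_keys[key].get("lr_mult", 1.0)
    return base_lr
```

With this config, a parameter named `backbone.blocks.0.attn.qkv.weight` would train at one tenth of the base rate, while decode-head parameters keep the base rate. Setting `lr_mult` to 0 is not the same as the frozen setting in general, since weight decay can still update the weights.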

w1oves commented 1 month ago

Thank you for sharing! This is a valuable finding, and I will continue to explore the related experimental results.