Thank you for sharing your code. I am curious why the encoder of SAM was not frozen. In my opinion, frozen encoder layers would be better suited to cross-domain segmentation tasks. Could finetuning have compromised the performance of the Visual Foundation Model (VFM)? I look forward to your response.

Hi, thanks for your interest! In our experiments we found that finetuning the encoder with a smaller learning rate works better than freezing it. This may be because our decoder has limited capacity, and because the distribution of aerial images is likely quite far from SAM's pretraining data. That said, I think it's worth exploring more powerful decoders, or techniques like LoRA; both might allow you to keep the original SAM encoder untouched.
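For anyone wanting to try the smaller-LR approach, here is a minimal PyTorch sketch of per-module learning rates via optimizer parameter groups. The model class and attribute names (`image_encoder`, `decoder`) and the specific LR values are illustrative assumptions, not this repo's actual API:

```python
import torch
import torch.nn as nn

# Toy stand-in for a SAM-style model: a "pretrained" image encoder
# plus a small task-specific decoder. Names here are illustrative only.
class SegModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.GELU())
        self.decoder = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        return self.decoder(self.image_encoder(x))

model = SegModel()

# Split parameters by module: the pretrained encoder is finetuned with a
# ~10x smaller learning rate than the randomly initialized decoder,
# instead of being frozen outright.
encoder_params = [p for n, p in model.named_parameters() if n.startswith("image_encoder")]
decoder_params = [p for n, p in model.named_parameters() if not n.startswith("image_encoder")]

optimizer = torch.optim.AdamW(
    [
        {"params": encoder_params, "lr": 1e-5},  # gentle updates to SAM weights
        {"params": decoder_params, "lr": 1e-4},  # full-rate training for the decoder
    ],
    weight_decay=0.01,
)
```

The parameter-group mechanism is standard PyTorch; the LR ratio you'd actually use depends on your decoder capacity and how far your target domain sits from SAM's pretraining distribution.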