Thank you for sharing your code. I am curious why the encoder of SAM was not frozen. In my opinion, frozen encoder layers would be better suited to cross-domain segmentation tasks. Could finetuning have compromised the performance of the Visual Foundation Model (VFM)? I look forward to your response.

Hi, thanks for your interest! In our experiments we found that finetuning the encoder with a smaller learning rate works better than freezing it. This may be because our decoder has limited capacity, and because the distribution of aerial images is likely quite far from SAM's pretraining data. That said, I think it's worth exploring more powerful decoders, or techniques like LoRA; both might allow you to keep the original SAM encoder untouched.
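For anyone wanting to try the smaller-LR approach, here is a minimal PyTorch sketch of per-module learning rates via optimizer parameter groups. The model class and attribute names (`image_encoder`, `decoder`) and the specific LR values are illustrative assumptions, not this repo's actual API:

```python
import torch
import torch.nn as nn

# Toy stand-in for a SAM-style model: a "pretrained" image encoder
# plus a small task-specific decoder. Names here are illustrative only.
class SegModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.GELU())
        self.decoder = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        return self.decoder(self.image_encoder(x))

model = SegModel()

# Split parameters by module: the pretrained encoder is finetuned with a
# ~10x smaller learning rate than the randomly initialized decoder,
# instead of being frozen outright.
encoder_params = [p for n, p in model.named_parameters() if n.startswith("image_encoder")]
decoder_params = [p for n, p in model.named_parameters() if not n.startswith("image_encoder")]

optimizer = torch.optim.AdamW(
    [
        {"params": encoder_params, "lr": 1e-5},  # gentle updates to SAM weights
        {"params": decoder_params, "lr": 1e-4},  # full-rate training for the decoder
    ],
    weight_decay=0.01,
)
```

The parameter-group mechanism is standard PyTorch; the LR ratio you'd actually use depends on your decoder capacity and how far your target domain sits from SAM's pretraining distribution.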