w1oves / Rein

[CVPR 2024] Official implementation of <Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation>
https://zxwei.site/rein
GNU General Public License v3.0

about Table 1. #7

Closed geyanqi closed 7 months ago

geyanqi commented 7 months ago

Awesome work, but I have a small question. Is it possible to release the code for the "Frozen backbone of VFMs" setting? I'm very interested in how you achieve such high segmentation performance with these functionally different backbones (SAM, MAE, and CLIP) by training only the decoder.

Best.

w1oves commented 7 months ago

Thank you very much for your attention! This is not a core contribution of our work; adapting the different VFMs is a simple but laborious task. Due to my intense schedule over the past few days, I am currently unable to release the code for multiple frozen VFMs. However, I will try to make the code for the frozen DINOv2 available as soon as possible. Your attention is greatly appreciated!

geyanqi commented 7 months ago

Thanks for your response, but I'd really like to learn how to adapt SAM and CLIP. I hope you will consider releasing that code when you have time.

w1oves commented 7 months ago

Regarding SAM and CLIP, we merely extract their backbones and then combine them with a trainable Mask2Former.
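
For reference, a minimal sketch of that setup: freeze the extracted backbone and optimize only the decoder. The `backbone`/`decoder` names and the `build_frozen_setup` helper are illustrative placeholders, not classes from this repo.

```python
import torch
from torch import nn

def build_frozen_setup(backbone: nn.Module, decoder: nn.Module, lr: float = 1e-4):
    """Freeze the VFM backbone and return an optimizer over the decoder only."""
    # Freeze every backbone parameter so no gradients flow into the VFM.
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()  # keep normalization layers in inference mode

    # Only the decoder parameters (e.g. a Mask2Former-style head) are trained.
    optimizer = torch.optim.AdamW(decoder.parameters(), lr=lr)
    return optimizer
```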

geyanqi commented 7 months ago

Yes, but in my previous attempts I got poor results, so I'd really like to learn from your code 😂.

w1oves commented 7 months ago

You might need to interpolate the position encoding.
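
A rough sketch of what that interpolation typically looks like for a ViT-style positional embedding of shape `(1, 1 + N, C)` with a class token and a square patch grid (the helper name is illustrative; a backbone without a class token would skip the token split):

```python
import math
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize a (1, 1 + N, C) positional embedding to a new square patch grid."""
    cls_token, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    old_grid = int(math.sqrt(patch_pos.shape[1]))
    channels = patch_pos.shape[-1]
    # Reshape to an image-like grid, resize with bicubic interpolation, flatten back.
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, channels).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, channels)
    return torch.cat([cls_token, patch_pos], dim=1)
```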

geyanqi commented 7 months ago

I will try it.

w1oves commented 7 months ago

> I will try it.

Hi! I've updated the code for training Frozen CLIP-L in configs/frozen_vfms.

geyanqi commented 7 months ago

Thank you for your reply.

geyanqi commented 7 months ago

Hi, the "CLIPVisionTransformer" doesn't seem to be registered in backbone file.

w1oves commented 7 months ago

It should be okay now. Happy New Year!

Stark320 commented 4 months ago

Could you please release the code for training Frozen SAM? Thanks a lot!