raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Question about inference setting #17

Closed dneirfi closed 2 years ago

dneirfi commented 2 years ago

Hi. Thanks for sharing your work!

Does DenseCLIP use the pre-trained CLIP encoder at inference time?

I think the pre-trained CLIP encoder is needed to compute the pixel-text score maps at inference, so the model should require it.

I am wondering whether the CLIP encoder is used at inference.

Thanks.

raoyongming commented 2 years ago

Hi, thanks for your interest in our work. We fine-tune the visual encoder of pre-trained CLIP on our tasks. I have also tried to freeze both the language and visual encoders, but the final performance is significantly lower than fine-tuning the visual backbone.
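For reference, the pixel-text score map computation at the heart of this question can be sketched as below. This is a minimal illustration, not the repository's code: the shapes, the einsum layout, and the temperature value `tau` are assumptions; in DenseCLIP the visual features come from the fine-tuned CLIP image encoder and the text embeddings from the (frozen) text encoder.

```python
import numpy as np

def pixel_text_score_map(z, t, tau=0.07):
    """Sketch of a per-pixel cosine-similarity score map.

    z: visual feature map of shape (C, H, W) (assumed layout).
    t: class text embeddings of shape (K, C) for K classes.
    Returns a (K, H, W) score map, scaled by a temperature tau.
    """
    # L2-normalize both modalities along the channel dimension.
    z = z / np.linalg.norm(z, axis=0, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    # Cosine similarity between every pixel and every class embedding.
    return np.einsum('kc,chw->khw', t, z) / tau

# Toy example with hypothetical sizes: 512-d features, 8x8 map, 21 classes.
rng = np.random.default_rng(0)
scores = pixel_text_score_map(rng.standard_normal((512, 8, 8)),
                              rng.standard_normal((21, 512)))
print(scores.shape)  # (21, 8, 8)
```

Because the text embeddings can be computed once and cached, the text encoder does not need to run per image at test time; the visual encoder, being fine-tuned, does.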

dneirfi commented 2 years ago

Thanks for answering!

So, you mean that you fine-tune the visual encoder of pre-trained CLIP and use it when testing the model, right?

raoyongming commented 2 years ago

Yes