dneirfi closed this issue 2 years ago
Hi, thanks for your interest in our work. We fine-tune the visual encoder of the pre-trained CLIP model on our tasks. I also tried freezing both the language and visual encoders, but the final performance was significantly lower than when fine-tuning the visual backbone.
Thanks for answering!
So you fine-tune the visual encoder of the pre-trained CLIP model and then use that fine-tuned encoder when testing the model, right?
Yes
Hi. Thanks for sharing your work!
Does DenseCLIP use the pre-trained CLIP encoder at inference time?
I think the pre-trained CLIP encoder is needed to compute the pixel-text score maps at inference, so the model would still require it.
Or is the CLIP encoder not used at inference time?
Thanks.
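To make the question concrete, here is a minimal sketch of a pixel-text score map, assuming it is computed as the cosine similarity between each pixel's visual embedding and each class's text embedding. The function names `l2_normalize` and `pixel_text_score_map` and the toy shapes are hypothetical; the sketch only shows that both pixel features and text embeddings must be available whenever the score map is computed.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pixel_text_score_map(pixel_feats, text_feats):
    """Cosine-similarity score map (hypothetical helper, not DenseCLIP's code).

    pixel_feats: H x W grid of C-dim visual embeddings (nested lists).
    text_feats:  K class embeddings, each C-dim.
    Returns a K x H x W nested list where scores[k][i][j] is the cosine
    similarity between pixel (i, j) and class k.
    """
    texts = [l2_normalize(t) for t in text_feats]
    return [
        [
            [sum(p * t for p, t in zip(l2_normalize(pix), text)) for pix in row]
            for row in pixel_feats
        ]
        for text in texts
    ]


# Toy example: a 1 x 2 "image" with 2-dim features and 2 classes.
pixels = [[[1.0, 0.0], [0.0, 2.0]]]
classes = [[3.0, 0.0], [0.0, 1.0]]
scores = pixel_text_score_map(pixels, classes)
print(scores)  # [[[1.0, 0.0]], [[0.0, 1.0]]]
```

Each class channel peaks at the pixel whose embedding points in the same direction as that class's text embedding, regardless of vector length, because both sides are normalized first.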