mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
MIT License

Possibility to Change the Text Encoder? #27

Closed mcairlangga2 closed 9 months ago

mcairlangga2 commented 1 year ago

Dear Authors, Thank you for the great work.

I have a question. Is it possible to replace the RoBERTa text encoder with another model, such as CLIP? Have you considered or tried other text encoders? If it is possible, how would I change it in the code?

Thank you!

mmaaz60 commented 1 year ago

Hi @mcairlangga2,

Yes, it is possible; however, we did not explore this research direction. Given the recent advances in NLP and vision-language modeling, it would be a problem worth exploring.

As far as the implementation is concerned, it would mean replacing RoBERTa in this file with CLIP or another text encoder. Good luck!
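For anyone attempting the swap, here is a minimal sketch of the wiring involved. It is not the repository's code. It assumes an MDETR-style layout in which per-token text features are linearly projected to the transformer width, and it assumes `d_model = 256`. The names `TEXT_ENCODERS`, `TextEncoderSpec`, `TextBranch`, and `build_text_branch` are hypothetical, and the encoders are dummy stand-ins so the snippet runs without downloading any model. The main point is that the projection's input width must follow the encoder: 768 for `roberta-base`, 512 for the CLIP ViT-B/32 text encoder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

TokenFeatures = List[List[float]]  # [num_tokens][width] for one caption
EncodeFn = Callable[[Sequence[str]], List[TokenFeatures]]


def make_dummy_encoder(hidden_size: int) -> EncodeFn:
    """Stand-in for a real text model (e.g. Hugging Face RobertaModel or
    CLIPTextModel): emits one constant vector per whitespace-separated token."""
    def encode(captions: Sequence[str]) -> List[TokenFeatures]:
        return [[[1.0] * hidden_size for _ in caption.split()]
                for caption in captions]
    return encode


@dataclass(frozen=True)
class TextEncoderSpec:
    hidden_size: int  # width of the per-token features the encoder emits
    encode: EncodeFn  # tokenizer + encoder bundled behind a single call


# Hypothetical registry; in real code each entry would wrap a tokenizer/model pair.
TEXT_ENCODERS: Dict[str, TextEncoderSpec] = {
    "roberta-base": TextEncoderSpec(768, make_dummy_encoder(768)),
    "clip-vit-b32": TextEncoderSpec(512, make_dummy_encoder(512)),
}


@dataclass
class TextBranch:
    spec: TextEncoderSpec
    weight: List[List[float]]  # projection matrix, [d_model][hidden_size]

    @property
    def projection_shape(self) -> Tuple[int, int]:
        return (len(self.weight), self.spec.hidden_size)

    def __call__(self, captions: Sequence[str]) -> List[TokenFeatures]:
        # Project every token vector from hidden_size down to d_model.
        return [
            [[sum(w * x for w, x in zip(row, token)) for row in self.weight]
             for token in tokens]
            for tokens in self.spec.encode(captions)
        ]


def build_text_branch(name: str, d_model: int = 256) -> TextBranch:
    """Pick an encoder by name and size the projection to match its width."""
    spec = TEXT_ENCODERS[name]
    # Untrained placeholder weights (all zeros); a real model would learn these.
    weight = [[0.0] * spec.hidden_size for _ in range(d_model)]
    return TextBranch(spec, weight)


branch = build_text_branch("clip-vit-b32")
features = branch(["all objects"])
print(branch.projection_shape, len(features[0]), len(features[0][0]))
```

In a real swap the tokenizer has to change together with the encoder, and any pretrained detector weights were learned against RoBERTa features, so the text branch would most likely need retraining or fine-tuning.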