kevin-ssy / CLIP_as_RNN

Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Set of referring image segmentation queries #3

Closed SouthFlame closed 3 months ago

SouthFlame commented 5 months ago

Thanks for your interesting work!!

I could not find the construction details of the initial text queries for referring image segmentation. As I understand it, open-vocabulary segmentation takes a set of input text queries, which is what makes your recurrent filtering of non-existent concepts necessary. However, since referring image segmentation takes a single image-text pair as input, I do not understand how CaR recurrently eliminates irrelevant texts in that setting. Knowing the initial text queries used for this task would fill this gap in my understanding.

If this detail is already in the paper, I apologize for asking, and please excuse me.

Best regards,

Namyup Kim.

kevin-ssy commented 3 months ago

Hi Namyup,

Thank you for your kind interest! For referring segmentation we do not filter out irrelevant texts, so all results are obtained in a single pass. We have released all the code at:

https://github.com/google-research/google-research/tree/master/clip_as_rnn

Please check it out!
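
To make the distinction concrete, here is a minimal sketch of the two settings. It is not the released implementation: `segment_with_queries` is a hypothetical stand-in for one CaR forward pass that returns a relevance score per text query, and the scores below are placeholders for illustration only.

```python
from typing import List, Tuple

def segment_with_queries(image, queries: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for one CaR pass: returns (query, confidence) pairs."""
    # Placeholder scores; a real pass would produce masks and mask-pooled scores.
    return [(q, 0.9 if "cat" in q else 0.1) for q in queries]

def open_vocab_segmentation(image, queries: List[str], threshold: float = 0.5) -> List[str]:
    """Open-vocabulary setting: recurrently drop queries whose concepts score low."""
    while True:
        scored = segment_with_queries(image, queries)
        kept = [q for q, score in scored if score >= threshold]
        if len(kept) == len(queries):  # converged: nothing left to filter out
            return kept
        queries = kept  # re-run with the reduced query set

def referring_segmentation(image, expression: str):
    """Referring setting: the expression is the only query, so one pass suffices."""
    return segment_with_queries(image, [expression])

if __name__ == "__main__":
    image = None  # stand-in for an actual image tensor
    print(open_vocab_segmentation(image, ["a cat", "a dog", "a spaceship"]))
    print(referring_segmentation(image, "the cat sleeping on the sofa"))
```

Because the referring expression is the single text query, there is nothing to filter and the recurrence in the open-vocabulary setting reduces to one forward pass.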