kevin-ssy / CLIP_as_RNN

Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Set of referring image segmentation queries #3

Closed SouthFlame closed 3 months ago

SouthFlame commented 5 months ago

Thanks for your interesting work!!

I could not find the construction details of the initial text queries for referring image segmentation. As I understand it, open-vocabulary segmentation takes a set of input text queries, which is what makes your recurrent filtering of non-existent concepts necessary. However, since referring image segmentation takes a single image-text pair as input, I do not understand how CaR recurrently eliminates irrelevant texts in that setting. Knowing the initial text queries used for this task would fill this gap in my understanding.

If this detail is already in the paper, I apologize for asking, and please excuse me.

Best regards,

Namyup Kim.

kevin-ssy commented 3 months ago

Hi Namyup,

Thank you for your kind interest! For referring segmentation we do not filter out irrelevant texts, so all results are obtained in a single pass. We have released all the code at:

https://github.com/google-research/google-research/tree/master/clip_as_rnn

Please check it out!
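
To make the distinction concrete, here is a minimal sketch of the two settings. It is not the released implementation: `segment_with_queries` is a hypothetical stand-in for one CaR forward pass that returns a relevance score per text query, and the scores below are placeholders for illustration only.

```python
from typing import List, Tuple

def segment_with_queries(image, queries: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for one CaR pass: returns (query, confidence) pairs."""
    # Placeholder scores; a real pass would produce masks and mask-pooled scores.
    return [(q, 0.9 if "cat" in q else 0.1) for q in queries]

def open_vocab_segmentation(image, queries: List[str], threshold: float = 0.5) -> List[str]:
    """Open-vocabulary setting: recurrently drop queries whose concepts score low."""
    while True:
        scored = segment_with_queries(image, queries)
        kept = [q for q, score in scored if score >= threshold]
        if len(kept) == len(queries):  # converged: nothing left to filter out
            return kept
        queries = kept  # re-run with the reduced query set

def referring_segmentation(image, expression: str):
    """Referring setting: the expression is the only query, so one pass suffices."""
    return segment_with_queries(image, [expression])

if __name__ == "__main__":
    image = None  # stand-in for an actual image tensor
    print(open_vocab_segmentation(image, ["a cat", "a dog", "a spaceship"]))
    print(referring_segmentation(image, "the cat sleeping on the sofa"))
```

Because the referring expression is the single text query, there is nothing to filter and the recurrence in the open-vocabulary setting reduces to one forward pass.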