SouthFlame closed this issue 9 months ago
Hi, thanks for your interest.
I am not sure I fully understood your question. Our method is designed for weakly supervised semantic segmentation, and in this setting image-level labels (the class names present in each image) are provided. We only augment the class names with prompts and synonyms (Section 3.2) to form the text input to CLIP. Is this perhaps what you meant by the initial text queries?
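To make the idea concrete, here is a minimal sketch of turning image-level class names into text queries. It is for illustration only and is not taken from the paper or the repository: the `SYNONYMS` table, the `TEMPLATE` string, and the `build_text_queries` helper are all assumptions, and the actual prompts and synonyms are the ones described in Section 3.2.

```python
# Hypothetical synonym table; the synonyms actually used may differ.
SYNONYMS = {
    "person": ["person", "people"],
    "tvmonitor": ["tv monitor", "television"],
}

# Hypothetical prompt template; an assumption, not the paper's wording.
TEMPLATE = "a photo of {}."


def build_text_queries(image_labels, synonyms=SYNONYMS, template=TEMPLATE):
    """Map each image-level class name to a list of prompted text queries."""
    queries = {}
    for label in image_labels:
        # Fall back to the raw class name when no synonyms are listed.
        names = synonyms.get(label, [label])
        queries[label] = [template.format(name) for name in names]
    return queries


# Image-level labels for one training image (example values).
queries = build_text_queries(["person", "dog"])
```

The resulting strings would then be tokenized and passed to the CLIP text encoder; that step is omitted here to keep the sketch free of external dependencies.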
Thanks for the answer! I am sorry that my question was based on a misunderstanding, but your answer cleared it up.
Thanks for your interesting work!!
I could not find the details of how the initial text queries are constructed for referring image segmentation.
If this is already described in the paper, I apologize for asking.
Best regards,
Namyup Kim.