wenwenyu / TCM

Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)

Where is the implementation of Meta Query in the codes? #17

Open X-funbean opened 5 months ago

X-funbean commented 5 months ago

I'm confused about the implementation of the Language Prompt Module. According to Figure 4 and Sec. 3.2.3, a Meta Query is learned to generate the implicit conditional cue cc via the Language Prompt Module. However, according to Fig. 5 and the code below, the conditional cue cc seems to be generated from the global image feature rather than from the Meta Query. These two descriptions appear contradictory to me.

So my question is: where can I find the implementation of the Meta Query in the code? Also, what is the difference between the CoOp-like learnable prompts described in Sec. 3.2.2 and the Meta Query?

https://github.com/wenwenyu/TCM/blob/cfa4756f4082d7d00e76161fb81dfa2c079d181c/ocrclip/ocrclip/ocrclip.py#L262-L275
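To make the two readings concrete, here is a minimal sketch (purely illustrative, not the repo's actual code; the module name, layer sizes, and dimensions are my assumptions) contrasting a prompt generator driven by a learned Meta Query with one conditioned on the global image feature, which is what the linked code appears to do:

```python
import torch
import torch.nn as nn


class PromptGenerator(nn.Module):
    """Small MLP that maps a visual-space feature to a conditional cue cc."""

    def __init__(self, visual_dim: int = 1024, prompt_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(visual_dim, visual_dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(visual_dim // 4, prompt_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


visual_dim, prompt_dim = 1024, 512
generator = PromptGenerator(visual_dim, prompt_dim)

# Reading (a), Figure 4 / Sec. 3.2.3: a learnable Meta Query is the generator's
# input, so cc is the same for every image.
meta_query = nn.Parameter(torch.randn(1, visual_dim))
cc_from_query = generator(meta_query)            # shape [1, prompt_dim]

# Reading (b), Fig. 5 / linked code: the generator is conditioned on the global
# image feature, so cc is image-specific (CoCoOp-style meta-net).
global_image_feature = torch.randn(2, visual_dim)  # batch of 2 images
cc_from_image = generator(global_image_feature)  # shape [2, prompt_dim]

# In either case cc would then be combined with the learnable text-side prompt
# embeddings before the CLIP text encoder.
print(cc_from_query.shape, cc_from_image.shape)
```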
