sail-sg / ptp

[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
https://arxiv.org/abs/2212.09737
Apache License 2.0

About the obj tag and text prompt #6

Open Zysty opened 1 year ago

Zysty commented 1 year ago

Hello, thanks for sharing this great work!

According to Eq. (1), the object tag is produced by an argmax operation, while Sec. 3.1.2 of the paper says "we select one O at random for each time". This raises a question: once the object tag has been determined by the argmax, how is the following situation handled? ("For a certain P, we may have various options for O because the block may contain multiple objects.")

Looking forward to your reply! Thanks😁!

FingerRec commented 1 year ago

Hi Zysty:

Thanks for your question, and sorry for the late reply.

  1. Yes, a grid block may have multiple object tags. Such cases account for only a small fraction of blocks, and in those cases we randomly select one of the objects.
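
To make the random selection concrete, here is a minimal sketch of how one tag could be picked when a block contains several objects. It is not taken from the ptp repository: the function names `pick_object_tag` and `build_prompt`, the list-of-tags input, and the prompt wording "The block [P] has a [O]." are all assumptions made for illustration.

```python
import random


def pick_object_tag(tags_in_block, rng=random):
    """Pick one object tag uniformly at random (hypothetical helper).

    tags_in_block: list of tag strings detected in a single block.
    Returns None when the block has no tags.
    """
    if not tags_in_block:
        return None
    return rng.choice(tags_in_block)


def build_prompt(block_id, tags_in_block, rng=random):
    """Fill an assumed prompt template with the block index and one tag."""
    tag = pick_object_tag(tags_in_block, rng)
    if tag is None:
        return None
    # Assumed template; the actual wording used in the repository may differ.
    return f"The block {block_id} has a {tag}."


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so that runs are repeatable
    # A block with several candidate objects: one is drawn each time.
    print(build_prompt(4, ["dog", "ball", "grass"], rng))
    # A block with a single object: that object is always used.
    print(build_prompt(7, ["sky"], rng))
```

Because the draw is repeated every time a prompt is built, a block with several objects can contribute a different tag on different passes over the same image.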