Hello, and thanks for sharing this great work!
In Eq. (1), the object tag is produced by an argmax operation, but Sec. 3.1.2 of the paper says "we select one O at random for each time". These two statements seem inconsistent to me: if the object tag is already determined by the argmax, how is the following situation handled? ("For a certain P, we may have various options for O because the block may contain multiple objects.")
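To make the question concrete, here is a minimal sketch of the two selection rules as I understand them. Everything in it is my own assumption rather than the paper's method: the scores are made up, the function names `select_by_argmax` and `select_at_random` are hypothetical, and I assume the random choice is uniform over the candidate objects.

```python
import random


def select_by_argmax(scores):
    """Return the single tag with the highest score (deterministic)."""
    return max(scores, key=scores.get)


def select_at_random(candidates, rng):
    """Return one candidate tag, chosen uniformly at random (assumed)."""
    return rng.choice(candidates)


# Made-up scores for one block that contains several objects.
scores = {"dog": 0.48, "person": 0.45, "ball": 0.07}

# Argmax always yields the same tag for this block.
argmax_tag = select_by_argmax(scores)

# Random selection can yield a different tag on each draw.
rng = random.Random(0)
random_tags = {select_at_random(list(scores), rng) for _ in range(100)}

print("argmax:", argmax_tag)
print("random draws:", sorted(random_tags))
```

With argmax, the block always receives the same single tag, so I do not see where the "various options for O" would come from. With random selection, the tag changes between draws. Which of the two does the method actually use, or are they applied at different stages?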
Looking forward to your reply! Thanks 😁!