Open jonghakim35 opened 2 years ago
Hi @jonghakim35, for ATL and VCL you are right: in those two papers I directly predict the 600 HOI classes, which limits the label space. Empirically, we can convert the 600 classes into verb labels via A_v and construct a new matrix with l_o for possible concepts; that is what I did for affordance recognition in ATL. For HOI Concept Discovery, I supervise the model with verb labels (dimension 117), use an 80x117 matrix to represent the HOI labels, and obtain the pseudo verb labels from the corresponding matrix.
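For readers following along, here is a minimal NumPy sketch of the two conversions described above: turning a 600-dimensional HOI label into a 117-dimensional verb label via A_v, and building an 80x117 object-by-verb matrix. This is not the code from either repository. The (verb, object) pairs are random stand-ins for the real HICO-DET list, and the names `hoi_to_verb` and `concept` are made up for illustration.

```python
import numpy as np

NUM_OBJ, NUM_VERB, NUM_HOI = 80, 117, 600

# Assumption: random, unique (verb, object) pairs stand in for the
# real HICO-DET HOI list, which would be loaded from the annotations.
rng = np.random.default_rng(0)
pairs = rng.choice(NUM_VERB * NUM_OBJ, size=NUM_HOI, replace=False)
verb_idx, obj_idx = np.divmod(pairs, NUM_OBJ)

# Co-occurrence matrices: column k has a single 1 at the verb
# (resp. object) of HOI class k.
A_v = np.zeros((NUM_VERB, NUM_HOI), dtype=np.int64)
A_o = np.zeros((NUM_OBJ, NUM_HOI), dtype=np.int64)
A_v[verb_idx, np.arange(NUM_HOI)] = 1
A_o[obj_idx, np.arange(NUM_HOI)] = 1


def hoi_to_verb(y_hoi):
    """Map a multi-hot HOI label (N, 600) to a multi-hot verb label (N, 117)."""
    # Clip to 1 because several active HOI classes may share a verb.
    return np.minimum(y_hoi @ A_v.T, 1)


# Example: a sample annotated with HOI classes 3 and 7.
y_hoi = np.zeros((1, NUM_HOI), dtype=np.int64)
y_hoi[0, [3, 7]] = 1
l_v = hoi_to_verb(y_hoi)   # shape (1, 117)

# Object-by-verb matrix: entry [o, v] is 1 if (v, o) is a known HOI class.
concept = A_o @ A_v.T      # shape (80, 117)
```

With the real HOI list in place of the random pairs, row `concept[o]` would list the verbs already known to be valid for object `o`; how pseudo verb labels are actually derived from such a matrix is specific to the paper's implementation and is not reproduced here.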
Thanks for the quick and detailed clarification. It helped me a lot in understanding the paper!
You are welcome. In ATL, I found that it is possible to recognize object affordance from an HOI model, but at that time I was not aware of the deeper meaning of object affordance for HOI: it actually implies reasonable verb-object combinations.
Hi, thanks for your great work.
Just for clarification, it would be great to know the dimensions of the tensors in Section 3.2. Below is my understanding of the tensor dimensions when using the HICO-DET dataset. If I have misunderstood anything, please kindly let me know.
\tilde{l}_o : (1, 80)
A_o : (80, 600)
l_v : (1, 117)
A_v : (117, 600)
Therefore, \bar{y} : (1, 600). Is this correct?
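The shapes can be sanity-checked with a small NumPy sketch. It assumes the composed label is the element-wise AND of the two projections, \bar{y} = (l_v A_v) & (\tilde{l}_o A_o), which is how I read Section 3.2. The (verb, object) pairs are random placeholders rather than the real HICO-DET list.

```python
import numpy as np

NUM_OBJ, NUM_VERB, NUM_HOI = 80, 117, 600

# Assumption: random, unique (verb, object) pairs as a placeholder
# for the 600 HICO-DET HOI classes.
rng = np.random.default_rng(0)
pairs = rng.choice(NUM_VERB * NUM_OBJ, size=NUM_HOI, replace=False)
verb_idx, obj_idx = np.divmod(pairs, NUM_OBJ)

A_v = np.zeros((NUM_VERB, NUM_HOI), dtype=np.int64)   # (117, 600)
A_o = np.zeros((NUM_OBJ, NUM_HOI), dtype=np.int64)    # (80, 600)
A_v[verb_idx, np.arange(NUM_HOI)] = 1
A_o[obj_idx, np.arange(NUM_HOI)] = 1


def compose(v, o):
    """Compose one-hot verb v and one-hot object o into an HOI label."""
    l_v = np.zeros((1, NUM_VERB), dtype=np.int64)     # (1, 117)
    l_o = np.zeros((1, NUM_OBJ), dtype=np.int64)      # (1, 80)
    l_v[0, v] = 1
    l_o[0, o] = 1
    # (1, 117) @ (117, 600) and (1, 80) @ (80, 600), then element-wise AND.
    return (l_v @ A_v) & (l_o @ A_o)                  # (1, 600)


# A pair that exists in the HOI table activates exactly one class.
y_known = compose(verb_idx[0], obj_idx[0])

# A pair outside the table maps to the all-zero label.
missing = np.setdiff1d(np.arange(NUM_VERB * NUM_OBJ), pairs)[0]
v_new, o_new = divmod(int(missing), NUM_OBJ)
y_unknown = compose(v_new, o_new)
```

Under this assumption, `y_known` has shape (1, 600) with a single active entry, while `y_unknown` is all zeros, which is the limitation the next question is about.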
And also, since the composed HOI label should be in the original 600 HOI triplet set, is it correct that discovering a novel HOI triplet is impossible using this method and the main focus of the work is correctly learning affordances via feature composition?
Again, thanks for sharing your great work.