zhihou7 / HOI-CL

Series of work (ECCV2020, CVPR2021, CVPR2021, ECCV2022) about Compositional Learning for Human-Object Interaction Exploration
https://sites.google.com/view/hoi-cl
MIT License
78 stars, 11 forks

Questions about dimensions of tensors #24

Open. jonghakim35 opened this issue 2 years ago

jonghakim35 commented 2 years ago

Hi, thanks for your great work.

Just for clarification, it would be great to know the dimensions of the tensors in Section 3.2. Below is my understanding of the tensor dimensions when using the HICO-DET dataset. If I have misunderstood anything, please let me know.

- \tilde{l}_o : (1, 80)
- A_o : (80, 600)
- l_v : (1, 117)
- A_v : (117, 600)

Therefore, \bar{y} : (1, 600). Is this correct?
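To make these shapes concrete, here is a minimal pure-Python sketch of the composition. It assumes that the two 600-dimensional projections, l_o A_o and l_v A_v, are combined with an element-wise logical AND (an assumption here; check Section 3.2 of the paper for the exact operator). The helper names and the modulo-based class mapping are hypothetical stand-ins for the real HICO-DET annotation files.

```python
NUM_OBJECTS, NUM_VERBS, NUM_HOI = 80, 117, 600

# Hypothetical toy mapping: HOI class c is the pair (verb c % 117, object c % 80).
# The real HICO-DET mapping must be loaded from the dataset's annotation files.
HOI_TO_VERB = [c % NUM_VERBS for c in range(NUM_HOI)]
HOI_TO_OBJ = [c % NUM_OBJECTS for c in range(NUM_HOI)]


def build_matrix(num_rows, hoi_to_concept):
    """Binary (num_rows, 600) matrix: entry [r][c] is 1 if HOI class c uses concept r."""
    return [[1 if hoi_to_concept[c] == r else 0 for c in range(NUM_HOI)]
            for r in range(num_rows)]


A_o = build_matrix(NUM_OBJECTS, HOI_TO_OBJ)  # shape (80, 600)
A_v = build_matrix(NUM_VERBS, HOI_TO_VERB)   # shape (117, 600)


def row_times_matrix(vec, mat):
    """(1, n) @ (n, m) -> (1, m), returned as a flat list of length m."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]


def compose_label(l_o, l_v):
    """y_bar = (l_o A_o) AND (l_v A_v): a binary HOI label of length 600 (AND is assumed)."""
    obj_part = row_times_matrix(l_o, A_o)   # (1, 80) @ (80, 600) -> (1, 600)
    verb_part = row_times_matrix(l_v, A_v)  # (1, 117) @ (117, 600) -> (1, 600)
    return [1 if o > 0 and v > 0 else 0 for o, v in zip(obj_part, verb_part)]


# Usage: object 3 with verbs 3 and 83 (multi-label verbs are allowed).
l_o = [1 if i == 3 else 0 for i in range(NUM_OBJECTS)]
l_v = [1 if i in (3, 83) else 0 for i in range(NUM_VERBS)]
y_bar = compose_label(l_o, l_v)
print(len(y_bar), [c for c, bit in enumerate(y_bar) if bit])  # 600 [3, 83]
```

Because every nonzero entry of y_bar must be a column of both A_o and A_v, the composed label can only ever switch on one of the 600 predefined classes.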

Also, since the composed HOI label must lie within the original set of 600 HOI triplets, is it correct that this method cannot discover novel HOI triplets, and that the main focus of the work is learning affordances correctly via feature composition?

Again, thanks for sharing your great work.

zhihou7 commented 2 years ago

Hi @jonghakim35, for ATL and VCL you are right, because in those two papers I predict the 600 HOI classes directly, which limits the label space. In practice, we can convert the 600 classes into verb labels via A_v and combine them with l_o to construct a new matrix of possible concepts; that is what I did for affordance recognition in ATL. For HOI Concept Discovery, I supervise the model with verb labels (dimension 117), represent the HOI labels with an 80 × 117 matrix, and obtain pseudo verb labels from the corresponding matrix.
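The reply leaves the exact aggregation unspecified, so the following is only a rough sketch of one reading of it: 600-dimensional HOI scores are projected to 117 verb scores with A_v, accumulated per object into an 80 × 117 object-verb matrix, and the row for a given object is thresholded to give pseudo verb labels. The summation, the outer-product accumulation, the 0.5 threshold, the toy class mapping, and all function names are assumptions, not the code used in ATL or HOI Concept Discovery.

```python
NUM_OBJECTS, NUM_VERBS, NUM_HOI = 80, 117, 600

# Hypothetical toy mapping: HOI class c uses verb c % 117 (the real one comes from HICO-DET).
HOI_TO_VERB = [c % NUM_VERBS for c in range(NUM_HOI)]

# A_v: (117, 600) binary matrix, A_v[v][c] = 1 if HOI class c uses verb v.
A_v = [[1 if HOI_TO_VERB[c] == v else 0 for c in range(NUM_HOI)] for v in range(NUM_VERBS)]


def hoi_to_verb_scores(hoi_scores):
    """Project (1, 600) HOI scores to (1, 117) verb scores: s_v = s_hoi @ A_v^T."""
    return [sum(s * a for s, a in zip(hoi_scores, A_v[v])) for v in range(NUM_VERBS)]


def accumulate_concepts(samples):
    """Sum outer(l_o, verb_scores) over (l_o, hoi_scores) pairs into an (80, 117) matrix."""
    concepts = [[0.0] * NUM_VERBS for _ in range(NUM_OBJECTS)]
    for l_o, hoi_scores in samples:
        verb_scores = hoi_to_verb_scores(hoi_scores)
        for o, weight in enumerate(l_o):
            if weight:
                for v, score in enumerate(verb_scores):
                    concepts[o][v] += weight * score
    return concepts


def pseudo_verb_labels(l_o, concepts, threshold=0.5):
    """(1, 80) @ (80, 117) -> (1, 117), then binarize with a hypothetical threshold."""
    scores = [sum(l_o[o] * concepts[o][v] for o in range(NUM_OBJECTS)) for v in range(NUM_VERBS)]
    return [1 if s >= threshold else 0 for s in scores]


# Usage: one sample of object 5 whose HOI scores fire on classes 10, 127 (both verb 10) and 20.
l_o = [1 if i == 5 else 0 for i in range(NUM_OBJECTS)]
hoi_scores = [0.0] * NUM_HOI
hoi_scores[10], hoi_scores[127], hoi_scores[20] = 1.0, 0.8, 0.3
concepts = accumulate_concepts([(l_o, hoi_scores)])
labels = pseudo_verb_labels(l_o, concepts)
print([v for v, bit in enumerate(labels) if bit])  # [10]
```

The point of the 80 × 117 representation is that it is no longer tied to the 600 predefined classes: any (object, verb) cell can become active, which is what makes discovering new combinations possible.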

jonghakim35 commented 2 years ago

Thanks for the quick and detailed clarification. It helped me a lot in understanding the paper!

zhihou7 commented 2 years ago

You are welcome. In ATL, where I found that object affordance can be recognized with an HOI model, I was not aware of the deeper meaning of object affordance for HOI: it actually implies reasonable verb-object combinations.
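As an illustration (not necessarily how ATL builds its affordance matrix), and assuming A_o and A_v are the binary object-to-HOI and verb-to-HOI matrices from the question, the verb-object combinations covered by the 600 HICO-DET classes can be written as an 80 × 117 matrix:

```latex
M = A_o A_v^{\top} \in \{0,1\}^{80 \times 117}, \qquad
M_{o,v} = \sum_{c=1}^{600} (A_o)_{o,c}\,(A_v)_{v,c}
```

Each term in the sum is 1 only when HOI class c involves both object o and verb v, so M_{o,v} counts such classes. Since every HICO-DET class is a distinct verb-object pair, that count is at most 1, and M_{o,v} = 1 exactly when (verb v, object o) is one of the 600 known classes.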