zhihou7 / HOI-CL

Series of work (ECCV2020, CVPR2021, CVPR2021, ECCV2022) about Compositional Learning for Human-Object Interaction Exploration
https://sites.google.com/view/hoi-cl
MIT License

object label? #17

Open lumiaomiao opened 2 years ago

lumiaomiao commented 2 years ago

Hi, could you explain the * in Table 3 in ATL? You described it as "* means we only use the boxes of the detection results", but how do you use the category of the detection results in the training phase and the inference phase?

zhihou7 commented 2 years ago

Sorry for the confusion.

The object detection results provide both object category information and bounding boxes. Here, we only use the bounding boxes for inferring the HOI category. The training phase is the same as the previous setting. In fact, * means we use the same model as ATL, but do not use the object category information during inference.
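As a rough illustration only, and not the ATL code: the sketch below assumes that the detector's object category would normally be used to mask out HOI classes whose object does not match, and that the * setting simply skips this mask. The function `score_hois`, the `hoi_to_object` mapping, and the toy scores are all hypothetical.

```python
from typing import List, Optional


def score_hois(
    hoi_scores: List[float],
    hoi_to_object: List[str],
    detected_object: Optional[str] = None,
) -> List[float]:
    """Return HOI scores, optionally masked by the detected object category.

    Assumption: with a category, scores of HOI classes that involve a
    different object are zeroed. Without a category (the * setting as
    pictured here), the raw scores are returned unchanged.
    """
    if detected_object is None:
        # Boxes only: no category information is used at inference.
        return list(hoi_scores)
    return [
        s if hoi_to_object[i] == detected_object else 0.0
        for i, s in enumerate(hoi_scores)
    ]


# Toy HOI classes: 0 = "eat apple", 1 = "hold apple", 2 = "ride horse"
hoi_to_object = ["apple", "apple", "horse"]
hoi_scores = [0.6, 0.3, 0.5]

with_category = score_hois(hoi_scores, hoi_to_object, detected_object="apple")
boxes_only = score_hois(hoi_scores, hoi_to_object)
```

In this toy setup, `with_category` suppresses the "ride horse" score, while `boxes_only` keeps all three scores and relies on the HOI model alone.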

Feel free to contact me if you have further questions.

Regards,

lumiaomiao commented 2 years ago

Thank you for your reply.

lumiaomiao commented 2 years ago

@zhihou7 Hi, I have another question about the code. The function get_new_Trainval_N in lib/ult/ult.py is defined as: [image]

Why does it use "Trainval_N[4]" and not "Trainval_N[k]"?

zhihou7 commented 2 years ago

Thanks for your comment. It should be Trainval_N[k]. This is a bug inherited from the VCL code, and I forgot to update it. After fixing this bug, the performance improves a bit. The bug also does not add seen classes in the zero-shot setting, so it only affects the performance slightly.
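The kind of indexing slip discussed here can be illustrated with a toy example. This is a minimal sketch, not the code from lib/ult/ult.py: the function `filter_negatives`, the dictionary layout (image id mapped to a list of `[image_id, hoi_id]` items), and the `buggy` flag are assumptions made for illustration.

```python
from typing import Dict, List, Set


def filter_negatives(
    trainval_n: Dict[int, List[List[int]]],
    unseen_idx: Set[int],
    buggy: bool = False,
) -> Dict[int, List[List[int]]]:
    """Drop items whose HOI id is unseen, per image key.

    With buggy=True the loop always reads the entries stored under the
    fixed key 4 instead of the current key k.
    """
    new_trainval_n = {}
    for k in trainval_n:
        source = trainval_n[4] if buggy else trainval_n[k]
        new_trainval_n[k] = [item for item in source if item[1] not in unseen_idx]
    return new_trainval_n


# Toy data: two images, each with two [image_id, hoi_id] items.
trainval_n = {
    4: [[4, 10], [4, 20]],
    7: [[7, 30], [7, 99]],
}
unseen_idx = {99}

fixed = filter_negatives(trainval_n, unseen_idx)
buggy = filter_negatives(trainval_n, unseen_idx, buggy=True)

print(fixed[7])  # [[7, 30]]
print(buggy[7])  # [[4, 10], [4, 20]]
```

In this toy version the unseen filter is applied in both variants; the buggy variant differs only in that every key receives the entries of image 4.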

I have updated the code.

Thanks.

lumiaomiao commented 2 years ago

Thank you for your quick reply.

lumiaomiao commented 2 years ago

@zhihou7 In the following code, if an image contains two pairs <h1, v1, o1> and <h1, v2, o1>, and the first one is in the unseen composition list, then you delete both pairs from the training data. Why don't you delete only the first one? In my view, deleting only the first one is closer to the description in your paper. [image]

zhihou7 commented 2 years ago

Here, GT[1] is the HOI label list of an HOI sample, e.g., [eat apple, hold apple]. If "eat apple" is an unseen category, I think it is fair to remove the whole HOI sample rather than only the annotation [eat apple]. Otherwise, the "eat apple" sample still exists in the training data but is not labeled, which I think differs from the zero-shot setting.
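A minimal sketch of this sample-level filtering, assuming each sample is a list whose second element (index 1) holds its HOI labels as described above. The function name `filter_zero_shot_samples`, the string labels, and the toy data are hypothetical and are not taken from the repository.

```python
from typing import List, Set


def filter_zero_shot_samples(samples: List[list], unseen_hois: Set[str]) -> List[list]:
    """Keep only samples that carry no unseen HOI label.

    A sample is dropped as a whole if any of its labels is unseen, so no
    unlabeled instance of an unseen HOI remains in the training data.
    """
    kept = []
    for gt in samples:
        hoi_labels = gt[1]  # HOI label list of this human-object pair
        if any(label in unseen_hois for label in hoi_labels):
            continue  # remove the whole sample, not just one annotation
        kept.append(gt)
    return kept


samples = [
    ["img_1", ["eat apple", "hold apple"]],
    ["img_2", ["hold apple"]],
    ["img_3", ["ride horse"]],
]
unseen = {"eat apple"}

kept = filter_zero_shot_samples(samples, unseen)
print([s[0] for s in kept])  # ['img_2', 'img_3']
```

The alternative raised in the question would keep `img_1` with only "hold apple" as its label, which leaves an "eat apple" instance in the training set without its label.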

lumiaomiao commented 2 years ago

I get it, thank you.