zhihou7 / HOI-CL

Series of work (ECCV2020, CVPR2021, CVPR2021, ECCV2022) about Compositional Learning for Human-Object Interaction Exploration
https://sites.google.com/view/hoi-cl
MIT License

Questions on code (ATL) #10

Open anjugopinath opened 2 years ago

anjugopinath commented 2 years ago

Could you answer the below questions please?

  1. What do the following keywords in lib/ult/ult.py indicate?

i) Neg_select ii) pos_h_boxes iii) neg_h_boxes iv) pattern_type v) pattern_channel

  2. In which file do you specify the input path for the training dataset (in this case, 'HOI-CL/Data/hico_20160224_det/images/train2015/')? I executed python tools/Train_ATL_HOCO.py

Thank You.

zhihou7 commented 2 years ago

Sorry for the confusing variable names.

Following iCAN, we augment the no_interaction samples. E.g., if the image includes a person, an apple and a desk, the dataset annotates <person, eat, apple> but not <person, no_interaction, desk>. We then augment the data by adding the pair <person, no_interaction, desk>, i.e., a negative sample.
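The augmentation described above can be sketched as follows. This is a minimal illustration, not the code from iCAN or HOI-CL: the function name add_no_interaction_pairs is hypothetical, and it works on plain names, whereas the real pipeline works on detected boxes.

```python
def add_no_interaction_pairs(humans, objects, annotated):
    """Return a <human, no_interaction, object> triple for every
    human-object pair that has no annotated interaction."""
    # Pairs that already carry some annotated verb.
    annotated_pairs = {(h, o) for h, _verb, o in annotated}
    return [
        (h, "no_interaction", o)
        for h in humans
        for o in objects
        if (h, o) not in annotated_pairs
    ]


# The example from the reply: a person, an apple and a desk.
humans = ["person"]
objects = ["apple", "desk"]
annotated = [("person", "eat", "apple")]

negatives = add_no_interaction_pairs(humans, objects, annotated)
print(negatives)  # [('person', 'no_interaction', 'desk')]
```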

i) Neg_select: This follows the code of iCAN (https://arxiv.org/abs/1808.10437). Neg_select is the counterpart of Pos_augment. Pos_augment is the number of positive HOI samples (annotated samples), while Neg_select is the number of negative HOI samples (augmented no_interaction samples, i.e. the no_interaction samples that do not exist in the annotation). "Augment" means we augment the box via random crop.
ii) pos_h_boxes: This corresponds to Pos_augment.
iii) neg_h_boxes: These are the human boxes of the negative interaction samples (augmented no_interaction samples).
iv) pattern_type: This is unused. pattern_type is fixed to 0. This is redundant code.
v) pattern_channel: This is unused. In the released code, pattern_channel is fixed to 2.

The path is used in ult.py, Test_HICO.py (test) and tools.py (co-occurrence matrix). You can search for "cfg.DATA_DIR" in these files.

Sorry for the confusing code. Feel free to ask if you have any further questions.

anjugopinath commented 2 years ago

Hi,

Thank You for the quick response! I had some more questions. Thank You in advance.

1) I put a breakpoint inside this function, but it wasn't hit. What is it used for?

    def Generate_action_HICO(actionlist):
        action = np.zeros(600)
        for GT_idx in actionlist:
            action[GT_idx] = 1
        action = action.reshape(1, 600)
        return action

2) If I need to train ATL on a new dataset, are the two .pkl files listed below the only additional input files required, apart from the images themselves? Also, could you explain what the annotations are?

  - Trainval_Neg_HICO.pkl
  - Trainval_GT_HICO.pkl

Thank You.

zhihou7 commented 2 years ago

The two files are annotations.

Trainval_Neg_HICO.pkl is the annotation for negative samples (augmented, unlabeled no_interaction pairs). Trainval_GT_HICO.pkl is the annotation for positive samples (annotated instances).

In Trainval_Neg_HICO.pkl, the key is the image id and the value is <image_id, HOI_category, human_box, object_box, ...>. We do not use the remaining fields.

Trainval_GT_HICO.pkl is a list. Each item represents one annotation: <image_id, HOI_category, human_box, object_box, ...>. Again, we do not use the remaining fields.
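To make the two layouts concrete, here is a small sketch using toy in-memory data rather than the real files (which would be read with pickle.load). Only the field order comes from the description above. The [x1, y1, x2, y2] box format, the values, the helper name unpack, and the choice to store a list of entries per image id in the negative file are all assumptions.

```python
# Toy stand-in for Trainval_GT_HICO.pkl: a list with one entry per annotation.
# Field order: image_id, HOI_category, human_box, object_box, (unused extras).
trainval_gt = [
    [1, 5, [10, 20, 110, 220], [60, 80, 120, 140], "unused"],
]

# Toy stand-in for Trainval_Neg_HICO.pkl: a dict keyed by image id.
# Assumption: each value is a list of entries with the same field order.
trainval_neg = {
    1: [[1, 9, [10, 20, 110, 220], [200, 150, 320, 260], "unused"]],
}


def unpack(entry):
    """Keep the four documented fields and ignore everything after them."""
    image_id, hoi_category, human_box, object_box = entry[:4]
    return {
        "image_id": image_id,
        "hoi_category": hoi_category,
        "human_box": human_box,
        "object_box": object_box,
    }


positives = [unpack(e) for e in trainval_gt]
negatives = {img: [unpack(e) for e in entries]
             for img, entries in trainval_neg.items()}

print(positives[0]["hoi_category"])   # 5
print(negatives[1][0]["object_box"])  # [200, 150, 320, 260]
```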

anjugopinath commented 2 years ago

Hi,

In the attached image, the person is interacting with the glass bottle and the pipe. There are also other bottles, a grater, a scrubber, a knife, etc. Should I add 'no_interaction' annotations for every item that is not being interacted with? In that case, should I also say no_interaction for the bottle class, given that another bottle in the image is being interacted with?

zhihou7 commented 2 years ago

Yes. For the other bottle (no interaction and no annotation), we currently include a no_interaction entry in Trainval_Neg_HICO.pkl whenever we have the object box (here, the bottle). These are the negative samples.

anjugopinath commented 2 years ago

Hi,

Thank You for the reply. Can I train the model if I do not have Trainval_Neg annotations for the new dataset I am using?

zhihou7 commented 2 years ago

You can train the model, but you might suffer from class imbalance and missing labels. For affordance recognition, I find that removing the negative samples has only a limited effect.

anjugopinath commented 2 years ago

Hi, Thank You for the reply. Can you answer the below questions please?

  1. [image]

  2. [image]

Can you explain the contents of the above two .pkl files? Specifically, the mapping, e.g. 0:0, 1:0, etc. Also, how are they related to the 5 files below?

  1. hico_list_obj.txt

  2. hico_list_vb.txt

  3. hico_list_hoi.txt

  4. 24_verbs.txt

  5. 21_verbs.txt

  6. What is prior_mask.pkl used for?

  7. Is hoi_coco_list_num.txt required when training for ATL?

Thank You.

zhihou7 commented 2 years ago

hoi_to_obj.pkl and hoi_to_verb.pkl store the co-occurrence matrices for HICO-DET, i.e. which object and which verb correspond to each HOI. The names for the ids in hoi_to_obj.pkl and hoi_to_verb.pkl are provided in hico_list_obj.txt, hico_list_vb.txt and hico_list_hoi.txt. Note that the ids in the pkl files start from 0, while the ids in the txt files start from 1.
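The off-by-one between the two id schemes is easy to get wrong, so here is a small sketch of the lookup. All data are toy values. The dict form of the mappings is an assumption based on the "0:0, 1:0" pattern quoted in the question, and the name tables only mimic the 1-based ids of the txt files.

```python
# Toy stand-ins for hoi_to_obj.pkl / hoi_to_verb.pkl (0-based ids).
hoi_to_obj = {0: 0, 1: 0, 2: 1}
hoi_to_verb = {0: 0, 1: 1, 2: 1}

# Toy stand-ins for the name lists in the txt files (1-based ids).
obj_names = {1: "apple", 2: "desk"}
verb_names = {1: "eat", 2: "hold"}


def describe_hoi(hoi_id):
    """Map a 0-based HOI id to its (verb name, object name)."""
    # Shift by one when moving from pkl ids (0-based) to txt ids (1-based).
    obj = obj_names[hoi_to_obj[hoi_id] + 1]
    verb = verb_names[hoi_to_verb[hoi_id] + 1]
    return verb, obj


print(describe_hoi(1))  # ('hold', 'apple')
```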

24_verbs.txt and 21_verbs.txt list the names of the verbs in V-COCO (HOI-COCO).

prior_mask.pkl is similar to hoi_to_obj.pkl, but is for V-COCO. prior_mask.pkl is provided by previous works.

hoi_coco_list_num.txt is not required for training ATL. hoi_coco_list_num.txt just demonstrates the long-tailed distribution.

anjugopinath commented 2 years ago

Thank You for your reply. Do I need to create 24_verbs.txt, 21_verbs.txt and prior_mask.pkl for training ATL?

anjugopinath commented 2 years ago

  1. What is the difference between self.num_classes = 600 and self.compose_num_classes = 600 in ResNet101_HICO.py, inside class ResNet101()?

  2. Are the weights in self.HO_weight (size 1 by 600) randomly initialized?

I saw this comment: "We copy from TIN. calculated by log(1/(n_c/sum(n_c)) c is the category and n_c is the number of positive samples." What is TIN?

zhihou7 commented 2 years ago

"self.num_classes" is the number of annotated classes, while "self.compose_num_classes" is how many types of HOIs you want to compose (this can be larger or smaller than self.num_classes).

TIN is "Transferable Interactiveness Knowledge for Human-Object Interaction Detection". The weights aim at balancing the data; this is a traditional re-balancing strategy for imbalanced data.
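For reference, the formula in the quoted code comment, log(1/(n_c/sum(n_c))), can be computed as below. This is only a sketch of that formula with made-up counts. The function name is hypothetical, and whether TIN or HOI-CL applies any further scaling to the weights is not stated in this thread.

```python
import math


def class_balance_weights(positive_counts):
    """Weight for class c: log(1 / (n_c / sum(n_c))) = log(sum(n_c) / n_c).

    Assumes every count is positive; a zero count would divide by zero.
    """
    total = sum(positive_counts)
    return [math.log(total / n) for n in positive_counts]


# Made-up long-tailed counts: a frequent, a medium and a rare class.
counts = [1000, 100, 10]
weights = class_balance_weights(counts)

# Rare classes receive larger weights than frequent ones.
print(weights[0] < weights[1] < weights[2])  # True
```

So the weights are derived from the class frequencies of the positive samples, with rarer classes weighted more heavily.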