zhihou7 / HOI-CL

Series of work (ECCV2020, CVPR2021, CVPR2021, ECCV2022) about Compositional Learning for Human-Object Interaction Exploration
https://sites.google.com/view/hoi-cl
MIT License

Hi, I have some questions about the Fabricator branch #2

Open bitwangdan opened 3 years ago

bitwangdan commented 3 years ago

In every training step, do you input all the obj embeddings and verbs into the Fabricator model and then filter out the wrong combinations? Or do you only take one obj at a time?

zhihou7 commented 3 years ago

We first input all the obj embeddings to combine with each verb, then remove infeasible HOIs in each step. We think this can balance the object distribution for each verb.
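For illustration, here is a minimal NumPy sketch of that composition step (all names here are hypothetical, not the repository's code): pair every verb in the batch with every object embedding, then mask out infeasible verb-object combinations.

    # Sketch of "combine each verb with all object embeddings, then remove
    # infeasible HOIs". Function and argument names are illustrative assumptions.
    import numpy as np

    def compose_and_filter(verb_feats, obj_embeddings, feasible_mask):
        # verb_feats: [V, d_v]; obj_embeddings: [O, d_o]
        # feasible_mask: [V, O] boolean, True where the verb-object pair is a valid HOI
        V, O = verb_feats.shape[0], obj_embeddings.shape[0]
        v = np.repeat(verb_feats, O, axis=0)        # [V*O, d_v]
        o = np.tile(obj_embeddings, (V, 1))         # [V*O, d_o]
        pairs = np.concatenate([v, o], axis=1)      # every verb-object combination
        return pairs[feasible_mask.reshape(-1)]     # keep only the feasible HOIs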

I suggest this implementation of FCL (https://github.com/zhihou7/FCL_VCOCO), which contains only the FCL code and is thus clearer.

bitwangdan commented 3 years ago

Thank you for your reply. Is the implementation the same on the V-COCO and HICO datasets? I am not very familiar with TensorFlow, and I have not found the relevant code for the HICO dataset.

zhihou7 commented 3 years ago

Yes, the core part of FCL is similar. But on V-COCO, we do not use the verb auxiliary loss since there are only 24 verbs. In this repository (https://github.com/zhihou7/FCL_VCOCO), we include the FCL code as a single commit: https://github.com/zhihou7/FCL_VCOCO/commit/34ada5eee87389f0560125d403f58bfc96ce5aca.

The FCL code for the HICO data is at https://github.com/zhihou7/HOI-CL/blob/master/lib/networks/Fabricator.py

bitwangdan commented 3 years ago

@zhihou7 Hi, can this model be run with a resnet50 backbone? In download_dataset.sh there is only a download link for resnet50, but in the FCL project the model is trained with resnet101.

zhihou7 commented 3 years ago

The URL (https://drive.google.com/file/d/0B1_fAEgxdnvJR1N3c1FYRGo1S1U/view) in download_dataset.sh is actually the weights for resnet101.

I have also just uncommented the line in Train_FCL_HICO.py that loads the resnet50 weights; you can update the code. To use the resnet50 backbone, change the model name FCL_union_l2_zs_s0_vloss2_varl_gan_dax_rands_rew_aug5_x5new_res101 to FCL_union_l2_zs_s0_vloss2_varl_gan_dax_rands_rew_aug5_x5new_res50. However, I have not tested resnet50 in this repository.
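As a small illustration of the naming convention above (the selection logic is an assumption for clarity, not the repository's code), the backbone is determined by the model-name suffix:

    # Hypothetical sketch: picking the backbone from the model-name suffix.
    model_name = "FCL_union_l2_zs_s0_vloss2_varl_gan_dax_rands_rew_aug5_x5new_res50"
    backbone = "resnet50" if model_name.endswith("res50") else "resnet101"
    print(backbone)  # resnet50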

Regards,

bitwangdan commented 3 years ago

@zhihou7 Sorry to bother you again. In Fabricator.py, in var_fabricate_gen_lite(), why is the function convert_emb_feats called twice? Looking forward to your reply.

zhihou7 commented 3 years ago

You are welcome. The first call should be commented out (you can see that I do not use the result returned by var_fabricate_gen_lite). In var_fabricate_gen, we call convert_emb_feats twice (in fact, we also call convert_emb_feats for fine-tuning, so three times in total), like this:

    def var_fabricate_gen(verb, real_objects):
        fake_objects = fabricate(real_objects)  # same length as real_objects
        # One might also use concat(verb, verb), concat(real_objects, fake_objects)
        # for fine-tuning the Fabricator.

        balanced_faked_objects = fabricate_all_objects(real_objects)  # fabricate all objects for each verb
        # 1. Compute the loss for the composite HOIs built from balanced_faked_objects
        #    and the corresponding verbs.
        # 2. Use a loss (e.g. cosine) to regularize fake_objects against real_objects.
        #    This is how the cosine similarity figure in the paper is drawn.
        return concat(verb, verb), concat(real_objects, fake_objects)

The implementation of var_fabricate_gen aims to regularize fake_objects against real_objects (e.g. with a cosine loss or a contrastive loss). We did not successfully fuse FCL and VCL in a single model, i.e. the fusion does not improve the result much. However, our preliminary experiments found that an additional loss (e.g. a contrastive loss) can achieve better results, especially when the batch size is larger.
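Something like the following TensorFlow 1.x sketch could implement that cosine regularization (shapes and names are assumptions, not the repository's exact code):

    # Sketch of a cosine regularization loss between real and fabricated
    # object features, as described above. Shapes and names are assumptions.
    import tensorflow as tf

    def cosine_regularization(real_objects, fake_objects):
        # real_objects, fake_objects: [batch, feat_dim]
        real_n = tf.nn.l2_normalize(real_objects, axis=1)
        fake_n = tf.nn.l2_normalize(fake_objects, axis=1)
        cos_sim = tf.reduce_sum(real_n * fake_n, axis=1)  # [batch]
        return tf.reduce_mean(1.0 - cos_sim)              # 0 when directions match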

Hope this information is helpful for you.

Regards,

bitwangdan commented 3 years ago

Sorry to bother you again. Is your obj embedding randomly generated? It corresponds to "word2vec_list" in the code, and the dimension is verb * obj_num_class * 2048.

zhihou7 commented 3 years ago

Yes. You could also use a word embedding to generate it, but we find that a randomly initialized embedding is slightly better and simpler.
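A minimal sketch of such a randomly initialized, trainable embedding table (illustrative sizes and initializer, not the repository's exact shapes; a pre-trained word2vec matrix could be substituted for the initializer):

    # Randomly initialized object embeddings, as discussed above.
    # num_embeddings and emb_dim are illustrative assumptions.
    import tensorflow as tf

    num_embeddings, emb_dim = 80, 2048
    word2vec_list = tf.get_variable(
        "word2vec_list", shape=[num_embeddings, emb_dim],
        initializer=tf.random_normal_initializer(stddev=0.01),
        trainable=True)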