zhihou7 / HOI-CL

Series of work (ECCV2020, CVPR2021, CVPR2021, ECCV2022) about Compositional Learning for Human-Object Interaction Exploration
https://sites.google.com/view/hoi-cl
MIT License

set the co-occurrence matrix #25

Open rouge012 opened 1 year ago

rouge012 commented 1 year ago
Hi @rouge012,

the co-occurrence matrix $A \in \mathbb{R}^{N_v \times N_o}$ is a two-dimensional matrix, where $N_v$ is the number of verb categories and $N_o$ is the number of object categories. We initialize $A$ as a zero matrix. For each object, the dataset annotates a set of verbs, and we set the corresponding positions of $A$ to 1. For example, if "apple" is combinable with "eat" and "cut" in the dataset, we set the positions of ("eat", "apple") and ("cut", "apple") in $A$ to 1.
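For concreteness, a minimal sketch of this construction (the pair ids are illustrative; HICO-DET has 117 verb and 80 object categories):

```python
import numpy as np

N_v, N_o = 117, 80  # verb and object category counts in HICO-DET

# Illustrative annotations: each entry is an annotated (verb_id, object_id)
# pair, e.g. ("eat", "apple") and ("cut", "apple"); the ids are made up.
annotated_pairs = [(4, 0), (23, 0)]

A = np.zeros((N_v, N_o), dtype=np.float32)
for v, o in annotated_pairs:
    A[v, o] = 1.0  # mark the verb-object pair as co-occurring
```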

Feel free to post if you have further questions

Regards,

Originally posted by @zhihou7 in https://github.com/zhihou7/HOI-CL/issues/4#issuecomment-1329058560

zhihou7 commented 1 year ago

https://github.com/zhihou7/HOI-CL/blob/master/misc/hoi_to_obj.pkl

https://github.com/zhihou7/HOI-CL/blob/master/misc/hoi_to_vb.pkl

These two files. By the way, the matrix is built from the dataset annotations, so it does not include all reasonable HOI categories.
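A sketch of how the matrix could be assembled from these files, assuming each pickle holds a dict mapping a HICO-DET HOI category index to its object / verb category index (600 HOI classes over 117 verbs and 80 objects):

```python
import pickle
import numpy as np

with open('misc/hoi_to_obj.pkl', 'rb') as f:
    hoi_to_obj = pickle.load(f)  # HOI class id -> object category id (assumed)
with open('misc/hoi_to_vb.pkl', 'rb') as f:
    hoi_to_vb = pickle.load(f)   # HOI class id -> verb category id (assumed)

A = np.zeros((117, 80), dtype=np.float32)
for hoi_id, obj_id in hoi_to_obj.items():
    A[hoi_to_vb[hoi_id], obj_id] = 1.0  # annotated verb-object pairs only
```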

> From rouge012:
> Thanks for the quick and detailed clarification, and I am wondering where I can find the code for setting the co-occurrence matrix. Thank you!

rouge012 commented 1 year ago

Hi! When I run tools/Train_ATL_HICO.py, I get the error below: TypeError: head_to_tail_ho() takes 7 positional arguments but 9 were given

Please help.

zhihou7 commented 1 year ago

Hi @rouge012,

Thanks for your comments. It seems the released code base differs from my local code in some functions. I have updated it and uploaded the new code. Feel free to ask if you have further questions.

Regards, Zhi Hou

rouge012 commented 1 year ago

Thank you for the quick response! I got a new error when running tools/Train_ATL_HICO.py: ValueError: The passed save_path is not a valid checkpoint: ./Weights/res101_faster_rcnn_iter_1190000

Please help. Thank you in advance.

zhihou7 commented 1 year ago

Hi, you should download the pre-trained weights into the directory "./Weights", as instructed at https://github.com/zhihou7/HOI-CL/blob/85e15d367b188a53d4d8fc1d0abdd7a517926a8b/misc/download_dataset.sh#L89

gdown 0B1_fAEgxdnvJR1N3c1FYRGo1S1U -O Weights/coco_900-1190k.tgz

and untar it, so that the checkpoint files (res101_faster_rcnn_iter_1190000.*) end up under ./Weights/.
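Before retraining, you can sanity-check that the extracted checkpoint is readable, for example with the standard TF 1.x checkpoint reader (a sketch; path as above):

```python
import tensorflow as tf

# Raises an error if the checkpoint files are missing or corrupt.
reader = tf.train.NewCheckpointReader('./Weights/res101_faster_rcnn_iter_1190000')
print(list(reader.get_variable_to_shape_map())[:5])  # a few variable names
```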

Regards,

rouge012 commented 1 year ago

Why is the total loss NaN when I run Train_ATL_HICO.py? I have downloaded the hico_20160224_det dataset. I am using Python 3.7, TensorFlow 1.14.0, and CUDA 11.1. [screenshot of the training log]

Please help. Thank You!

zhihou7 commented 1 year ago

Hi, does the loss become NaN from the beginning, or only after thousands of iterations? I remember it is not NaN at the beginning. Empirically, it is normal if NaN appears occasionally.

[screenshot of a normal training log]

rouge012 commented 1 year ago

Thank you for the quick response! I found it becomes NaN after about two hundred iterations. I didn't download the V-COCO dataset; could that have anything to do with this?

zhihou7 commented 1 year ago

That's confusing. I used a similar environment to yours:

cuda/10.0.130, Python 3.7.2, TensorFlow 1.14.1, V100 16GB

According to your log, it seems there are many errors during the optimization.

Regards,

Harzva commented 1 year ago

Hi, I'm very interested in the co-occurrence matrices. Can you elaborate on how they are obtained? In particular, how are infeasible interactions or combinations culled? Is the culling strategy learned during training, end to end with the model? Or is the co-occurrence matrix obtained in advance and fed to the network, and if so, how is it obtained? Many thanks.

zhihou7 commented 1 year ago

Hi @Harzva, thanks for your interest. We do not pre-define a co-occurrence matrix; we learn it from the data. In each iteration during optimization, we get the predictions of all the composite HOI features, and we use those predictions to update the concept confidence matrix according to the verb and object categories of the composite HOIs. Specifically, we update the concept matrix in a running-mean manner: we keep a matrix that tracks the counts of the pairs, and in each iteration we average the new concept confidence with the previous values.
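A minimal sketch of such a running-mean update (all names and shapes are my own illustration, not the repo's actual code):

```python
import numpy as np

N_v, N_o = 117, 80
concept_conf = np.zeros((N_v, N_o), dtype=np.float32)  # learned concept confidence
pair_counts = np.zeros((N_v, N_o), dtype=np.int64)     # updates seen per pair

def update_concept_matrix(verb_ids, obj_ids, scores):
    """Fold this iteration's composite-HOI predictions into the running mean.

    verb_ids, obj_ids, scores: length-K arrays for the K composite HOI
    features of this iteration; scores are the predicted confidences.
    """
    for v, o, s in zip(verb_ids, obj_ids, scores):
        pair_counts[v, o] += 1
        # incremental mean: m_new = m_old + (x - m_old) / n
        concept_conf[v, o] += (s - concept_conf[v, o]) / pair_counts[v, o]
```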

For self-compositional learning, we utilize the confidence matrix to build pseudo labels for the composite HOI features, to avoid bias toward known concepts. If we treat it as a positive-unlabeled learning approach, self-compositional learning makes use of the unlabeled composite HOI features.
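Continuing the sketch above, the pseudo label for an unlabeled composite feature could then be read off the learned matrix (again purely illustrative, not the paper's exact formulation):

```python
import numpy as np

def composite_pseudo_labels(verb_ids, obj_ids, concept_conf):
    # Soft pseudo label: the current confidence that (verb, object) is a
    # valid concept, instead of a hard label from the known-concept space.
    return np.array([concept_conf[v, o] for v, o in zip(verb_ids, obj_ids)],
                    dtype=np.float32)
```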

Feel free to contact me if you have further questions.

Harzva commented 1 year ago

Thank you for answering the above questions; I still have a few more to ask, sorry for the inconvenience. Is the fabricator in the paper an MLP rather than a GAN? I see that a GAN is used in the code; can an MLP also achieve good results? Also, how is the concept matrix obtained? $L_{CL}$, $L_{hoi}$, and $L_{hoi}^{sp}$ are binary cross-entropy losses; are all three binary classifications, i.e. just a judgement of true or false? I think it's a good idea to add the fake features to the minibatch and keep a balanced ratio of fake to real features, but are these features prepared in advance by the fabricator, or trained end to end? As I understand it, minibatch data is prepared in advance.

from paper"Then, we fix the pre-trained model and train the randomly initialized object fabricator via the loss function for the fabricator branch LCL.Then, we fix the pre-trained model and train the randomly initialized object fabricator via the loss function for the fabricator branch LCL. Then, we fix the pre-trained model and train the randomly initialized object fabricator via the loss function for the fabricator branch LCL. "Why not just joint training here? Is this stage of multiple is a significant effect improvement?

Computing the co-occurrence matrices should consume computational resources, so why not set them up a priori? That would be a simple way to fix the feasibility matrix so that each composition is 0 or 1. For example, is it now possible to use something like GPT-2 or GPT-3 to replace the computation of the feasibility judgement?

zhihou7 commented 1 year ago

Hi @Harzva, thanks for your questions. We actually do not use adversarial training. We leverage an MLP to generate the object features and combine them with verb features to optimize the network jointly. I think this mainly balances the distribution. According to our observation, the quality of the generated object features does matter.
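As a rough illustration of that description (the inputs and layer sizes are my guesses, not the paper's exact design), the fabricator could be a small MLP in TF 1.x:

```python
import tensorflow as tf

def object_fabricator(obj_embedding, noise, feat_dim=2048):
    """Fabricate a fake object feature from an object embedding plus noise."""
    x = tf.concat([obj_embedding, noise], axis=-1)
    x = tf.layers.dense(x, 1024, activation=tf.nn.relu)
    return tf.layers.dense(x, feat_dim)  # fabricated object feature

# The fabricated object feature is then paired with a real verb feature to
# form a composite HOI feature and the network is optimized jointly.
```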

In FCL, we directly use the label space to build the concept matrix; it is predefined, but misses a lot of reasonable concepts. Therefore, in the last paper, we introduce a method to discover the reasonable concepts.

$L_{CL}$, $L_{hoi}$, and $L_{hoi}^{sp}$ are three binary losses because the labels are multi-hot (117 dimensions for the verb categories).
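In other words, each loss is a per-class sigmoid cross-entropy over multi-hot labels; in TF 1.x that is roughly:

```python
import tensorflow as tf

# logits/labels: (batch, 117) multi-hot verb targets; each of the 117 classes
# is an independent binary decision, hence "binary" loss despite 117 classes.
logits = tf.placeholder(tf.float32, [None, 117])
labels = tf.placeholder(tf.float32, [None, 117])
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
```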

Regarding the optimization steps, the multi-step strategy is just for the long-tailed HOI detection method. As I mentioned in the paper, it is difficult to train the network jointly there to achieve a better result (that does not mean one-step training does not work). From my current point of view, it is quite tricky; frankly speaking, I think it is because I was too naive at that moment. For zero-shot HOI detection, we observe that one-step is better.

You are right. Frankly speaking, I have recently struggled with this question a lot. If we just want to obtain a good co-occurrence matrix, I think a large language model is a good way to complete it. GPT is amazingly strong! I even doubt whether a lot of vision problems remain meaningful after the emergence of GPT. But mining the knowledge from pure visual data is also valuable for developing and understanding deep neural networks. From the perspective of learning (judging the perception ability of neural networks), I think it is valuable to complete the co-occurrence matrix from visual data only, since human beings do not infer the reasonable concepts from prior knowledge but reason about them via object similarity or something like that.

Thanks for your questions. Feel free to ask if you have further ones.

Harzva commented 1 year ago

Yes, as far as I know, there is no method in Compositional Zero-Shot Learning that judges the feasibility of combinations using pure visual information; most methods borrow NLP techniques to determine feasibility. I think your work is very meaningful and opens up another technical route; it's a great inspiration to me. However, you could also try to take advantage of NLP techniques in HOI, especially the latest and most effective ones such as the GPT series. If your technical route is based on pure visual information, then once you do incorporate multimodal information like CLIP or NLP techniques, you might as well use the best GPT models available. Either don't use them at all, or use the best one.

zhihou7 commented 1 year ago

Yes, thanks for your comment. I think it is valuable to mine knowledge from pure visual information, because the knowledge base of LLMs is also largely derived from the visual world, extracted by human beings.