joslefaure / HIT

Official Implementation of our WACV2023 paper: “Holistic Interaction Transformer Network for Action Detection”
https://arxiv.org/abs/2210.12686

how to train AVA datasets #6

Closed yan-ctrl closed 1 year ago

yan-ctrl commented 1 year ago

Hello, could you tell me how to use your work to train on the AVA dataset? How should the data and the configuration files be prepared?

joslefaure commented 1 year ago

Hi Yan. I have updated the steps needed to prepare AVA. Alternatively, you can download the preprocessed AVA dataset provided here and then perform keypoint detection with Detectron2. AVA is very different from J-HMDB, and there are many things we need to modify in the code to make it work. I will soon update the repo with those changes.

yan-ctrl commented 1 year ago

Thank you for the update, but as you said, it does not seem possible to train on AVA or AVA-like datasets at the moment. When will you be able to update this part?

joslefaure commented 1 year ago

Sorry, I haven't found the time to update this part yet. I think you can start with the materials I provided in DATA.md for AVA, and I will help with any issues along the way. I don't want to bloat the repository, but I can email you the code to train/test on AVA once you have prepared the data.

yan-ctrl commented 1 year ago

Really? I would be grateful. The rest of the data is ready, but the keypoints are not. Could you give me a link to a cloud drive, or send them to my email, 1263989216@qq.com?

joslefaure commented 1 year ago

I will put the files on GDrive soon. TBH, getting the keypoints is troublesome and I have forgotten some of the steps. You can reverse-engineer them once you have the complete data, in case you need to perform the same operations on a new dataset.

yan-ctrl commented 1 year ago

Hello, what are the difficulties in getting the keypoints? Don't you just mean extracting the skeleton keypoints with Detectron2 and then putting the resulting data into the folder? OK, thank you very much. I look forward to you putting the files on the cloud drive.

joslefaure commented 1 year ago

If you want to get the keypoints, you would need to do the following:

  1. Install Detectron2
  2. Download the AVA keyframes (I provided a link for that)
  3. Then write a script to run Detectron2 on each keyframe and save the results (bounding boxes and keypoints) in a JSON file (refer to the JSON file of the object detections for AVA and the keypoints for JHMDB); a rough sketch is given below

I will provide the keypoints JSON files (train and val) later today for AVA.
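
Here is a minimal sketch of what step 3 could look like. The directory layout, glob pattern, output filename, and image-id scheme are placeholders you will need to adapt; only the Detectron2 calls follow the standard API.

```python
# Sketch: run Detectron2's keypoint R-CNN on every AVA keyframe and dump
# person boxes and keypoints to JSON. Paths and image_id scheme are assumptions.
import glob, json, os
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
predictor = DefaultPredictor(cfg)

results = []
for path in sorted(glob.glob("AVA/keyframes/*/*.jpg")):      # assumed layout
    frame = cv2.imread(path)
    instances = predictor(frame)["instances"].to("cpu")
    image_id = os.path.splitext(os.path.basename(path))[0]   # assumed id scheme
    for box, kpts, score in zip(instances.pred_boxes.tensor.numpy(),
                                instances.pred_keypoints.numpy(),
                                instances.scores.numpy()):
        results.append({
            "image_id": image_id,
            "bbox": box.tolist(),        # xyxy
            "keypoints": kpts.tolist(),  # 17 COCO keypoints, each (x, y, score)
            "score": float(score),
        })

with open("AVA_train_kpts_detectron.json", "w") as f:        # assumed filename
    json.dump(results, f)
```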

yan-ctrl commented 1 year ago

OK, thank you very much. I really like your work, because it is not only spatio-temporal action detection but also multimodal training: RGB + pose.

joslefaure commented 1 year ago

Link to the AVA keypoints files

yan-ctrl commented 1 year ago

Thank you for sharing. Following your default configuration hitnet.yaml, I trained a baseline model and then ran the ucf24/jhmdb pascalvoc.py evaluation script, which compares the model's predictions with the ground truth to compute the frame mAP. The result is only 81.52, which is far from the 83.8 in your paper. Do you have any suggestions?

joslefaure commented 1 year ago

I have noticed this issue with the cleaned code. I will take some time to evaluate the difference between the messy version and the one uploaded here.

yan-ctrl commented 1 year ago

Thank you for your patience. The GPU I use is an A100; could that make a difference? Looking forward to your update.

joslefaure commented 1 year ago

I doubt the difference is due to the graphics card, since I experience the same issue. I will keep you updated.

yan-ctrl commented 1 year ago

OK, thank you for being so conscientious and responsible.

yan-ctrl commented 1 year ago

Hello, I want to ask you some questions about reproducing the JHMDB results:

1) Did you use the same split for the validation and test sets? At the end of training, is inference run on the test set and the mAP then computed against the ground truth?

2) I find that the loss has already dropped to 0 by around 2000 iterations. Does that mean the network is already expressive enough and the training-set accuracy is 100%? If so, why is the test set only around 80%?

3) Since the loss can be driven to 0, does that mean that improving the network has no effect? It seems so: I tried adding different attention modules after the 3x3 convolution of the basic residual block, and the results did not improve but actually dropped. Can you give me some suggestions? Should training be done on other datasets instead?

yan-ctrl commented 1 year ago

[screenshot of the training log] I see the total loss and the pose-action loss, and each has a value in brackets. Is the value in brackets the real loss, or does it represent something else? The values in brackets are falling, which is consistent with normal training.

joslefaure commented 1 year ago

Refer to the values inside the brackets; those correspond to the real loss. The ones outside the brackets are the median loss.
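
For anyone else reading the log, here is a minimal sketch of that "median (real value)" convention; this is my own illustration, not the repo's actual logger.

```python
# Sketch of a smoothed loss meter that prints "median (latest)" per the convention above.
from collections import deque
from statistics import median

class SmoothedValue:
    def __init__(self, window_size=20):
        self.values = deque(maxlen=window_size)

    def update(self, value):
        self.values.append(value)

    def __str__(self):
        # value in brackets = most recent (real) loss, value outside = windowed median
        return f"{median(self.values):.4f} ({self.values[-1]:.4f})"

meter = SmoothedValue()
for loss in (0.9, 0.7, 0.8, 0.4):
    meter.update(loss)
print(f"loss_action: {meter}")   # -> loss_action: 0.7500 (0.4000)
```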

yan-ctrl commented 1 year ago

Hello, regarding the files you shared with me last time, AVA_train_kpts_detectron.json and AVA_val_kpts_detectron.json, I want to ask you:

1) "bbox": [137.83520698547363, 85.04911804199219, 261.8096008300781, 350.46002197265625] — is this in xyxy form?

2) "keypoints": [[43.663475036621094, 176.14353942871094, 0.6509626507759094], [67.02825164794922, 151.8516845703125, 1.156382323114014], ...] is a two-dimensional list. Is dim=0 the keypoint index on the human body? And what does dim=1 mean — are they the x, y, z coordinates of the keypoints?

3) "image_id": entries like 2961776 — is this the AVA video index followed by the keyframe at second 1776 of that video?

yan-ctrl commented 1 year ago

When viewing the JHMDB data you shared, comparing the ground-truth labels ("ann_file": "jhmdb/annotations/jhmdb_train_gt_min.json") with the Detectron2-detected "keypoints_file" ("jhmdb/annotations/jhmdb_train_person_bbox_kpts.json"), I found that the bbox for the same "image_id" is identical, e.g. "bbox": [98, 30, 198, 239]. Is the format xywh or xyxy? Your jhmdb.py DatasetEngine has a conversion step, so I want to be sure.

joslefaure commented 1 year ago

I just recently figured out this could have been an issue: since the JHMDB boxes are indeed in xyxy, I should not need to convert them again. But I noticed that they still remain xyxy after the conversion attempt.

Lines 177 and 185 should be boxes = BoxList(boxes_tensor, (im_w, im_h), mode="xyxy")

Thank you
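
In case it helps others reading this, a minimal sketch of why the mode argument matters, using the maskrcnn-benchmark-style BoxList that this repo inherits from AlphAction (the module path shown is an assumption):

```python
# Sketch: the JHMDB boxes are already xyxy, so BoxList must be built with mode="xyxy".
# Building it as xywh and converting would reinterpret x2/y2 as width/height.
import torch
from hit.structures.bounding_box import BoxList   # assumed module path

im_w, im_h = 320, 240
boxes_tensor = torch.tensor([[98.0, 30.0, 198.0, 239.0]])   # xyxy from the keypoints file

correct = BoxList(boxes_tensor, (im_w, im_h), mode="xyxy")   # no conversion needed

# Wrong: treating the same numbers as xywh and converting shifts the bottom-right
# corner to roughly (98 + 198, 30 + 239), which corrupts the box.
wrong = BoxList(boxes_tensor, (im_w, im_h), mode="xywh").convert("xyxy")
```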

yan-ctrl commented 1 year ago

Well, thank you very much for answering my doubts. As you said, lines 177 and 185 should be boxes = BoxList(boxes_tensor, (im_w, im_h), mode="xyxy"). Sorry I didn't see your reply here before opening another issue.

yan-ctrl commented 1 year ago

Hello, I adapted jhmdb.py into an ava.py so I could use your HIT work to train on the AVA dataset, but an error is reported:

File "/data/abd/bang/Action_Model/AlphAction_HIT/alphaction/dataset/transforms/object_transforms.py", line 22, in __call__

objects = objects.top_k(self.top_k, boxes)

AttributeError: 'NoneType' object has no attribute 'top_k'

I don't know why. The path I set for keypoints_file is:

"keypoints_file": "/AVA/boxes/AVA_train_person_bboxkpts.json", stored in the AlphAction manner, and the top_k function was added following your work. Can you give me some suggestions? There is no problem training on the JHMDB dataset you provided.

yan-ctrl commented 1 year ago

Sorry, it's my own problem. I forgot to add the path of the keypoints: keypoints_file=os.path.join(data_dir, attrs["keypoints_file"]).
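
For anyone hitting the same AttributeError, a minimal sketch of the fix; the surrounding keys and paths here are illustrative placeholders, not the repo's exact ones. The point is simply that keypoints_file must be resolved to a real path, otherwise the keypoint objects stay None and top_k fails.

```python
# Sketch: make sure the AVA catalog entry resolves keypoints_file like the other files.
import os

data_dir = "data/AVA"                                          # assumed root
attrs = {
    "video_root": "keyframes",                                 # placeholder
    "box_file": "boxes/ava_train_det_person_bbox.json",        # placeholder
    "keypoints_file": "boxes/AVA_train_person_bboxkpts.json",  # path quoted above
}

keypoints_file = os.path.join(data_dir, attrs["keypoints_file"])
# pass keypoints_file into the dataset constructor; leaving it out is what caused
# AttributeError: 'NoneType' object has no attribute 'top_k'
```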

hongminglin08 commented 1 year ago

Hello, have you solved the second question above (the loss dropping to 0 while the test set stays around 80%)? I have the same problem, but I don't know how to improve the accuracy.

joslefaure commented 1 year ago

The accuracy has been improved (slightly better than the paper) and the training stabilized. I will commit the new version soon.

hongminglin08 commented 1 year ago

OK! Thank you very much

yan-ctrl commented 1 year ago

Hello, I want to ask you a question. In this script, https://github.com/joslefaure/HIT/blob/master/hit/modeling/common_blocks.py, line 131 has the condition: if idx % nonlocal_mod == nonlocal_mod - 1:, and line 135 is: self.add_module(nl_block_name, nlmodule). But nonlocal_mod = 1000, and default.py also sets CONV3_NONLOCAL = False. So by default your work does not use the non-local attention module? I hope to get your reply as soon as possible.
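
To illustrate my reading of that condition (a paraphrase, not the exact repo code): a non-local block is only added after every nonlocal_mod-th residual block, so with nonlocal_mod = 1000 the condition never fires and no non-local attention is inserted.

```python
# Paraphrased sketch of the gating logic in hit/modeling/common_blocks.py:
# with CONV3_NONLOCAL = False the config falls back to nonlocal_mod = 1000,
# so idx % nonlocal_mod == nonlocal_mod - 1 is never true for a small number of blocks.
num_res_blocks = 6        # e.g. residual blocks in one stage
nonlocal_mod = 1000       # value used when CONV3_NONLOCAL is False

for idx in range(num_res_blocks):
    # ... add the regular residual block here ...
    if idx % nonlocal_mod == nonlocal_mod - 1:
        # self.add_module(nl_block_name, nlmodule) in the real code; never reached here
        print(f"non-local block inserted after residual block {idx}")
```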

joslefaure commented 1 year ago

This thread is going in all directions, closing