open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

How to prepare data to train a Spatial Temporal Action Detection SlowFast model? #652

Closed arvindchandel closed 3 years ago

arvindchandel commented 3 years ago

I have my videos for a custom activity and want to train the Spatial Temporal Action Detection SlowFast model. I couldn't find clear instructions to follow for preparing the data in the required format to train the model. Any link/pointer is appreciated.

kennymckormick commented 3 years ago

To create annotations like the AVA ones, you simply need to annotate bounding boxes for the actions of interest at 1 fps and also label the action class. You can refer to the description of the AVA dataset (https://arxiv.org/abs/1705.08421). For the detailed format, you can refer to the format of the AVA annotations downloaded with the commands in https://github.com/open-mmlab/mmaction2/blob/master/tools/data/ava/download_annotations.sh
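For a concrete picture, each line of the downloaded ava_train_v2.1.csv has the form video_id, timestamp, x1, y1, x2, y2, action_id, person_id, with box coordinates normalized to [0, 1]. A minimal sketch of reading such a file (assuming it sits in the working directory):

```python
import csv

# Each AVA annotation row: video_id, timestamp (in seconds), x1, y1, x2, y2
# (box coordinates normalized to [0, 1]), action_id, person_id.
with open('ava_train_v2.1.csv') as f:
    for video_id, ts, x1, y1, x2, y2, action_id, person_id in csv.reader(f):
        box = [float(x1), float(y1), float(x2), float(y2)]
        print(video_id, int(ts), box, int(action_id), int(person_id))
```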

arvindchandel commented 3 years ago

@kennymckormick Thanks for the reply. I am interested in the project and want to train the spatio-temporal SlowFast model for a custom activity recorded in my own data. To understand a bit more: if I get it right, I need to create the following files for my own data to train the model:

  1. ann_file_train = f'{anno_root}/ava_train_v2.1.csv'
  2. ann_file_val = f'{anno_root}/ava_val_v2.1.csv'
  3. exclude_file_train = f'{anno_root}/ava_train_excluded_timestamps_v2.1.csv'
  4. exclude_file_val = f'{anno_root}/ava_val_excluded_timestamps_v2.1.csv'
  5. label_file = f'{anno_root}/ava_action_list_v2.1_for_activitynet_2018.pbtxt'
  6. proposal_file_train = (f'{anno_root}/ava_dense_proposals_train.FAIR.' 'recall_93.9.pkl')
  7. proposal_file_val = f'{anno_root}/ava_dense_proposals_val.FAIR.recall_93.9.pkl'

Now, breaking the problem into steps:

  1. Generate frames from the videos at 1 fps.
  2. Annotate each frame with bounding boxes and assign action classes.
  3. From the annotation data, how can I produce the above 7 files (ava_train_v2.1.csv, ava_train_excluded_timestamps_v2.1.csv, etc.)?

If you can guide me clearly through these steps, it would be a great help. Thanks :)

arvindchandel commented 3 years ago

@kennymckormick could you please take a look at my issue? I am stuck here.

kennymckormick commented 3 years ago

  1. You need to extract frames at a higher fps (24 or 30), but annotate them at 1 fps.
  2. For ann_file_train & ann_file_val, you should prepare the annotations on your own; each line in the annotation file should be: video_id, second_id, bbox (4 numbers), action class id, person id.
  3. You don't need to generate the exclude files.
  4. The label file stores the map from action class id to action class name.
  5. The proposal file is the detection result on your dataset, organized as a dictionary: the key is 'video_id, second_id' and the value is the human bounding boxes.

I think the best way to get familiar with the annotation format is to download the AVA annotations and inspect them yourself.
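To make point 4 above concrete, here is a sketch of writing a minimal AVA-style label map for two hypothetical custom classes; the item/name/id layout mirrors what ava_action_list_v2.1_for_activitynet_2018.pbtxt looks like, but double-check the field names against your downloaded copy:

```python
# Hypothetical 2-class label map: action class id -> action class name.
label_map = {1: 'sweeping', 2: 'mopping'}

with open('custom_action_list.pbtxt', 'w') as f:
    for label_id, name in sorted(label_map.items()):
        f.write('item {\n')
        f.write(f'  name: "{name}"\n')
        f.write(f'  id: {label_id}\n')
        f.write('}\n')
```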

arvindchandel commented 3 years ago

@kennymckormick In the second point, what is 'second_id'? Is it the same as the timestamp? And when annotating at 1 fps using a video annotation tool, will I get a person_id too? I have not worked on video annotation before, which is why I'm asking about person_id.

kennymckormick commented 3 years ago

The second_id is the same as the timestamp (in seconds). The person_id is the index of a person within a frame: since AVA is a multi-label dataset, a person may have several actions at the same time (like standing, talking, and looking at somebody). The person_id helps you deal with the multi-label scenario.
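To illustrate that multi-label convention with made-up numbers: one person doing two things at the same second simply becomes two rows that share the same box and person_id, differing only in the action class id.

```python
import csv

# Hypothetical rows: person_id 1 performs two actions at timestamp 2,
# so the same normalized box appears twice with different action ids.
rows = [
    ('video-1', 2, 0.30, 0.19, 0.34, 0.31, 11, 1),  # e.g. 'stand'
    ('video-1', 2, 0.30, 0.19, 0.34, 0.31, 79, 1),  # e.g. 'talk to a person'
]
with open('train.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```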

arvindchandel commented 3 years ago

@kennymckormick Thanks for the quick reply. This input is helpful for proceeding further.

arvindchandel commented 3 years ago

@kennymckormick Hi Kenny, I need some more input from you. After reading about AVA and from the input you provided, I now have a clear idea of how to prepare an AVA-like dataset for new classes. I took 2 sample videos for 2 activities and prepared 'train.csv' and 'val.csv' after annotating the videos at 1 fps, then generated proposal_train.pkl and proposal_val.pkl as well. I still have a few dots left to connect, on which I need your input:

  1. In the config file we provide the path 'data/ava/rawframes', where ~27K frames were generated from a 15 min video file; the naming style of these frames is 'img_00XXX.jpg'. Is there any link between the timestamp column in the train/val csv files and these frames in the rawframes folder? In the rawframes folder ~27K frames were generated, meaning that for a 15 min video, 30 frames are produced for each second.
  2. I hope you get my question: how does the training program map to the frames by looking at the timestamp value in the train/val csv file?
  3. My last confusion: once my files are ready, what other entries in the config file, along with 'num_classes', should be changed to train the model on my new classes?

arvindchandel commented 3 years ago

@innerlee @kennymckormick Sorry, Kenny, for dragging up some old issues. I prepared my data for a custom activity and then tried to train the model; in the results I observed that I could not overfit the model even on a small dataset, so I started debugging. One issue I found is related to the proposal file (pkl file). In my case the pkl file was a dictionary with data like below: {('sweeping-2', 1): [[0.299, 0.191, 0.343, 0.308], [0.753, 0.133, 0.789, 0.292]],.....

The actual AVA proposal file format is like below: {'1j20qq1JyX4,0902': array([[0.036, 0.098, 0.55, 0.979, 0.995518], [0.443, 0.04, 0.99, 0.989, 0.977824]]),

So the last entry in each human box is probably the confidence score, which I did not include. Is it mandatory to include? @kennymckormick, you mentioned above that the 'proposal file is the detection result of your dataset', which I missed at the time. Can you suggest a good tool for generating detection results on video at 1 fps?

kennymckormick commented 3 years ago

  1. In AVADataset, we set timestamp_start to 900, so that timestamp x in the csv file corresponds to frame index (x - timestamp_start) * 30, in which 30 is the fps we used to extract frames. For example, in the AVA annotations, timestamp 902 corresponds to the frame 'img_00060.jpg'.
  2. Besides setting num_classes, you also need to create a label file and set the label file path in your config.
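A short sketch of that mapping (timestamp_start and fps follow the AVA defaults described above; for your own videos, which presumably start at second 0, adjust timestamp_start accordingly):

```python
def timestamp_to_frame_idx(timestamp, timestamp_start=900, fps=30):
    """Map an annotation timestamp (in seconds) to the extracted frame index."""
    return (timestamp - timestamp_start) * fps

# The AVA example from the answer above: timestamp 902 -> 'img_00060.jpg'.
idx = timestamp_to_frame_idx(902)
print(f'img_{idx:05d}.jpg')  # img_00060.jpg
```
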
kennymckormick commented 3 years ago

I think you can set the confidence of your proposal boxes to 1 (in AVADataset we filter out proposal boxes whose confidence is below a threshold). To get the proposal boxes, you can, for example, use mmdetection to detect human proposals at one frame per second. Another quick solution is to just use the GT boxes as proposals.
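Following that advice, a minimal sketch of building the proposal pickle from GT boxes with the confidence fixed at 1.0; the two boxes are the ones from the question above, and the keys follow the '<video_id>,<4-digit timestamp>' convention of the AVA proposal file:

```python
import pickle

import numpy as np

# Ground-truth boxes reused as proposals. The 5th number per box is the
# confidence score, set to 1.0 so nothing is dropped by the threshold.
gt_boxes = [
    ('sweeping-2', 1, [0.299, 0.191, 0.343, 0.308]),
    ('sweeping-2', 1, [0.753, 0.133, 0.789, 0.292]),
]

proposals = {}
for video_id, ts, box in gt_boxes:
    key = f'{video_id},{ts:04d}'  # e.g. 'sweeping-2,0001'
    proposals.setdefault(key, []).append(box + [1.0])

proposals = {k: np.array(v) for k, v in proposals.items()}
with open('proposal_train.pkl', 'wb') as f:
    pickle.dump(proposals, f)
```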

jhonmajinerazo commented 2 years ago

Hello, did you find any solution to this problem? I also want to train SlowFast on a custom dataset based on AVA, but I can't find any information. How do I generate the .csv files?

kuangxiaoye commented 2 years ago

mmaction did not provide such a method when I was working on this. At the time, in order to run a test, I wrote a simple script based on my own dataset to generate the csv files. So I recommend that you first figure out the meaning of each column in the csv file; then you can generate it yourself. The rows encode each person's bounding box and action, and the hardest point is figuring out which coordinate convention is used (xyxy or xywh). Once you have mastered the meaning of the content, I don't think generating the .csv file is difficult.
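On the coordinate question: AVA-style csv files store normalized (x1, y1, x2, y2). If your annotation tool exports pixel-space (x, y, w, h) instead, a conversion sketch (the function name is my own):

```python
def xywh_to_norm_xyxy(x, y, w, h, img_w, img_h):
    """Convert a pixel-space (x, y, w, h) box to normalized (x1, y1, x2, y2)."""
    return [x / img_w, y / img_h, (x + w) / img_w, (y + h) / img_h]

# e.g. a 60x200 px box at (120, 80) in a 1920x1080 frame:
print(xywh_to_norm_xyxy(120, 80, 60, 200, 1920, 1080))
# [0.0625, 0.0741, 0.09375, 0.2593]  (rounded)
```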

jhonmajinerazo commented 2 years ago

@kuangxiaoye

Hello, could you make the code you created available? It would serve as a starting point for everyone. Thank you.

kuangxiaoye commented 2 years ago

I'm sorry, jhonmajinerazo, the code was deleted, but I can provide a tutorial about the csv file and other AVA questions.

https://blog.csdn.net/WhiffeYF/article/details/124358725?spm=1001.2014.3001.5502

It is a Chinese blog and may be a little difficult to translate, but it covers almost all the problems of AVA, from zero to a running model.

Good luck.

tuanlda78202 commented 1 year ago

Thank you so much bro! It helps me so much