Closed arvindchandel closed 3 years ago
To create annotations like the AVA ones, you simply need to annotate bounding boxes for the action of interest at 1 fps and also label the action class. You can refer to the description of the AVA dataset (https://arxiv.org/abs/1705.08421). For the detailed format, you can refer to the format of the AVA annotations downloaded with the commands in https://github.com/open-mmlab/mmaction2/blob/master/tools/data/ava/download_annotations.sh
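As a sketch of what those downloaded annotation rows look like, here is a small example that writes AVA-style csv rows (the column meanings follow the AVA paper: video_id, timestamp in seconds, box corners normalized to [0, 1], action_id, person_id; the video name and values are made-up illustrations):

```python
import csv
import io

# Illustrative AVA-style rows: video_id, timestamp (one annotation per second),
# x1, y1, x2, y2 (normalized to [0, 1]), action_id, person_id.
rows = [
    ("my_video-1", 902, 0.226, 0.312, 0.480, 0.915, 12, 0),
    ("my_video-1", 902, 0.226, 0.312, 0.480, 0.915, 17, 0),  # same person, second action
    ("my_video-1", 903, 0.305, 0.298, 0.550, 0.920, 12, 0),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
print(buf.getvalue())
```

Note that the same box can appear on several rows with different action ids, since AVA is multi-label.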
@kennymckormick Thanks for the reply. I am interested in the project and want to train a spatio-temporal SlowFast model for a custom activity recorded in my own data. To understand a bit more: if I get it right, I need to create the following files for my own data to train the model.
Now, breaking the problem into steps:
If you can guide me clearly through these steps, it would be a great help. Thanks :)
@kennymckormick Could you address my issue? I am stuck here.
I think the best way to get familiar with the annotation format is to download the annotations and check them on your own.
@kennymckormick Regarding the second point: what is second_id — is it the same as the timestamp? And when I annotate at 1 fps using a video annotation tool, will I get a person_id too? I have not worked on video annotation before, which is why I am asking about person_id.
The second_id is the same as the timestamp (in seconds). The person_id is the index of a person in a frame: since AVA is a multi-label dataset, a person may have several actions at the same time (like standing, talking, looking at somebody, etc.). The person_id helps you deal with the multi-label scenario.
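A small sketch of how person_id resolves the multi-label case: rows sharing the same (video, timestamp, person_id) are the several simultaneous actions of one person, and can be grouped back together (the video name and action ids below are made-up illustrations):

```python
from collections import defaultdict

# Hypothetical AVA-style rows: video_id, timestamp, x1, y1, x2, y2, action_id, person_id.
rows = [
    ("clip-1", 902, 0.20, 0.10, 0.60, 0.95, 11, 0),  # person 0, first action
    ("clip-1", 902, 0.20, 0.10, 0.60, 0.95, 79, 0),  # person 0, second action (same box, same second)
    ("clip-1", 902, 0.55, 0.15, 0.90, 0.97, 11, 1),  # person 1, one action
]

# Group action labels per (video, timestamp, person_id) to recover the
# multi-label annotation for each person.
labels = defaultdict(list)
for video_id, ts, x1, y1, x2, y2, action_id, person_id in rows:
    labels[(video_id, ts, person_id)].append(action_id)

print(labels[("clip-1", 902, 0)])  # [11, 79]
```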
@kennymckormick Thanks for the quick reply. This input is helpful to proceed further.
@kennymckormick Hi Kenny, I need some more input from you. After reading about AVA and from the input you provided, I now have a clear idea of how to prepare an AVA-like dataset for new classes. I took 2 sample videos for 2 activities and prepared train.csv and val.csv after annotating the videos at 1 fps, then generated proposal_train.pkl and proposal_val.pkl as well. I still have a few dots to connect, on which I need your input:
- In the config file we provide the path data/ava/rawframes, where about 27K frames were generated from a 15-minute video file; the naming style of these frames is img_00XXX.jpg. Is there any link between the timestamp column in the train/val csv files and these frames in the rawframes folder? In the rawframes folder about 27K frames were generated, which means that for a 15-minute video, 30 frames were produced for each second.
- I hope you get my question about how the training program maps frames to the timestamp value in the train/val csv files.
- My last confusion is: once my files are ready, what other entries in the config file, along with num_classes, should be changed to train the model on my new classes?
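The thread leaves the timestamp-to-frame question open, but the arithmetic behind it can be sketched. This is an assumption, not mmaction2's documented behavior — verify the exact convention against the dataset code: frames extracted at a constant 30 fps and numbered img_00001.jpg from some start offset make the mapping a simple multiplication:

```python
FPS = 30  # assumed constant frame-extraction rate

def center_frame_name(timestamp_sec, timestamp_start=0, fps=FPS):
    """Map a 1 fps annotation timestamp to a raw-frame filename.

    Assumes frames are numbered img_00001.jpg, img_00002.jpg, ... starting
    at `timestamp_start` seconds into the video. For AVA itself the annotated
    clips start at 900 s, so timestamp_start would be 900 there; for a custom
    video whose frames were extracted from 0 s, leave it at 0.
    """
    frame_idx = (timestamp_sec - timestamp_start) * fps + 1
    return f"img_{frame_idx:05d}.jpg"

print(center_frame_name(2))                         # img_00061.jpg
print(center_frame_name(902, timestamp_start=900))  # img_00061.jpg
```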
@innerlee @kennymckormick Sorry Kenny for dragging up some old issues. I prepared my data for a custom activity, then tried to train the model, and observed that I could not overfit the model on a small dataset, so I started debugging. One issue I found is related to the proposal file (pkl file). In my case the pkl file was a dictionary with data like this: {('sweeping-2', 1): [[0.299, 0.191, 0.343, 0.308], [0.753, 0.133, 0.789, 0.292]], ...
The actual AVA proposal file format is like this: {'1j20qq1JyX4,0902': array([[0.036, 0.098, 0.55, 0.979, 0.995518], [0.443, 0.04, 0.99, 0.989, 0.977824]]), ...
So the last entry in the human boxes is probably the confidence score, which I did not include. Is it mandatory to include it? @kennymckormick You mentioned above that "the proposal file is the detection result of your dataset", which I missed at the time. Can you suggest a better tool for generating detection results on video at 1 fps?
I think you can set the confidence of your proposal boxes to 1 (in AVADataset we filter out proposal boxes whose confidence is below a given threshold). To get the proposal boxes, you can, for example, use mmdetection to predict human proposals one frame per second. Another quick solution is to just use the GT boxes as proposals.
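Putting the two comments above together, a sketch of converting ground-truth boxes into an AVA-style proposal file: the key becomes "video_id,timestamp" (timestamp zero-padded to 4 digits, as in the AVA example above), the value an (N, 5) array of [x1, y1, x2, y2, score], with score set to 1.0 when GT boxes stand in for detections. The file name and input dict mirror the examples in this thread:

```python
import pickle

import numpy as np

# GT boxes keyed by (video_id, timestamp), as in the questioner's pkl.
gt_boxes = {
    ("sweeping-2", 1): [[0.299, 0.191, 0.343, 0.308],
                        [0.753, 0.133, 0.789, 0.292]],
}

proposals = {}
for (video_id, timestamp), boxes in gt_boxes.items():
    arr = np.asarray(boxes, dtype=np.float64)
    scores = np.ones((arr.shape[0], 1))  # confidence 1.0 for every GT box
    proposals[f"{video_id},{timestamp:04d}"] = np.hstack([arr, scores])

with open("proposal_train.pkl", "wb") as f:
    pickle.dump(proposals, f)

print(proposals["sweeping-2,0001"].shape)  # (2, 5)
```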
Hello, did you find a solution to this problem? I also want to train SlowFast on a custom dataset based on AVA, but I can't find any information. How do I generate the .csv files?
mmaction did not provide such a method when I was working on this. At the time, in order to run a test, I wrote a simple script based on my own dataset to generate the csv file. So I recommend that you first figure out the meaning of the contents of the csv file; then you can generate it yourself. It contains the coordinates and actions of each person, and you have to figure out whether the coordinate system used is xyxy or xywh — that is the most difficult point. Once you have understood the meaning of the contents, I think generating the .csv file is not difficult.
@kuangxiaoye
Hello, could you make the code you created available? It would serve as a starting point for everyone. Thank you.
I'm sorry jhonmajinerazo, the code was deleted, but I can provide a tutorial about the csv file and other AVA questions.
https://blog.csdn.net/WhiffeYF/article/details/124358725?spm=1001.2014.3001.5502
It is a Chinese blog; it may be a little difficult to translate, but it covers almost all AVA problems, from zero to a running model.
Good luck.
Thank you so much, bro! It helped me a lot.
I have videos of a custom activity and want to train a spatio-temporal action detection SlowFast model. I couldn't find clean instructions to follow to prepare the data in the required format to train the model. Any link/pointer is appreciated.