open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

Custom dataset making and training. #1796

Open K-YLMike opened 2 years ago

K-YLMike commented 2 years ago

Dear all, I am a college student working on my final-year project. I want to train a model on my own custom dataset, but although there are some tutorials on making datasets, I have never succeeded. I know there are several platforms that can train SlowFast, such as PySlowFast/Detectron2/mmaction2..., and I know the types of action datasets (Kinetics/AVA/UCF...). Could some kind person give me some hints about making a custom dataset (any kind of dataset is fine) and training on it? If you can, I would really appreciate it!

hukkai commented 2 years ago

Did you try mmaction2's tutorials for making a dataset? If so, what error did you get when you tried them?

K-YLMike commented 2 years ago

> Did you try mmaction2's tutorials for making a dataset? If so, what error did you get when you tried them?

Thanks for replying. I don't think I can say there were errors. What I can say is that the tutorial on making a custom dataset is messy and confusing. I wish it gave a real, complete example of making a custom dataset rather than just some incomplete code and instructions. Or maybe it is my mistake and I simply do not know how to do it... I would very much like to try mmaction2, since it seems very powerful. If you have experience making a custom dataset in the UCF/AVA/Kinetics style, could you tell me in detail what I need to do? Thanks a lot!

goncalofurtado1 commented 2 years ago

I am commenting because I have a similar problem. Like @M1keStr0ng, I am trying to use mmaction2 to run inference on a custom dataset that I will prepare.

My goal is to record RGB data (I can also capture skeleton and depth information) from a scene in which I will be interacting with the environment. I would like to create a model that runs inference on that scene with simple custom labels: walking, standing, busy, giving something (or holding something in front of something, like the sthv1/sthv2 labels), and wanting/receiving something. I have run inference on custom videos I made with several models; the ones that use sthv1/sthv2 come closest but are still not very accurate.
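For a label set like the one above, one possible starting point is a plain-text annotation list in which each line holds a video path and an integer class index separated by a space, which is the layout mmaction2's documentation describes for video datasets. The following is only a minimal sketch under stated assumptions, not the toolbox's own tooling: the folder layout (one subfolder per label under a data root), the short label names, the extension filter, and the helper names `build_annotations`, `split_train_val`, and `write_list` are all hypothetical choices made for illustration.

```python
from pathlib import Path
import random

# Label names adapted from the comment above; the index order is an arbitrary choice.
LABELS = ["walking", "standing", "busy", "giving", "receiving"]
# Assumed set of video extensions; adjust to whatever the recordings use.
VIDEO_EXTS = {".mp4", ".avi", ".mkv"}


def build_annotations(root, labels=LABELS):
    """Return 'relative/path label_index' lines for videos stored as root/<label>/<file>."""
    root = Path(root)
    lines = []
    for index, name in enumerate(labels):
        class_dir = root / name
        if not class_dir.is_dir():
            continue  # skip labels that have no recordings yet
        for video in sorted(class_dir.iterdir()):
            if video.suffix.lower() in VIDEO_EXTS:
                lines.append(f"{video.relative_to(root).as_posix()} {index}")
    return lines


def split_train_val(lines, val_fraction=0.2, seed=0):
    """Shuffle deterministically and split the lines into (train, val)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def write_list(lines, path):
    """Write one annotation per line to a text file."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A typical use would be to call `build_annotations("data/my_actions")`, split the result, and write `train.txt` and `val.txt`; those files would then be what the dataset config's `ann_file` entries point at. The random split here ignores which recording session a clip came from, so clips cut from the same recording can land in both sets; splitting by session may give a more honest validation score.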

I am having trouble understanding how to create a model for my specific case; the instructions in the documentation are not very clear. They make it seem as if minimal work is required, but I think there is more to it. The code is also not commented, so it's hard to understand. I think this is because the toolbox assumes its users have some advanced knowledge of, and practice with, this type of AI stuff (sorry for the lack of a better word). I do have some basic knowledge of AI but haven't done a project this complex, so it's hard to keep up.

Are there any more basic tutorials or documentation (even outside this toolbox) that you would recommend for getting a better understanding of how to use this toolbox to my advantage? I just don't know what the best next step is. Any guidance would be appreciated.

PS: If someone has built a complete recognition pipeline from scratch and could share their results in detail, with images and videos, I would appreciate it very much.