open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

[Docs] How to fine-tune BMN when using ActivityNet data prep method 2 #2788

Closed valentin-fngr closed 1 month ago

valentin-fngr commented 4 months ago

The doc issue

Hi, I am trying to fine-tune BMN on my custom dataset. I know this has already come up in other issues, but none of them helped me solve my problem.

Data Prep

I prepared my custom dataset following ActivityNet data preparation method 2.

At the end, I obtain exactly this structure:

(if Option 2 used)
│   │   ├── anet_train_video.txt
│   │   ├── anet_val_video.txt
│   │   ├── anet_train_clip.txt
│   │   ├── anet_val_clip.txt
│   │   ├── activity_net.v1-3.min.json
│   │   ├── mmaction_feat
│   │   │   ├── v___c8enCfzqw.csv
│   │   │   ├── v___dXUJsj3yo.csv
│   │   │   ├── ..
│   │   ├── rawframes
│   │   │   ├── v___c8enCfzqw
│   │   │   │   ├── img_00000.jpg
│   │   │   │   ├── flow_x_00000.jpg
│   │   │   │   ├── flow_y_00000.jpg
│   │   │   │   ├── ..
│   │   │   ├── ..
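
A quick way to confirm that a custom dataset matches the layout above is to check for each expected entry before training. This is a minimal sketch, not part of MMAction2: the function name `missing_entries` and the `data_root` argument are hypothetical, and the parent directory of the tree is not shown in the listing, so it has to be supplied by the caller.

```python
from pathlib import Path
from typing import List

# Entries taken from the Option 2 tree above.
EXPECTED_FILES = [
    "anet_train_video.txt",
    "anet_val_video.txt",
    "anet_train_clip.txt",
    "anet_val_clip.txt",
    "activity_net.v1-3.min.json",
]
EXPECTED_DIRS = ["mmaction_feat", "rawframes"]


def missing_entries(data_root) -> List[str]:
    """Return the expected files and directories that are absent under data_root.

    data_root is a hypothetical argument: the directory that directly
    contains the annotation files, mmaction_feat/ and rawframes/.
    """
    root = Path(data_root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty return value means every top-level entry is present. It does not validate the contents of the CSV feature files or the frame folders.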

For fine-tuning BMN:

which cannot work with RawframeDataset. How can I replicate that pipeline with RawframeDataset?

Please provide any information that would help me fine-tune the model using data preparation method 2 for ActivityNet. This is very confusing, and I would love to contribute an overall tutorial once I succeed.

best,

Valentin

Suggest a potential alternative/fix

No response

Perceval-Wilhelm commented 3 months ago

Hello @valentin-fngr, may I ask how you created your own custom dataset with the same structure as ActivityNet? I am working on a Temporal Action Localization project, but I cannot get my own custom data into that structure. Thank you so much!

PopGreen69 commented 2 months ago

Hi @valentin-fngr, can you share the details of your data preparation? I ran into an issue when extracting features from my own dataset.

valentin-fngr commented 1 month ago

@sirrtt @PopGreen69 Hi both. I gave up on that because it was way too complex to set up. Instead, I used a classic TSN recognition network with a sliding-window pipeline. You can check demo/long_video_demo.py, which demonstrates how to detect actions in a long video. The setup there is much easier.
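
For readers unfamiliar with the sliding-window idea, here is a rough sketch of how it can work in general. It is not the code from demo/long_video_demo.py. The names `sliding_windows` and `detect_segments` are hypothetical, and `classify` stands in for any clip-level recognizer (for example a TSN model wrapped in a function) that returns a label and a score for a frame range.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str, float]  # (start_frame, end_frame, label, score)


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges that cover the whole video."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames <= 0:
        return []
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    # Add one last window so the tail of the video is not skipped.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def detect_segments(
    num_frames: int,
    window: int,
    stride: int,
    classify: Callable[[int, int], Tuple[str, float]],
    threshold: float = 0.5,
) -> List[Segment]:
    """Classify each window and merge touching windows that share a label.

    classify is a hypothetical callable: (start, end) -> (label, score).
    """
    segments: List[Segment] = []
    for start, end in sliding_windows(num_frames, window, stride):
        label, score = classify(start, end)
        if score < threshold:
            continue  # treat low-confidence windows as background
        if segments and segments[-1][2] == label and start <= segments[-1][1]:
            prev_start, prev_end, _, prev_score = segments[-1]
            segments[-1] = (prev_start, max(prev_end, end), label, max(prev_score, score))
        else:
            segments.append((start, end, label, score))
    return segments
```

The window length and stride trade temporal precision against compute: a smaller stride gives finer boundaries but requires more classifier calls. Unlike BMN, this approach does not learn proposal boundaries, so segment edges are only as precise as the window grid.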