happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

Question about input I3D features #106

Closed BigBuffa1o closed 1 year ago

BigBuffa1o commented 1 year ago

I have reproduced your code and am now trying to apply this amazing work to my custom dataset. The input should be I3D features, which means I first have to use the work you mentioned to extract them. However, that extractor is pretrained on Kinetics-400. If I want to apply this work to a medical action scene like Cholec80, can I use the extractor directly to get correct I3D features for your model, or do I have to train another I3D model first? I know Kinetics-400 is a big dataset covering almost every kind of action, but the medical domain is still quite different, so I am not sure whether the extractor generalizes across datasets. Any guidance would be appreciated.
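
For reference, this is roughly how I am extracting the clip-level features. It is only a minimal sketch, not code from this repo: `backbone` stands for any Kinetics-400 pre-trained video model (e.g., I3D) with its classification head removed, and the window length and stride below are illustrative values, not the ones used in the paper.

```python
# Minimal sliding-window feature extraction sketch (not part of the ActionFormer release).
# `backbone` is assumed to be a Kinetics-400 pre-trained video model with its
# classification head removed, so it returns a pooled feature vector per clip.
import torch

def extract_clip_features(frames: torch.Tensor, backbone: torch.nn.Module,
                          clip_len: int = 16, stride: int = 4) -> torch.Tensor:
    """frames: (C, T, H, W) tensor holding the whole untrimmed video.
    Returns a (num_clips, feat_dim) sequence, the per-clip feature format
    that ActionFormer-style TAL models consume."""
    backbone.eval()
    feats = []
    with torch.no_grad():
        for start in range(0, frames.shape[1] - clip_len + 1, stride):
            clip = frames[:, start:start + clip_len].unsqueeze(0)  # (1, C, clip_len, H, W)
            feats.append(backbone(clip).flatten(1))                # (1, feat_dim)
    return torch.cat(feats, dim=0)
```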

tzzcl commented 1 year ago

I think video recognition models pre-trained on Kinetics are a good start for most cases, even some medical videos. However, a video recognition model pre-trained on a medical dataset would be a better feature extractor for your scenario.

happyharrycn commented 1 year ago

One possibility is to finetune a Kinetics pre-trained backbone on your target dataset (using its training set). This can be done by casting it as a video classification problem. We have experimented with doing so on medical videos, and the results are favorable.

BigBuffa1o commented 1 year ago

One possibility is to finetune a Kinetics pre-trained backbone on your target dataset (using its training set). This can be done by casting it as a video classification problem. We have experimented with doing so on medical videos, and the results are favorable.

Yes, fine-tuning the pretrained model is the way to go. However, in the medical setting that means another round of annotation work to label the data for an action recognition problem first, which requires a lot of human effort. We currently want to solve this with only the TAL annotations we made for ActionFormer. That is why I asked whether the I3D feature extractor is general, or at least whether the I3D features I get won't affect the training results of ActionFormer much.

happyharrycn commented 1 year ago

If you are thinking about directly using Kinetics pre-trained models (e.g., I3D) for medical videos, they will work to some extent, yet fine-tuning will always produce significantly better results. For fine-tuning, you won't need additional annotations beyond what is there for action localization. Clips can be sampled from the untrimmed videos for fine-tuning the video backbone, and they are already labeled (either as background or as one of the foreground action categories).
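
Concretely, something like the following sketch would do. The annotation tuple format and the 0.5 overlap threshold are illustrative assumptions here, not the exact recipe we used; the point is that clip labels fall out of the localization annotations you already have.

```python
# Sketch: derive clip-level classification labels from existing TAL annotations,
# so the backbone can be fine-tuned as a classifier without extra annotation work.
from typing import List, Tuple

BACKGROUND = 0  # class id reserved for background clips

def label_clip(clip_start: float, clip_end: float,
               segments: List[Tuple[float, float, int]],
               min_overlap_ratio: float = 0.5) -> int:
    """segments: (seg_start, seg_end, class_id) in seconds, taken directly from the
    localization annotations. A clip is labeled with the action it overlaps most,
    provided that overlap covers at least `min_overlap_ratio` of the clip;
    otherwise it is labeled as background."""
    clip_len = clip_end - clip_start
    best_label, best_overlap = BACKGROUND, 0.0
    for seg_start, seg_end, class_id in segments:
        overlap = min(clip_end, seg_end) - max(clip_start, seg_start)
        if overlap > best_overlap and overlap >= min_overlap_ratio * clip_len:
            best_label, best_overlap = class_id, overlap
    return best_label

# Example: a 2-second clip inside a hypothetical annotated segment of class 3
# gets label 3; a clip outside every annotated segment gets the background label.
print(label_clip(10.0, 12.0, [(8.0, 20.0, 3)]))   # -> 3
print(label_clip(30.0, 32.0, [(8.0, 20.0, 3)]))   # -> 0
```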

BigBuffa1o commented 1 year ago

That makes sense to me, thank you for the patient explanation!