open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

[Feature] Improve support on temporal action detection #2600

Open stbnps opened 1 year ago

stbnps commented 1 year ago

What is the problem this feature will solve?

Temporal action detection is an important task in long-form video analysis. Its goal is to detect the start and end times of each action, along with its action label.
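To make the task concrete, here is a minimal sketch of what a detection could look like and how it might be compared to a ground-truth annotation using temporal IoU. The `ActionSegment` class and `temporal_iou` function are hypothetical names used for illustration only; they are not part of MMAction2.

```python
from dataclasses import dataclass


@dataclass
class ActionSegment:
    """One action instance in an untrimmed video (hypothetical structure)."""
    start: float        # start time in seconds
    end: float          # end time in seconds
    label: str          # action class name
    score: float = 1.0  # detection confidence (1.0 for ground truth)


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


# Illustrative values, not taken from any dataset.
pred = ActionSegment(start=12.0, end=20.0, label="long jump", score=0.9)
gt = ActionSegment(start=10.0, end=18.0, label="long jump")

iou = temporal_iou(pred, gt)
# A prediction usually counts as correct when the labels match and the
# temporal IoU exceeds a chosen threshold (0.5 is assumed here).
is_match = pred.label == gt.label and iou >= 0.5
print(f"tIoU = {iou:.2f}, match = {is_match}")
```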

What is the feature?

Improve support for temporal action detection. Some (already implemented) models like VideoMAE V2 were tested on temporal action detection tasks, and it would be nice if we could use mmaction2 to replicate those results.

What alternatives have you considered?

No response

cir7 commented 1 year ago

Hi @stbnps, thanks for your feedback. MMAction2 recently added support for TCANet as a temporal action detection algorithm. Do you have a more specific idea about the support plan, e.g. algorithms or related tools? As far as I know, VideoMAE V2 uses temporal action detection as a downstream task to evaluate the representation, i.e. it replaces the I3D features with features extracted by VideoMAE V2-g for ActionFormer. This is a bit different from supporting TAL tasks.

stbnps commented 1 year ago

I haven't read the VideoMAE V2 paper in detail, but I noticed they state "We use them as the video foundation models and transfer them to three kinds of downstream tasks: action classification, action detection, and temporal action detection", and they specifically test it on "Temporal action detection ... Its goal is to recognize all action instances in an untrimmed video and localize their temporal extent".

Generally speaking, what I'd like is to be able to train more TAD models, i.e. models that compute start/end times and action labels. Some popular models I've seen that aren't currently supported by MMAction2 are TadTR and ActionFormer.

mayfly227 commented 1 year ago

By the way, is there a demo of temporal action detection for BMN or TCANet? I'd like to evaluate model performance on a specific video, similar to "demo_skeleton.py". @cir7