sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Apache License 2.0

Roadmap and Feedback #1

Open sming256 opened 3 months ago

sming256 commented 3 months ago

We keep this issue open to collect feature requests and feedback from users, so that we can keep improving this codebase.

If you can't find the features you need in the roadmap, please leave a message here.

Thank you!

rixejzvdl649 commented 3 months ago

cool

akshitac8 commented 2 months ago

Hi @sming256, thank you for sharing the toolkit, it's really amazing. I wanted to ask: the AdaTAD paper also has a parallel adapter approach, AdaTAD' (75.4 mAP in the paper), but when implementing it with the shared codebase I only reach around 73.4 mAP. Could you please share whether any parameters differ between AdaTAD and AdaTAD'?

sming256 commented 2 months ago

Only a few hyperparameters change: the MLP ratio in the adapter goes from 1/4 to 1/8, and the learning rate is searched in the range 5e-5 to 1e-4. I will upload the parallel backbone and checkpoint later.
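
For concreteness, a minimal sketch of how those two changes might be expressed in an OpenTAD-style Python config; the key names (`adapter`, `mlp_ratio`, `type="ParallelAdapter"`, the optimizer section) are assumptions for illustration and may not match the released config files.

```python
# Hypothetical config fragment (OpenTAD configs are Python files).
# The exact key names below are assumptions, not the released AdaTAD' config.

model = dict(
    backbone=dict(
        adapter=dict(
            type="ParallelAdapter",  # assumed name for the parallel-adapter variant
            mlp_ratio=1 / 8,         # AdaTAD uses 1/4; AdaTAD' reduces it to 1/8
        ),
    ),
)

# Learning rate is searched in [5e-5, 1e-4]; pick the best value per dataset.
optimizer = dict(type="AdamW", lr=1e-4)
```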

akshitac8 commented 2 months ago

That would be really helpful for reproducing the results, @sming256 😃.

akshitac8 commented 2 months ago

Hi @sming256, I wanted to check whether you could please upload the parallel backbone code as well; that would be great.

Caspeerrr commented 1 month ago

Are you planning to also release TriDetPlus in this toolkit? (https://github.com/dingfengshi/tridetplus) Thanks!

sming256 commented 1 month ago

@Caspeerrr Thanks for your suggestion! Integrating TriDetPlus into OpenTAD seems straightforward. However, TriDetPlus has only released the VideoMAEv2 features, not the DINOv2 features. This is the only reason we haven't integrated it yet.

sming256 commented 1 month ago

Hi @akshitac8, the side-tuning model is released here, and we provide a training example here. When implementing side-tuning with the latest OpenTAD, we found a performance drop of around 1% on THUMOS. With our released checkpoint, we achieve 74.65% mAP using VideoMAEv2-g.