ttgeng233 / UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
https://unav100.github.io
MIT License
52 stars, 3 forks

Computation of i3d features #8

Closed (1980x closed this issue 5 months ago)

1980x commented 5 months ago

Hi, thanks for the awesome work.

I am not able to extract visual features efficiently. It's taking too much time, even on 4 GPUs with 24 GB of RAM each. I am extracting visual features from 278 videos, each 4-5 minutes long, split into 16 subsets that run in parallel. After 24 hours, features have been extracted for only 60 videos. Can you suggest a more efficient way to do this?
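To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the throughput stays constant and that all 16 subsets progress at the same rate. The function name `estimate_extraction` is only illustrative and is not part of the UnAV code.

```python
def estimate_extraction(total_videos, done_videos, elapsed_hours, workers):
    """Rough throughput estimate, assuming a constant extraction rate
    and evenly loaded parallel workers (both are assumptions)."""
    # Aggregate rate across all parallel workers, in videos per hour.
    rate = done_videos / elapsed_hours
    # Hours still needed for the videos that are not processed yet.
    remaining_hours = (total_videos - done_videos) / rate
    # Wall-clock hours a single worker spends on one video.
    hours_per_video_per_worker = workers / rate
    return rate, remaining_hours, hours_per_video_per_worker


# Numbers from this issue: 278 videos, 60 done after 24 hours, 16 subsets.
rate, remaining, per_video = estimate_extraction(278, 60, 24, 16)
print(f"aggregate rate: {rate:.2f} videos/hour")
print(f"remaining time: {remaining:.1f} hours")
print(f"time per video per worker: {per_video:.1f} hours")
```

Under these assumptions, the remaining 218 videos would need roughly another 87 hours, and each subset spends about 6.4 hours on a single 4-5 minute video.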

How long did it take you to extract features for the 10,000-odd videos of about 1 minute each?

Thank you.