gyxxyg / VTG-LLM

[Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
https://arxiv.org/abs/2405.13382
Apache License 2.0

Inference time on Single A100 GPU #18

Closed anilbatra2185 closed 3 months ago

anilbatra2185 commented 3 months ago

Hi @gyxxyg ,

I am running the evaluation code on the 3.7K ActivityNet validation videos, but inference is very slow. Could you share your inference time and any suggestions for speeding it up?

Regards, Anil

gyxxyg commented 3 months ago

Hi, I ran about 3k ActivityNet Captions videos on a V100 GPU, and inference took about 1 day. Processing 96 frames per video is time-consuming, so preprocessing the visual features in advance may speed up inference.
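
For scale, about 1 day for about 3k videos works out to roughly 29 seconds per video. One way to read the preprocessing suggestion is to extract each video's visual features once, store them on disk, and reuse them on later evaluation runs. The snippet below is a minimal sketch of that idea, not code from this repository: `cached_features`, `extract_fn`, and the cache layout are hypothetical names, and `extract_fn` stands in for whatever frame sampling and visual encoding the model actually performs.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def cached_features(video_path: str,
                    extract_fn: Callable[[str], Any],
                    cache_dir: str) -> Any:
    """Return features for video_path, calling extract_fn only on a cache miss.

    extract_fn is a hypothetical placeholder for the expensive step
    (sampling 96 frames and running the visual encoder).
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)

    # Use a hash of the path as the file name so any path is a valid key.
    key = hashlib.sha1(video_path.encode("utf-8")).hexdigest()
    cache_file = cache / f"{key}.pkl"

    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)

    features = extract_fn(video_path)

    # Write to a temporary file first, then rename, so an interrupted run
    # does not leave a half-written cache entry behind.
    tmp_file = cache_file.with_suffix(".tmp")
    with tmp_file.open("wb") as f:
        pickle.dump(features, f)
    tmp_file.replace(cache_file)
    return features


if __name__ == "__main__":
    def dummy_extract(path: str) -> list:
        # Stand-in for the real encoder: one value per sampled frame.
        return [0.0] * 96

    with tempfile.TemporaryDirectory() as tmp_dir:
        first = cached_features("video_0001.mp4", dummy_extract, tmp_dir)
        second = cached_features("video_0001.mp4", dummy_extract, tmp_dir)
        print(len(first), first == second)
```

Caching this way only helps when the same videos are evaluated more than once, or when feature extraction can be run ahead of time as a separate batch job. The first pass over the dataset still pays the full extraction cost.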

anilbatra2185 commented 3 months ago

Thanks for sharing the information.