ttengwang / PDVC

End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
MIT License

A question about demo video #42

Closed adeljalalyousif closed 1 year ago

adeljalalyousif commented 1 year ago

Thanks for sharing your wonderful work. I haven't read your paper yet, so based on the demo video I have some questions:

1. Can your PDVC model be considered a live video captioning model?
2. Is the caption generated for each event directly, without reading all of the video frames?
3. How long does it take to generate the caption for one event?

ttengwang commented 1 year ago

The model produces captions only after reading all frames, so it is NOT a live captioning model. Inference time depends largely on the video duration and your hardware. As a reference, captioning the demo video (which lasts 1 minute) takes around 36s on an Nvidia T4 GPU when running test_and_visualize.sh.
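If you want to benchmark inference on your own hardware, a minimal sketch is to wrap the inference call with a wall-clock timer. Here `run_pdvc_inference` is a hypothetical placeholder for whatever the script invokes, not an actual function in this repository:

```python
import time

def run_pdvc_inference(video_path):
    # Hypothetical stand-in for the PDVC inference call; replace with the
    # actual entry point used by test_and_visualize.sh in your setup.
    return ["caption for event 1", "caption for event 2"]

start = time.perf_counter()
captions = run_pdvc_inference("demo.mp4")
elapsed = time.perf_counter() - start
print(f"Generated {len(captions)} event captions in {elapsed:.1f}s")
```

Note that the model decodes all event captions in parallel after the full video is encoded, so the time is best reported per video rather than per event.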

adeljalalyousif commented 1 year ago

Thank you so much