ttengwang / PDVC

End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
MIT License

about "pred_event_count" #28

Closed by coranholmes 2 years ago

coranholmes commented 2 years ago

Thank you for the great work!

I am trying to run the model on different videos, but "pred_event_count" always seems to be 3. Is this just a coincidence, or have I done something wrong?

I am using the pretrained TSP features provided in the repo, and the model works well on the demo video ("pred_event_count" is 3 there as well).

ttengwang commented 2 years ago

Hi, this is normal for ActivityNet Captions, since a large portion of the videos in this dataset contain 3 events. The model could predict more diverse event counts on YouCook2.
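
To check the dataset skew yourself, you can count the annotated events per video. This is only a rough sketch: it assumes the annotations are a dict mapping each video id to an entry with a "timestamps" list (one [start, end] pair per event), and the helper name `event_count_histogram` and the toy data are made up for illustration, not part of this repo.

```python
from collections import Counter


def event_count_histogram(annotations):
    """Map each event count to the number of videos that have that many events.

    Assumes `annotations` looks like {video_id: {"timestamps": [[start, end], ...], ...}}.
    """
    return Counter(len(entry["timestamps"]) for entry in annotations.values())


# Toy annotations, invented for illustration only (not real dataset entries).
toy_annotations = {
    "v_a": {"timestamps": [[0.0, 10.0], [10.0, 30.0], [30.0, 60.0]]},
    "v_b": {"timestamps": [[0.0, 5.0], [5.0, 20.0], [20.0, 40.0]]},
    "v_c": {"timestamps": [[0.0, 15.0], [15.0, 45.0]]},
}

histogram = event_count_histogram(toy_annotations)
for count, num_videos in sorted(histogram.items()):
    print(f"{count} events: {num_videos} videos")
```

For a real annotation file, you would load it with `json.load` and pass the resulting dict to the same function. If one count dominates the histogram, a model trained on that data is likely to predict that count most of the time.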

coranholmes commented 2 years ago

Thank you for the explanations.