ttengwang / PDVC

End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)

"Running PDVC on Your Own Videos": Did i miss something? #26

Closed Jeiyoon closed 2 years ago

Jeiyoon commented 2 years ago

Hi,

Thank you for your great work.

I loaded your pretrained model and ran your code on my own video dataset (SumMe, a video summarization benchmark), but the results are really weird: most captions don't match the visual content.
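
For context, here is roughly what I did, as a minimal sketch (the `run_pdvc_on` helper and the `SumMe/videos` path are placeholders for the repo's feature-extraction and captioning steps, not the actual PDVC API):

```python
from pathlib import Path

VIDEO_DIR = Path("SumMe/videos")  # my local copy of the SumMe benchmark

# Placeholder for the repo's per-video pipeline (feature extraction,
# then dense captioning with the released checkpoint); not the real
# PDVC entry point, just an illustration of the loop I ran.
def run_pdvc_on(video: Path) -> None:
    print(f"captioning {video.name} with the pretrained PDVC checkpoint")

for video in sorted(VIDEO_DIR.glob("*.mp4")):
    run_pdvc_on(video)
```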

[screenshots: Capture, Capture2, Capture4]

https://user-images.githubusercontent.com/56618962/167997537-5b21d8bc-a9b2-4e97-b735-93dfe36189e3.mp4

[screenshot: Capture3]

I just loaded your models and ran them on the videos, and most of the generated captions are very weird.

Did I miss something?

Thank you

coranholmes commented 2 years ago

I have run the pretrained model on the CUHK Avenue dataset, and the captions also don't seem as accurate as those in the demo videos.

ttengwang commented 2 years ago

@Jeiyoon Hi, did you get similar captions for the given demo video?

Jeiyoon commented 2 years ago

> @Jeiyoon Hi, did you get similar captions for the given demo video?

Yeah, it worked well on your demo video "xukun.mp4".

But the results on all 50 videos (mp4 format) in the dataset didn't reflect the visual content.

ttengwang commented 2 years ago

I guess the reason lies in the domain gap between the pretraining videos and your test videos; the model may have limited generalization ability. The released checkpoints are trained on ActivityNet Captions / YouCook2, so footage that looks very different from that data can produce poor captions.

Jeiyoon commented 2 years ago

Yeah, I agree.

But really, your paper and code are so nice 😊

Again, thank you for your great work!

ttengwang commented 2 years ago

😊😊