ttengwang / PDVC

End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
MIT License

Question about the result difference of video paragraph captioning #37

Closed wanghao14 closed 1 year ago

wanghao14 commented 1 year ago

Thanks for the great work! I noticed that in Table 4 of your paper, PDVC achieves "B@4 11.80 | M 15.93 | C 27.27" on the ActivityNet Captions ae-val set, but the Readme reports "B@4 10.18 | M 15.96 | C 20.66" for PDVC with TSN features. Are the two datasets (ActivityNet Captions vs. ActivityNet Entities) different, and is that what leads to such different results? Looking forward to your reply.

wanghao14 commented 1 year ago

As I understand it, both results should have been evaluated with learnt proposals.

ttengwang commented 1 year ago

Hi, actually I only report the results with learnt proposals in the readme.

wanghao14 commented 1 year ago

@ttengwang Yes, understood. Thanks for your reply. I have one more question: I ran "PDVC with ground-truth proposals" on YouCook2 using TSN features with the following command:

config_path=cfgs/yc2_tsn_pdvc_gt.yml python train.py --cfg_path ${config_path} --criteria_for_best_ckpt pc --gpu_id 0

But the para_CIDEr is not very good: only 0.26 on the validation set after training for 30 epochs (compared with 0.357 for MART). Is this normal, or have I overlooked some details?

ttengwang commented 1 year ago

Sorry, I didn't record the results for this setting. You can check the METEOR score for dense video captioning. If that score is comparable to the one in the paper, your para_CIDEr score is normal.

wanghao14 commented 1 year ago

That's all right. Your experimental results are already very thorough. The following is the validation output on the YouCook2 val set at epoch 28:

Validation results of iter 38657:
METEOR:0.1049158524914294
Recall:1.0
Precision:1.0
soda_c:0.12777677319530562
para_Bleu_1:0.3928343014791181
para_Bleu_2:0.22551595174128386
para_Bleu_3:0.12577259496428536
para_Bleu_4:0.07105039868345442
para_METEOR:0.15059172210180471
para_ROUGE_L:0.28671837234207104
para_CIDEr:0.2603262337631203

It seems that Table 2 only gives the dense captioning results on YouCook2 with predicted proposals, so could you please tell me whether these scores are normal with ground-truth proposals?
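
For anyone comparing a run like this against published numbers, here is a minimal sketch that parses the `name:value` lines of the validation output above into a dictionary. The function name `parse_validation_log` is hypothetical and not part of the PDVC repository, and the rescaling by 100 assumes the tables quoted earlier in this thread use a 0-100 scale while the log uses 0-1.

```python
def parse_validation_log(text):
    """Parse 'name:value' lines into a dict of floats (hypothetical helper)."""
    scores = {}
    for line in text.splitlines():
        # Split on the first colon only; lines without a colon are ignored.
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue
        try:
            scores[name.strip()] = float(value)
        except ValueError:
            # Header lines such as "Validation results of iter 38657:"
            # have no numeric value after the colon, so they are skipped.
            continue
    return scores


# The validation output quoted above, copied verbatim.
log = """Validation results of iter 38657:
METEOR:0.1049158524914294
Recall:1.0
Precision:1.0
soda_c:0.12777677319530562
para_Bleu_1:0.3928343014791181
para_Bleu_2:0.22551595174128386
para_Bleu_3:0.12577259496428536
para_Bleu_4:0.07105039868345442
para_METEOR:0.15059172210180471
para_ROUGE_L:0.28671837234207104
para_CIDEr:0.2603262337631203
"""

scores = parse_validation_log(log)

# Assumption: rescale to 0-100 so the values line up with table-style numbers.
for name in ("METEOR", "soda_c", "para_CIDEr"):
    print(f"{name}: {100 * scores[name]:.2f}")
```

Recall and Precision are both 1.0 here, which is consistent with ground-truth proposals being used, so only the captioning metrics are informative for this setting.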

ttengwang commented 1 year ago

Yes, the above scores are normal.

wanghao14 commented 1 year ago

Thanks for your prompt reply! Hope your future research goes well.