leolee99 / PAU

The official implementation of the paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval", accepted at NeurIPS 2023.
https://arxiv.org/abs/2309.17093
MIT License

The results about DiDeMo dataset. #4

Closed cty8998 closed 3 months ago

cty8998 commented 6 months ago

Hello, when I use your source code to train and evaluate on the DiDeMo dataset, the numbers of test/val/train samples are shown as:

```
2024-02-21 08:16:14,149:INFO: ***** Running test *****
2024-02-21 08:16:14,149:INFO:   Num examples = 824
2024-02-21 08:16:14,149:INFO:   Batch size = 24
2024-02-21 08:16:14,149:INFO:   Num steps = 35
2024-02-21 08:16:14,149:INFO: ***** Running val *****
2024-02-21 08:16:14,149:INFO:   Num examples = 842
2024-02-21 08:16:37,722:INFO: ***** Running training *****
2024-02-21 08:16:37,723:INFO:   Num examples = 7103
2024-02-21 08:16:37,723:INFO:   Batch size = 48
2024-02-21 08:16:37,723:INFO:   Num steps = 2960
```

However, according to the paper "Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval" (ICCV 2023), the numbers of test/val/train samples should be 1004/1065/8392. I have also confirmed this with the author of ProST, who used that test/val/train split. Besides, I cannot find the DiDeMo results of CLIP4Clip-meanP and CLIP4Clip-seqTransf reported in your paper anywhere else; they look better than those reported in other papers. Since I want to use the same baseline model as your paper, I need to make sure the numbers of training and testing samples are correct. Could you provide the log for the corresponding dataset?

If it is convenient, please also provide the training log of the MSVD dataset.

Looking forward to your timely reply.

leolee99 commented 6 months ago

Hi, thanks for your interest. The CLIP4Clip-meanP and CLIP4Clip-seqTransf results used in our paper were reported in Table 4 of CLIP2TV.

As for the DiDeMo split, we directly used the preprocessing from the X-CLIP paper. They seem to do some extra processing compared to CLIP4Clip.

The previous training log for MSVD was not kept, so I may need some time to re-train it. Thank you for your patience.

cty8998 commented 6 months ago

So can you provide me with the detailed numbers of test/val/train samples on the DiDeMo dataset, just like I showed above? This does not require training. Besides, the 'run_msvd.sh' file in your code has a parameter "--init_model"; what should this parameter be?

leolee99 commented 6 months ago

Hi, I tested it again, and here are the detailed numbers:

```
02/27/2024 20:24:05 - INFO -   ***** Running test *****
02/27/2024 20:24:05 - INFO -     Num examples = 824
02/27/2024 20:24:05 - INFO -     Batch size = 24
02/27/2024 20:24:05 - INFO -     Num steps = 35
02/27/2024 20:24:05 - INFO -   ***** Running val *****
02/27/2024 20:24:05 - INFO -     Num examples = 842
02/27/2024 20:24:09 - INFO -   ***** Running training *****
02/27/2024 20:24:09 - INFO -     Num examples = 7101
02/27/2024 20:24:09 - INFO -     Batch size = 24
02/27/2024 20:24:09 - INFO -     Num steps = 5900
```
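As a quick sanity check (this is just my own arithmetic, not code from the repo), the test-set step count in the log follows from the example count and batch size, since evaluation has to cover all examples:

```python
import math

# Values copied from the "Running test" section of the log above.
num_test_examples = 824
test_batch_size = 24

# Evaluation iterates over every example, so the step count is the
# ceiling of examples / batch size (the last batch may be partial).
test_steps = math.ceil(num_test_examples / test_batch_size)
print(test_steps)  # 35, matching "Num steps = 35" in the log
```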

Besides, the parameter "--init_model" is for resuming, i.e., when you want to continue training from a previously saved model. If you want to train from scratch, you can remove that line from the script file.
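For example, the relevant part of the script would look roughly like this (a sketch only; the entry-point name, checkpoint path, and other flags here are illustrative, not copied from the repo):

```shell
# Sketch of run_msvd.sh (entry point and paths are illustrative).
# Keeping --init_model resumes training from a saved checkpoint;
# deleting that line trains from scratch instead.
python main.py \
    --init_model ckpts/msvd/pytorch_model.bin
```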

cty8998 commented 6 months ago

OK, I got it. Thank you so much!