Open chenhao2345 opened 1 year ago
I am having the same issue. Incidentally, did you get an error like this with the fusion score? I was running it on MSVD:
Text-to-Video:
>>> R@1: 30.4 - R@5: 59.7 - R@10: 70.7 - Median R: 3.0 - Mean R: 19.8
Video-to-Text:
>>> V2T$R@1: 33.1 - V2T$R@5: 60.1 - V2T$R@10: 72.6 - V2T$Median R: 3.0 - V2T$Mean R: 18.4
video_matrix sim matrix size: (27763, 670), (27763, 670)
titles_shot_matrix sim matrix size: (27763, 670), (27763, 670)
Traceback (most recent call last):
  File "/local/Cap4Video/train_titles.py", line 723, in <module>
    fusion_scores()
  File "/local/Cap4Video/sim_matrix/fusion_scores.py", line 13, in fusion_scores
    tv_video_metrics = compute_metrics(video_matrix)
  File "/local/Cap4Video/metrics.py", line 13, in compute_metrics
    ind = sx - d
ValueError: operands could not be broadcast together with shapes (27763,670) (670,1)
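The shapes in the error point at the likely cause: with 27763 captions and 670 videos the similarity matrix is not square, so a ground truth of 670 entries (shape (670, 1)) cannot be broadcast against 27763 rows. Below is a minimal sketch of a rank computation that works for non-square text-to-video matrices. It assumes you have an array giving the ground-truth video index for every caption row; `compute_metrics_multi_caption` and `gt_video_idx` are hypothetical names for illustration, not part of the Cap4Video code.

```python
import numpy as np

def compute_metrics_multi_caption(sim, gt_video_idx):
    """Recall metrics for a (num_texts, num_videos) similarity matrix.

    gt_video_idx[i] is the column of the correct video for text row i
    (an assumed input; the repo may store this mapping differently).
    """
    sim = np.asarray(sim, dtype=float)
    gt_video_idx = np.asarray(gt_video_idx)
    rows = np.arange(sim.shape[0])
    # Score of the correct video for each caption, shape (num_texts, 1)
    gt_scores = sim[rows, gt_video_idx][:, np.newaxis]
    # 0-based rank = number of videos scoring strictly higher than the correct one
    ranks = (sim > gt_scores).sum(axis=1)
    return {
        "R1": float(np.mean(ranks == 0)) * 100,
        "R5": float(np.mean(ranks < 5)) * 100,
        "R10": float(np.mean(ranks < 10)) * 100,
        "MedianR": float(np.median(ranks)) + 1,
        "MeanR": float(np.mean(ranks)) + 1,
    }

# Toy example: 4 captions, 3 videos; the last two captions describe video 2
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.3, 0.5, 0.4],
    [0.1, 0.2, 0.7],
])
metrics = compute_metrics_multi_caption(sim, [0, 1, 2, 2])
print(metrics)
```

Note that ties are counted in favour of the correct video here, which may differ slightly from how the original metrics.py handles them.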
@BishmoyPaul I'm running it on MSRVTT. I have not seen any problems with the fusion score on MSRVTT.
Same problem here. @whwu95 I got Rank-1 47.7 in the first stage (train_video.py) and Rank-1 around 30 in the second stage (train_titles.py).
BTW, do you know the purpose of fusion_scores? @chenhao2345
@JosephPai I got similar performance. ~47.5 in stage 1 and 30 in stage 2.
It seems to me that the authors get two similarity scores, one from stage 1 and one from stage 2, and then use fusion_scores to fuse them.
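To illustrate what such a fusion could look like, here is a minimal sketch that combines two similarity matrices with a weighted sum. This is only my assumption about the general idea: the function name `fuse_similarities` and the weight `alpha` are hypothetical and are not taken from fusion_scores.py.

```python
import numpy as np

def fuse_similarities(video_sim, title_sim, alpha=0.5):
    """Weighted sum of two similarity matrices of identical shape.

    alpha weights the stage 1 (video) scores and 1 - alpha the
    stage 2 (title) scores. The value of alpha is an assumption.
    """
    video_sim = np.asarray(video_sim, dtype=float)
    title_sim = np.asarray(title_sim, dtype=float)
    if video_sim.shape != title_sim.shape:
        raise ValueError(
            f"shape mismatch: {video_sim.shape} vs {title_sim.shape}"
        )
    return alpha * video_sim + (1.0 - alpha) * title_sim

# Toy example with two 2x2 similarity matrices
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])
fused = fuse_similarities(a, b, alpha=0.75)
print(fused)
```

The fused matrix can then be passed to the same metric function as the individual matrices, which makes it easy to check whether the title scores help or hurt.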
I got R@1 45.3 in stage 1 and 29.6 in stage 2. It seems like the code is doing global matching?
I think that's true.
Thanks for sharing your code. How can I get the R@1 score of 49?
@chenhao2345 @JosephPai @ASENNIU @BishmoyPaul Hi, could you share your batch size setting and the number of GPUs you are using for training stage 1 and stage 2?
@zef1611 Did you find out the batch size, number of GPUs, and GPU type used in this project? Can anyone please answer this? @chenhao2345 @JosephPai @ASENNIU @BishmoyPaul
@BishmoyPaul How did you train on the MSVD dataset?
If you were using the co_train_msrvtt.sh script, what did you give for --data_path?
Could you share the training script?
Thanks for sharing your code. Is it normal to get R1=30 with train_titles.py? After running the score fusion, the title matrix does not improve the video matrix.