Hi, I am consistently seeing a difference of -0.04% from the reported performance on the ActivityNet-QA dataset when using the official evaluation code (https://github.com/MILVLG/activitynet-qa).
Replicating the results on ActivityNet-QA:
Accuracy using Singularity-Temporal (n=12 frames, num_temporal_layers=2, ckpt: ft_anet_qa_singularity_temporal_17m.pth): 44.01%
Accuracy using the official ActivityNet-QA evaluation code: 43.97%
Bonus: the ActivityNet-QA evaluation code also provides an accuracy breakdown for each question sub-type :)
Accuracy (per question type):
Motion: 32.2500%
Spatial Relation: 22.6250%
Temporal Relation: 4.1250%
Free: 75.7523%
All: 43.9750%
Accuracy of the Free-type questions (per answer type):
Yes/No: 75.1194%
Color: 51.3630%
Object: 27.6730%
Location: 39.8964%
Number: 54.4554%
Other: 36.2241%
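For reference, here is a minimal sketch of how I understand such a per-question-type breakdown is computed. This is not the official eval.py from the MILVLG repo; the JSON field names (question_id, type, answer) and file paths are placeholders I am assuming for illustration.

```python
import json
from collections import defaultdict

# Assumed layout: ground truth is a list of {"question_id", "type", "answer"},
# and predictions map question_id -> predicted answer string.
with open("test_qa.json") as f:
    gt = json.load(f)
with open("predictions.json") as f:
    preds = json.load(f)

correct = defaultdict(int)
total = defaultdict(int)
for item in gt:
    qtype = item["type"]  # e.g. motion / spatial / temporal / free
    total[qtype] += 1
    total["all"] += 1
    # exact-match comparison after lower-casing, as is common for open-ended VideoQA
    if preds.get(item["question_id"], "").strip().lower() == item["answer"].strip().lower():
        correct[qtype] += 1
        correct["all"] += 1

for qtype in total:
    print(f"{qtype}: {100.0 * correct[qtype] / total[qtype]:.4f}%")
```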
P.S.: The difference of -0.04% is consistent for all my experiments on ActivityNet-QA.
Thanks in advance!