Hi, I am consistently seeing a difference of -0.04% from the reported performance on the ActivityNet-QA dataset when using the official evaluation code (https://github.com/MILVLG/activitynet-qa).
Replicating the results on ActivityNet-QA:
Accuracy using Singularity-Temporal (n=12 frames, num_temporal_layers=2, ckpt: ft_anet_qa_singularity_temporal_17m.pth): 44.01%
Accuracy using the official ActivityNet-QA evaluation code: 43.97%
Bonus: the ActivityNet-QA evaluation code also provides an accuracy breakdown for each question sub-type :)
Accuracy (per question type):
Motion: 32.2500%
Spatial Relation: 22.6250%
Temporal Relation: 4.1250%
Free: 75.7523%
All: 43.9750%
Accuracy of the Free-type questions (per answer type):
Yes/No: 75.1194%
Color: 51.3630%
Object: 27.6730%
Location: 39.8964%
Number: 54.4554%
Other: 36.2241%
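For reference, here is a minimal sketch of how I understand such a per-question-type breakdown is computed. This is not the official eval.py from the MILVLG repo; the JSON field names (question_id, type, answer) and file paths are placeholders I am assuming for illustration.

```python
import json
from collections import defaultdict

# Assumed layout: ground truth is a list of {"question_id", "type", "answer"},
# and predictions map question_id -> predicted answer string.
with open("test_qa.json") as f:
    gt = json.load(f)
with open("predictions.json") as f:
    preds = json.load(f)

correct = defaultdict(int)
total = defaultdict(int)
for item in gt:
    qtype = item["type"]  # e.g. motion / spatial / temporal / free
    total[qtype] += 1
    total["all"] += 1
    # exact-match comparison after lower-casing, as is common for open-ended VideoQA
    if preds.get(item["question_id"], "").strip().lower() == item["answer"].strip().lower():
        correct[qtype] += 1
        correct["all"] += 1

for qtype in total:
    print(f"{qtype}: {100.0 * correct[qtype] / total[qtype]:.4f}%")
```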
P.S.: The difference of -0.04% is consistent for all my experiments on ActivityNet-QA.
Thanks in advance!