jayleicn / singularity

[ACL 2023] Official PyTorch code for the Singularity model from "Revealing Single Frame Bias for Video-and-Language Learning"
https://arxiv.org/abs/2206.03428
MIT License

Discrepancy of 0.04% in ActivityNet-QA evaluation code #28

Open israwal opened 1 year ago

israwal commented 1 year ago

Hi, I am consistently finding a difference of -0.04% in the reported performance on the ActivityNet-QA dataset when using the official evaluation code (https://github.com/MILVLG/activitynet-qa). Replicating the results on ActivityNet-QA:

  1. Accuracy using the Singularity-Temporal evaluation (n=12 frames, num_temporal_layers=2, ckpt: ft_anet_qa_singularity_temporal_17m.pth): 44.01%
  2. Accuracy using the official ActivityNet-QA evaluation code: 43.97%
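
For context, here is a quick back-of-the-envelope check of what a 0.04% gap corresponds to. The 8,000-question test split size is my assumption (the standard ActivityNet-QA test split), not something stated above:

```python
# Back-of-the-envelope: how many answers a 0.04% accuracy gap represents.
# ASSUMPTION: the standard ActivityNet-QA test split of 8,000 QA pairs.
test_size = 8_000
gap_percent = 44.01 - 43.97             # reported accuracy difference (%)
per_question_percent = 100 / test_size  # each question contributes 0.0125%
print(gap_percent / per_question_percent)  # ~3.2, i.e. roughly 3 questions scored differently
```

A gap of only a few questions may point to a small difference in answer normalization or a handful of mismatched predictions rather than a systematic bug.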

Bonus: the ActivityNet-QA evaluation code also reports accuracy for each question sub-type :)

Accuracy (per question type):

  * Motion: 32.2500%
  * Spatial Relation: 22.6250%
  * Temporal Relation: 4.1250%
  * Free: 75.7523%
  * All: 43.9750%

Accuracy of the Free-type questions (per answer type):

  * Yes/No: 75.1194%
  * Color: 51.3630%
  * Object: 27.6730%
  * Location: 39.8964%
  * Number: 54.4554%
  * Other: 36.2241%
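
For anyone who wants to reproduce this breakdown without the official repo, here is a minimal sketch of the grouping logic. The dict formats, field names, and the lowercase/strip normalization are my assumptions, not taken from the official MILVLG script; the official code may normalize answers differently, which is exactly the kind of detail that could explain a 0.04% gap:

```python
from collections import defaultdict

def per_type_accuracy(predictions, annotations):
    """Exact-match accuracy grouped by question type.

    ASSUMPTIONS (hypothetical formats, not the official MILVLG API):
    - `predictions` maps question_id -> predicted answer string
    - `annotations` maps question_id -> {"answer": str, "type": str}
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, ann in annotations.items():
        qtype = ann["type"]
        total[qtype] += 1
        total["All"] += 1
        # Simple normalization; the official script may differ here.
        pred = predictions.get(qid, "").strip().lower()
        if pred == ann["answer"].strip().lower():
            correct[qtype] += 1
            correct["All"] += 1
    return {t: 100.0 * correct[t] / total[t] for t in total}
```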

P.S.: The -0.04% difference is consistent across all my experiments on ActivityNet-QA.

Thanks in advance!