Different scores reported on the graph

Hi, I tried to reproduce the results from several papers. I took their raw results and wrote code so that this repo can evaluate performance from them. I visualized the bounding-box positions in each frame, and everything seems to work fine. However, when I plot the graphs, the reported scores differ from those in the papers by ~1-2%.

Has anyone else observed the same thing?

jwlim / tracker_benchmark

Different scores reported on the graph #26