Hi, I've found that evaluating the provided best Charades checkpoint on the Charades-STA dataset gives lower results than those reported in the paper: R@0.5 and R@0.7 are each about 6% lower. Do you have any idea why this might be happening?
I've compared my individual predictions with the predictions provided in the Google Drive link, and not all of them match, which is presumably where the performance drop comes from. Thanks.
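In case it helps to reproduce the comparison, here is a minimal sketch of the kind of check I mean. It is not the repo's evaluation code: the function names, the `{query_id: (start_sec, end_sec)}` dict layout, and the tolerance are all my own assumptions, and it only looks at the top-1 prediction per query.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one (start_sec, end_sec) span per query id.
Span = Tuple[float, float]


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection over union of two time spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: Dict[str, Span], gts: Dict[str, Span], thr: float) -> float:
    """Fraction of ground-truth queries whose top-1 prediction has IoU >= thr.

    Queries with no prediction are counted as misses.
    """
    hits = sum(temporal_iou(preds[q], gts[q]) >= thr for q in gts if q in preds)
    return hits / len(gts) if gts else 0.0


def changed_queries(mine: Dict[str, Span], ref: Dict[str, Span], tol: float = 1e-3) -> List[str]:
    """Query ids whose span is missing from `mine` or differs from `ref` by more than tol."""
    return sorted(
        q for q in ref
        if q not in mine
        or max(abs(mine[q][0] - ref[q][0]), abs(mine[q][1] - ref[q][1])) > tol
    )
```

Running `recall_at_iou` on both prediction sets against the same ground truth, and then inspecting the ids returned by `changed_queries`, should show whether the mismatched predictions account for the gap.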