Stareru closed this issue 2 years ago
I submitted "dev.json" to codalab and this resulted in SF1=1.00. Might there be some differences between the local script "evaluate.py" and the script online for evaluation in codalab?
Hi,
You're correct that there was a discrepancy between the evaluation script in the repo and the one on CodaLab. I've corrected this now; let me know if you still see a difference.
Hi, we tested with the same file we submitted; the online score is about 9% lower than the local one.
Sorry,
I have just submitted my results again, but it seems that the gap still exists.
I submitted on Friday and I noticed a 4% decline in performance online. I'm voting to reopen this issue ;)
Thanks! 🙋🏻‍♂️
Thanks to @akutuzov, we finally figured out what was going on. CodaLab runs the evaluation scripts under Python 2.7.13, which uses floor division by default :/ As a result, several functions were returning incorrect values without any error messages. I've added `from __future__ import division`, which should solve the problem. Thanks again for pointing this out, and sorry for the inconvenience.
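For anyone debugging similar mismatches: here is a minimal sketch of the pitfall (illustrative only, not the repo's actual `evaluate.py`; the `f1` helper and its counts are made up for the example). Under Python 2, `/` on two integers floor-divides, so a precision/recall computation silently truncates to zero instead of raising an error:

```python
from __future__ import division  # make `/` behave as true division, as in Python 3


def f1(tp, fp, fn):
    """F1 score from integer true-positive / false-positive / false-negative counts."""
    precision = tp / (tp + fp)  # without the import, Python 2 floors 7/10 to 0
    recall = tp / (tp + fn)     # likewise, 7/9 floors to 0
    if precision + recall == 0:
        return 0.0              # a silent fallback like this is what hides the bug
    return 2 * precision * recall / (precision + recall)


print(f1(7, 3, 2))  # ~0.737 with the import; 0.0 under plain Python 2
```

The same fix (or running the scorer under Python 3) applies to any script that divides integer counts.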
I'll close the issue for now, but if you see any further problems, feel free to reopen.
Hello, I have evaluated my predictions locally with "./evaluation/evaluates.py" and "dev.json" (the latest version) as the gold file before submitting them to CodaLab. However, the online evaluation results are much lower than my local ones (especially for norec, a 9.4-point drop in SF1). Am I using the wrong gold file? Thank you!