tdozat / Parser-v2

An updated version of the Parser-v1 repo, used for Stanford's submission in the CoNLL17 shared task.

Issue with replicating the results? #17

Open Eugen2525 opened 5 years ago

Eugen2525 commented 5 years ago

When replicating the results of the CoNLL 2017 shared task using the models from https://nlp.stanford.edu/data/CoNLL17_Parser_Models/ , I could not match the performance reported in the paper. My reproduction steps, illustrated with the English-LinES treebank:

1) Download ud-treebanks-conll2017, the data distributed to participants of the CoNLL 2017 shared task.
2) Download the corresponding model from https://nlp.stanford.edu/data/CoNLL17_Parser_Models/.
3) Download the test sets ud-test-v2.0-conll2017.
4) Tag the test file "en_lines.conllu" from ud-test-v2.0-conll2017 with the downloaded "English-LinES-Tagger" model.
5) Parse the tagged file "en_lines.conllu" (now located in "English-LinES-Tagger") with the downloaded "English-LinES-Parser" model.
6) Evaluate the parsed output against the gold data with the conll17_ud_eval.py script.
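For reference, the core of step 6 can be sketched as a simplified LAS/UAS computation. This is a minimal stand-in, not the official script: it assumes gold and system files share identical tokenization, whereas the real conll17_ud_eval.py also aligns tokens, which changes scores when segmentation differs. The sample CoNLL-U strings below are made up for illustration.

```python
def read_conllu(text):
    """Yield (HEAD, DEPREL) pairs for each syntactic word in a CoNLL-U string."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[6], cols[7]

def las_uas(gold_text, system_text):
    """Return (LAS, UAS), assuming both files have the same tokenization."""
    gold = list(read_conllu(gold_text))
    system = list(read_conllu(system_text))
    assert len(gold) == len(system), "tokenization mismatch"
    uas = sum(g[0] == s[0] for g, s in zip(gold, system))
    las = sum(g == s for g, s in zip(gold, system))
    n = len(gold)
    return las / n, uas / n

# Hypothetical two-word sentence: the system gets the head right
# for both words but mislabels the second dependency.
GOLD = """\
1\tShe\tshe\tPRON\tPRP\t_\t2\tnsubj\t_\t_
2\tleft\tleave\tVERB\tVBD\t_\t0\troot\t_\t_
"""
SYSTEM = """\
1\tShe\tshe\tPRON\tPRP\t_\t2\tnsubj\t_\t_
2\tleft\tleave\tVERB\tVBD\t_\t0\tdep\t_\t_
"""

las, uas = las_uas(GOLD, SYSTEM)
print(f"LAS = {las:.2f}, UAS = {uas:.2f}")  # LAS = 0.50, UAS = 1.00
```

If a quick check like this already shows a large gap, the mismatch is more likely in the pipeline inputs (wrong treebank version, untagged input, or gold segmentation) than in the evaluation itself.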

The result differs from the one reported in the paper. What could have gone wrong?