There are many bugs in the evaluation script

I ran into the same problem while running the sample cases.

Traceback (most recent call last):
  File "ms_marco_eval.py", line 20, in <module>
    from rouge.rouge import Rouge
  File "/Users/justincho/Desktop/Imago/Computer comprehension/MSMARCOV2/Evaluation/rouge/rouge.py", line 86
    imgIds = list(gts.keys())

If you get this error, open the rouge.py file and add the missing ")" at the end of line 86.
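To confirm the fix took before rerunning the whole evaluation, you can ask Python to compile the file without executing it. This is only a minimal sketch; `find_syntax_error` is a helper name I made up, not something from the MSMARCO repo.

```python
from typing import Optional, Tuple


def find_syntax_error(source: str, filename: str = "<string>") -> Optional[Tuple[Optional[int], str]]:
    """Return (line number, message) for the first syntax error, or None if the source compiles.

    Hypothetical helper: it only compiles the text and never runs it.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


# Example use (the path is whatever your local checkout uses):
# with open("rouge/rouge.py", encoding="utf-8") as handle:
#     print(find_syntax_error(handle.read(), "rouge/rouge.py"))
```

If it prints None, the unbalanced parenthesis is gone.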

I then got the following error:

OSError: [E050] Can't find model 'en_core_web_lg'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

This means the en_core_web_lg model is not downloaded; installing spaCy by itself does not include it. Run the following in your terminal:

pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz
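If you want to check whether the model is already there before running the script, something like the following should work. It is a rough sketch that assumes the model was installed as a pip package (as with the command above); it will not detect shortcut links or models loaded from a data directory path. `model_installed` is my own name for the helper.

```python
import importlib.util


def model_installed(name: str = "en_core_web_lg") -> bool:
    """Return True if a top-level package with this name is importable.

    Assumption: the spaCy model was installed with pip, so it shows up
    as a regular Python package.
    """
    return importlib.util.find_spec(name) is not None


# Example use:
# if not model_installed():
#     print("en_core_web_lg is missing, install it with pip first")
```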

I still have a problem with two of the sample sets. When I run

. run.sh sample_test_data/no_answer_test_references.json sample_test_data/no_answer_test_candidates.json
. run.sh sample_test_data/dev_as_references.json sample_test_data/dev_first_sentence_as_candidates.json

both fail with:

AssertionError: Reference and candidate files must share same query ids

I believe this error might be intended for the no_answer set, but I don't think the script should raise it for the dev_as_references set.
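To see which query ids actually differ between a reference file and a candidate file, a small comparison like this may help narrow things down. It is a sketch under an assumption I have not verified against every sample file: that each non-empty line is a JSON object with a "query_id" field. The function names are mine, not from ms_marco_eval.py.

```python
import json
from typing import Iterable, Set, Tuple


def read_query_ids(lines: Iterable[str]) -> Set:
    """Collect the query_id of every non-empty JSON line.

    Assumption: one JSON object per line, each with a "query_id" key.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(json.loads(line)["query_id"])
    return ids


def diff_query_ids(reference_lines: Iterable[str], candidate_lines: Iterable[str]) -> Tuple[Set, Set]:
    """Return (ids only in the references, ids only in the candidates)."""
    ref_ids = read_query_ids(reference_lines)
    cand_ids = read_query_ids(candidate_lines)
    return ref_ids - cand_ids, cand_ids - ref_ids


# Example use with the failing pair:
# with open("sample_test_data/dev_as_references.json") as refs, \
#         open("sample_test_data/dev_first_sentence_as_candidates.json") as cands:
#     only_in_refs, only_in_cands = diff_query_ids(refs, cands)
#     print(len(only_in_refs), len(only_in_cands))
```

If both sets come back empty, the ids match and the assertion is being triggered by something else.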

The script works fine for the remaining sets:

. run.sh sample_test_data/sample_references.json sample_test_data/sample_candidates.json
. run.sh sample_test_data/same_answer_test_references.json sample_test_data/same_answer_test_candidates.json

I hope this helps.
