spacemanidol / MSMARCO

Utilities, Baselines, Statistics and Descriptions Related to the MSMARCO DATASET
MIT License

KeyError for converttowellformed.py #28

Closed ghost closed 5 years ago

ghost commented 5 years ago
  1. Running the converttowellformed.py script from utils works on the dev and train files, but it raises a KeyError on the eval file:
python converttowellformed.py eval_v2.1_public.json eval.json

Traceback (most recent call last):
  File "converttowellformed.py", line 14, in <module>
    makewf(sys.argv[1],sys.argv[2])
  File "converttowellformed.py", line 6, in makewf
    df = df.drop('answers',1)
  File "/home/sudeshna/envs/.env/lib/python3.5/site-packages/pandas/core/frame.py", line 3697, in drop
    errors=errors)
  File "/home/sudeshna/envs/.env/lib/python3.5/site-packages/pandas/core/generic.py", line 3111, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/sudeshna/envs/.env/lib/python3.5/site-packages/pandas/core/generic.py", line 3143, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/sudeshna/envs/.env/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 4404, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['answers'] not found in axis"
  2. The JSON files created by running converttowellformed.py on the dev and train files do not contain wellFormedAnswers as a key:
import jsonlines

with jsonlines.open('dev.jsonl') as reader:
    for obj in reader:
        print(obj['query'])
        print(obj['wellFormedAnswers'])

albany mn population
Traceback (most recent call last):
  File "jsonl_reader.py", line 8, in <module>
    if obj['wellFormedAnswers']:
KeyError: 'wellFormedAnswers'

(I have converted the json file to jsonl prior to accessing it for the second example.)

spacemanidol commented 5 years ago

The converttowellformed.py script won't work on the eval file because the well-formed answers for eval are held out. This is expected.
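
Both tracebacks above come from accessing a field that a given file does not contain. As a minimal sketch of defensive access (not the repository's own code), the snippet below works on plain dictionaries. The helper names `drop_field` and `well_formed_answers` and the sample records are hypothetical, chosen only for illustration. For the pandas call in the first traceback, passing `errors='ignore'` to `DataFrame.drop` should similarly skip labels that are absent.

```python
# Hypothetical sample records; the real MSMARCO files carry more fields.
records = [
    {"query": "albany mn population", "answers": ["example answer"],
     "wellFormedAnswers": ["example well-formed answer"]},
    {"query": "albany mn population"},  # eval-style record: answers held out
]


def drop_field(record, field):
    """Return a copy of record without field; absent fields are ignored."""
    return {key: value for key, value in record.items() if key != field}


def well_formed_answers(record):
    """Return the record's wellFormedAnswers, or an empty list if missing."""
    return record.get("wellFormedAnswers", [])


for rec in records:
    stripped = drop_field(rec, "answers")
    print(stripped["query"], well_formed_answers(stripped))
```

With this pattern, a record that lacks `answers` or `wellFormedAnswers` yields an empty result instead of a KeyError, so a single script can run over train, dev, and eval files.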