lil-lab / nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
http://lic.nlp.cornell.edu/nlvr/

Broken json files #1

Closed ereday closed 7 years ago

ereday commented 7 years ago

When I try to parse any of the JSON files in Python or Julia, I get the following error:

>>> import json
>>> with open('train.json') as data_file:
...     data= json.load(data_file)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "erenay/anaconda/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "erenay/anaconda/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "erenay/anaconda/lib/python2.7/json/decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 991 column 1 (char 700 - 901156)

alsuhr-c commented 7 years ago

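This is because each file contains one example per line, with every line being its own JSON object, rather than a single JSON document. That is why `json.load` reports "Extra data" as soon as it reaches line 2. The line-based layout makes it easier to open the file in a text editor (so it won't have to load a single, very long line of data). You can load the data using the following in Python: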

data = [json.loads(line) for line in open('train.json').readlines()]

This will give you a list of Python dictionaries, each one representing an example.
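
If you would rather close the file explicitly and tolerate blank lines, the same idea can be written as a small helper. This is only a sketch: the name `load_jsonl` is made up for illustration and is not part of this repository, and it assumes that every non-empty line is one complete JSON object, as described above.

```python
import json


def load_jsonl(path):
    """Read a file with one JSON object per line into a list of dicts.

    `load_jsonl` is a hypothetical helper name, not something shipped
    with the dataset.
    """
    examples = []
    # The with-block closes the file once reading is done.
    with open(path) as data_file:
        for line in data_file:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline
                examples.append(json.loads(line))
    return examples
```

Calling `load_jsonl('train.json')` should return the same list of dictionaries as the one-liner above.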