Closed by djrobust 9 years ago
This is indeed not valid JSON. Shortened, it looks like this:
{"body": ... }
{"body": ... }
You can't have more than one value at the root of a JSON document, and here you've got two objects simply following one another. So the very beginning of the second one is that "additional data" that ijson complains about.
You probably want to wrap those in an array:
[
{"body": ... },
{"body": ... }
]
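To make the difference concrete, here is a small sketch using only Python's standard json module rather than ijson. The object contents are made up for illustration. The stdlib parser rejects the concatenated form in much the same way ijson does, although it words the error as "Extra data" rather than "Additional data".

```python
import json

# Two objects simply following one another (made-up contents).
concatenated = '{"body": {"url": "a.com"}}\n{"body": {"url": "b.org"}}'

try:
    json.loads(concatenated)
    error = None
except json.JSONDecodeError as exc:
    # The parser finished the first object and then found more text.
    error = exc.msg

# The same two objects wrapped in an array form a single root value.
wrapped = '[\n{"body": {"url": "a.com"}},\n{"body": {"url": "b.org"}}\n]'
values = json.loads(wrapped)

print(error)
print(len(values))
```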
Thank you for the explanation and the fix, which works well!
Hello, any idea how to do this sequentially with a large (100 GB) JSON file in Python code? Many thanks.
@isagalaev @djrobust thanks for the explanation, but how do I fix it in code? I don't know how to write the code that inserts "," after each line.
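One possible approach that leaves the file untouched is to decode the concatenated values one at a time with the standard library's json.JSONDecoder.raw_decode, reading the file in chunks. This is only a sketch, not ijson's own method: the name iter_json_values and the chunk_size parameter are made up here, and a single value larger than a chunk gets re-scanned on every read, so it suits files made of many moderately sized objects.

```python
import io
import json

def iter_json_values(fileobj, chunk_size=65536):
    """Yield each top-level JSON value from a text stream of concatenated values."""
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = fileobj.read(chunk_size)
        at_eof = not chunk
        buffer += chunk
        pos = 0
        while True:
            # raw_decode does not skip leading whitespace, so do it here.
            while pos < len(buffer) and buffer[pos].isspace():
                pos += 1
            if pos == len(buffer):
                break
            try:
                value, end = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if at_eof:
                    raise  # genuinely malformed or truncated input
                break  # value is probably incomplete; read more data
            if end == len(buffer) and not at_eof:
                # The value touches the end of the buffer, so it might
                # continue in the next chunk (e.g. a number). Read more first.
                break
            yield value
            pos = end
        buffer = buffer[pos:]  # keep only the unparsed tail
        if at_eof:
            break

# Small in-memory demonstration; with a real file one would pass
# an object returned by open(path, encoding="utf-8") instead.
demo = io.StringIO('{"body": 1}\n{"body": 2} {"body": [3, 4]}')
for value in iter_json_values(demo, chunk_size=5):
    print(value)
```

Because this is a generator, only the current buffer and the current value are held in memory at any time.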
Given the discussion in #42, would you accept a PR solving this? I don't see how anyone is supposed to add commas after each line of a large file. That basically ruins all the gains of this package; I could just as well do for line in file then.
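For what it's worth, if every object really does sit on its own line, the line-by-line route mentioned here is short. The sketch below assumes exactly that layout (one complete JSON object per line) and applies the filter from the original question, keeping objects whose body.url contains '.com'. The function name iter_matching and the sample URLs are hypothetical.

```python
import io
import json

def iter_matching(lines, needle=".com"):
    """Yield objects whose body.url contains `needle`, one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        obj = json.loads(line)
        body = obj.get("body") if isinstance(obj, dict) else None
        url = body.get("url") if isinstance(body, dict) else None
        if isinstance(url, str) and needle in url:
            yield obj

# In-memory stand-in for a file opened with open(path, encoding="utf-8").
sample = io.StringIO(
    '{"body": {"url": "http://example.com/a"}}\n'
    '\n'
    '{"body": {"url": "http://example.org/b"}}\n'
)
matches = list(iter_matching(sample))
print(matches)
```

Iterating over a file object reads one line at a time, so memory use stays bounded by the longest line.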
@TrinhDinhPhuc
Try jsonfile.write(',') on each iteration, like this:
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write(',')
    jsonfile.write('\n')
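Note that the loop above also writes a comma after the last row and never writes the surrounding brackets, so the output on its own is still not a valid JSON array. A variant that writes the separator before every row except the first avoids this. This is a sketch: write_json_array is a made-up name, and rows stands in for whatever iterable reader is in the snippet above.

```python
import io
import json

def write_json_array(rows, out):
    """Write `rows` to the text stream `out` as one valid JSON array."""
    out.write("[\n")
    first = True
    for row in rows:
        if not first:
            out.write(",\n")  # separator goes between rows, never after the last
        json.dump(row, out)
        first = False
    out.write("\n]\n")

# In-memory demonstration; a real run would pass a file opened for writing.
buf = io.StringIO()
write_json_array([{"body": 1}, {"body": 2}], buf)
print(buf.getvalue())
```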
I am trying to iteratively parse a large JSON file. However, after the first few events ijson raises a JSONError: Additional data. I have looked into ijson's source code, but fail to understand what the problem is. Here is a minimal working example; my eventual goal is to extract all objects with '.com' in body.url.
returns