Closed SeekPoint closed 4 years ago
How can I fix it? I have the same problem, too!
This kind of JSON decoding error might occur if the scraper process was terminated in the middle of writing to the file, causing the final line of the .jsonl.gz file to be invalid JSON. The easiest way to fix this might be to uncompress the file and delete the final line before continuing to scrape.
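For anyone who would rather not edit the file by hand, here is a rough sketch of that cleanup in Python. It is not part of newsroom: `repair_jsonl_gz` is a hypothetical helper name, and it assumes the file is gzip-compressed with one JSON object per line. It copies every line that parses as JSON into a new file, skips lines that do not, and stops quietly if the gzip stream itself was cut short.

```python
import gzip
import json
import zlib


def repair_jsonl_gz(src_path, dst_path):
    """Copy every complete, valid JSON line from src_path to dst_path.

    Hypothetical helper, not part of newsroom. Returns (kept, dropped).
    """
    kept = dropped = 0
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        while True:
            try:
                line = src.readline()
            except (EOFError, zlib.error, OSError):
                # The gzip stream was truncated or damaged; stop reading here.
                break
            if not line:
                break  # normal end of file
            try:
                json.loads(line.decode("utf-8"))
            except ValueError:
                # Covers both invalid JSON and invalid UTF-8 in a partial line.
                dropped += 1
                continue
            dst.write(line if line.endswith(b"\n") else line + b"\n")
            kept += 1
    return kept, dropped
```

Something like `repair_jsonl_gz("broken.jsonl.gz", "repaired.jsonl.gz")` (placeholder paths) writes the cleaned copy next to the original, so you can check the result before replacing the file and continuing to scrape.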
Since multiple people have run into this, I'll look into creating a tool to fix files that might include corrupted JSON. Thanks for reporting this!
It worked for me, @grusky. Thank you!
ub16hp@UB16HP:~/ub16_prj/newsroom$ newsroom-scrape --thin thin/dev.jsonl.gz --archive dev.archive
gzip: stdin: unexpected end of file
Loading previously downloaded summaries:
Traceback (most recent call last):
File "/usr/local/bin/newsroom-scrape", line 11, in
load_entry_point('newsroom', 'console_scripts', 'newsroom-scrape')()
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 722, in call
return self.main(args, kwargs)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, ctx.params)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 535, in invoke
return callback(args, **kwargs)
File "/home/ub16hp/ub16_prj/newsroom/newsroom/build/scrape.py", line 124, in main
done = {ln["archive"] for ln in f}
File "/home/ub16hp/ub16_prj/newsroom/newsroom/build/scrape.py", line 124, in
done = {ln["archive"] for ln in f}
File "/home/ub16hp/ub16_prj/newsroom/newsroom/build/jsonl.py", line 264, in readlines
yield _json.loads(line)
ValueError: Unmatched ''"' when when decoding 'string'
ub16hp@UB16HP:~/ub16_prj/newsroom$