lil-lab / newsroom

Tools for downloading and analyzing summaries and evaluating summarization systems. https://summari.es/

ValueError: Unmatched ''"' when when decoding 'string' #12

Closed SeekPoint closed 4 years ago

SeekPoint commented 5 years ago

ub16hp@UB16HP:~/ub16_prj/newsroom$ newsroom-scrape --thin thin/dev.jsonl.gz --archive dev.archive

gzip: stdin: unexpected end of file
Loading previously downloaded summaries:
Traceback (most recent call last):
  File "/usr/local/bin/newsroom-scrape", line 11, in <module>
    load_entry_point('newsroom', 'console_scripts', 'newsroom-scrape')()
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ub16hp/ub16_prj/newsroom/newsroom/build/scrape.py", line 124, in main
    done = {ln["archive"] for ln in f}
  File "/home/ub16hp/ub16_prj/newsroom/newsroom/build/scrape.py", line 124, in <setcomp>
    done = {ln["archive"] for ln in f}
  File "/home/ub16hp/ub16_prj/newsroom/newsroom/build/jsonl.py", line 264, in readlines
    yield _json.loads(line)
ValueError: Unmatched ''"' when when decoding 'string'
ub16hp@UB16HP:~/ub16_prj/newsroom$

QiuJun1994 commented 5 years ago

How can I fix it? I have the same problem, too!

grusky commented 5 years ago

This kind of JSON decoding error might occur if the scraper process was terminated in the middle of writing to the file, causing the final line of the .jsonl.gz file to be invalid JSON. The easiest way to fix this might be to uncompress the file and delete the final line before continuing to scrape.

Since multiple people have run into this, I'll look into creating a tool to fix files that might include corrupted JSON. Thanks for reporting this!
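The manual fix described above (decompress the file, drop the invalid final line, then resume scraping) can be sketched in Python. This is only an illustration, not the repair tool mentioned in this comment: the function name `repair_jsonl_gz` and the idea of writing to a separate output file are assumptions, and it presumes the archive is a gzip-compressed JSON Lines file. It copies every line that parses as JSON and stops quietly if the gzip stream itself ends early.

```python
import gzip
import json


def repair_jsonl_gz(src, dst):
    """Copy each valid JSON line from src to dst (hypothetical helper).

    Returns a (kept, dropped) tuple of line counts.
    """
    kept = dropped = 0
    with gzip.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        try:
            for raw in fin:
                try:
                    json.loads(raw.decode("utf-8"))
                except ValueError:
                    # Truncated or otherwise invalid JSON line: skip it.
                    dropped += 1
                    continue
                # Make sure every kept record ends with a newline.
                fout.write(raw if raw.endswith(b"\n") else raw + b"\n")
                kept += 1
        except EOFError:
            # The gzip stream ended early ("unexpected end of file");
            # keep the complete lines that were read before that point.
            pass
    return kept, dropped
```

For example, `repair_jsonl_gz("dev.archive", "dev.archive.fixed")` would write a cleaned copy next to the original. After checking the counts, the cleaned file can replace the original before rerunning `newsroom-scrape`. Writing to a new file rather than editing in place keeps the damaged archive available in case something goes wrong.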

rodiana commented 4 years ago

It worked for me. Thank you, @grusky!