webcomics / dosage

dosage is a comic strip downloader and archiver
https://dosage.rocks/
MIT License

Dosage fails with cryptic errors when dosage.json is unreadable #25

Open mbrandis opened 9 years ago

mbrandis commented 9 years ago

Hi!

For the past few days I have been getting strange errors while downloading comics:

Arcamax/BabyBlues> Retrieving all strips
Arcamax/BabyBlues> Saved Comics/Arcamax/BabyBlues/1244528.gif (101.60KB).
Arcamax/BabyBlues> ERROR: Could not save image at http://www.arcamax.com/thefunnies/babyblues/ to 1244528: ValueError('Expecting object: line 1032 column 6 (char 74222)',)
Arcamax/BabyBlues> Stop retrieval because image file already exists
OnTheFastrack> Retrieving all strips
OnTheFastrack> Saved Comics/OnTheFastrack/July-26-2015.gif (141.31KB).
OnTheFastrack> ERROR: Could not save image at http://onthefastrack.com/ to July-26-2015: ValueError('Expecting object: line 17031 column 8 (char 903967)',)
OnTheFastrack> Stop retrieval because image file already exists

The images are downloaded, but then the error occurs and they are not included in the HTML or RSS output.

Can anyone help?

TIA Mark

Null000 commented 8 years ago

Sadly, I can't manage to reproduce this problem. Could you try adding -v to the command you run, so we can get more information the next time it occurs?

TobiX commented 8 years ago

Sorry, can't reproduce, closing for now. Please reopen if you re-encounter this problem.

mbrandis commented 8 years ago

I have the problem (again). Here is the complete output:

Arcamax/BabyBlues> Saved ~/www/comics/Arcamax/BabyBlues/1433370.gif (70.99KB).
Arcamax/BabyBlues> ERROR: Could not save image at http://www.arcamax.com/thefunnies/babyblues/ to 1433370: ValueError('Expecting object: line 1032 column 6 (char 74222)',)
Arcamax/BabyBlues>   File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
Arcamax/BabyBlues>     self.__bootstrap_inner()
Arcamax/BabyBlues>   File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
Arcamax/BabyBlues>     self.run()
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/director.py", line 82, in run
Arcamax/BabyBlues>     self.getStrips(scraperobj)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/director.py", line 96, in getStrips
Arcamax/BabyBlues>     self._getStrips(scraperobj)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/director.py", line 112, in _getStrips
Arcamax/BabyBlues>     skipped = self.saveComicStrip(strip)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/director.py", line 140, in saveComicStrip
Arcamax/BabyBlues>     out.exception('Could not save image at %s to %s: %r' % (image.referrer, image.filename, msg))
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/output.py", line 93, in exception
Arcamax/BabyBlues>     self.writelines(traceback.format_stack(), 1)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/comic.py", line 94, in save
Arcamax/BabyBlues>     getHandler().comicDownloaded(self, fn, text=self.text)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/events.py", line 328, in comicDownloaded
Arcamax/BabyBlues>     handler.comicDownloaded(comic, filename, text=text)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/events.py", line 275, in comicDownloaded
Arcamax/BabyBlues>     pageInfo = self.getPageInfo(comic.name, comic.referrer)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/events.py", line 268, in getPageInfo
Arcamax/BabyBlues>     comicData = self.getComicData(comic)
Arcamax/BabyBlues>   File "~/Desktop/Workspace/dosage/dosagelib/events.py", line 261, in getComicData
Arcamax/BabyBlues>     self.data[comic] = json.load(f)
Arcamax/BabyBlues>   File "/usr/lib/python2.7/json/__init__.py", line 290, in load
Arcamax/BabyBlues>     **kw)
Arcamax/BabyBlues>   File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
Arcamax/BabyBlues>     return _default_decoder.decode(s)
Arcamax/BabyBlues>   File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
Arcamax/BabyBlues>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Arcamax/BabyBlues>   File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
Arcamax/BabyBlues>     obj, end = self.scan_once(s, idx)
Arcamax/BabyBlues> ValueError: Expecting object: line 1032 column 6 (char 74222)
Arcamax/BabyBlues> Stop retrieval because image file already exists

It only occurs with Baby Blues and On The Fastrack. Other Arcamax comics, such as Zits, work fine.

TobiX commented 8 years ago

It seems you are using the JSON output module, which somehow trashed the dosage.json in the target directory. Could you attach the broken dosage.json here, so I can check if the corruption was caused by Dosage or is just random? Anyways, deleting the dosage.json should fix the problem.
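To confirm that the file really is unparseable before deleting it, something like the following could be run against the dosage.json in the target directory. This is only a rough sketch, not part of Dosage: the helper names `check_json_file` and `move_aside_if_corrupt` are made up for illustration, and renaming to a `.broken` suffix is just one way to keep a copy for inspection.

```python
import io
import json
import os


def check_json_file(path):
    """Return None if `path` parses as JSON, else the parser's error message.

    Hypothetical helper, not a Dosage function.
    """
    try:
        with io.open(path, encoding="utf-8") as f:
            json.load(f)
    except ValueError as err:
        # Python 2 raises plain ValueError; Python 3's json.JSONDecodeError
        # is a subclass of ValueError, so this catches both.
        return str(err)
    return None


def move_aside_if_corrupt(path):
    """Rename a corrupt file to `<path>.broken`; return True if it was renamed.

    Hypothetical helper: keeps the broken file around instead of deleting it.
    """
    if check_json_file(path) is None:
        return False
    os.rename(path, path + ".broken")
    return True
```

Calling `move_aside_if_corrupt` with the path to the dosage.json inside a comic's directory would leave a healthy file untouched and set a broken one aside, so the next run can start from a fresh file.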

mbrandis commented 8 years ago

Thanks for the answer! Attached is the JSON file. I had to zip it since GitHub doesn't accept JSON attachments directly: dosage.zip

TobiX commented 8 years ago

We probably should throw a more useful error message when the JSON file is corrupt...
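One possible shape for such a message, as a minimal sketch only: wrap the `json.load` call and re-raise with the file name and a hint. The names `CorruptDataError` and `load_json_or_explain` are hypothetical and do not exist in Dosage; where the wrapper would actually go in dosagelib is left open.

```python
import io
import json


class CorruptDataError(Exception):
    """Hypothetical exception: a JSON data file could not be parsed."""


def load_json_or_explain(path):
    """Load JSON from `path`, naming the file in the error if parsing fails."""
    try:
        with io.open(path, encoding="utf-8") as f:
            return json.load(f)
    except ValueError as err:
        # Keep the parser's own message (line/column) but add the file name
        # and a hint about what the user can do.
        raise CorruptDataError(
            "Could not parse %s (%s). The file may be corrupt; "
            "delete or repair it and run again." % (path, err)
        )
```

With a wrapper like this, the log line would name the unreadable file instead of showing only "Expecting object: line 1032 column 6", which is what made the original report hard to diagnose.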