kunalchandan / Kiss_Consume

A webcrawler for Kiss - Comics, generates a set of .cbz files for a given comic book series

Forbidden access during KissComics Crawl urllib.error.HTTPError: HTTP Error 403: Forbidden #2

Closed kunalchandan closed 5 years ago

kunalchandan commented 5 years ago

Recreate with

The Forbidden access error seems to arise from a wrong URL being requested. To reproduce, run: python kiss.py --consume manga -t 'Tenkuu-Shinpan' -s 25

Possible Solution

Consider catching the error and leaving it to the user to resolve later.
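One way to catch the error could look like the sketch below. It is only an illustration, not code from kiss.py: the names download_all and fetch are hypothetical, and it assumes the crawler has a list of image URLs to walk through. Failed URLs are collected together with their status code so the user can deal with them afterwards.

```python
import urllib.error
import urllib.request


def download_all(urls, fetch=urllib.request.urlopen):
    """Try every URL; return (pages, failures) instead of raising.

    `fetch` is a hypothetical hook (defaults to urlopen) so the function
    can be exercised without network access.
    """
    pages = {}
    failures = []
    for url in urls:
        try:
            with fetch(url) as response:
                pages[url] = response.read()
        except urllib.error.HTTPError as err:
            # Record the status code (e.g. 403) and continue with the next URL.
            failures.append((url, err.code))
    return pages, failures
```

After the crawl, the failures list can be printed or written to a file so the missing pages can be retried by hand.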

Possible Cause

I haven't looked into it yet, but it could be that I'm not being selective enough in my search for image URLs. It could also be that the images are being retrieved from a domain other than the usual https://2.bp.blogspot.com/*****?title=*****.jpg, with URLs of the form https://s5.mkklcdnv5.com/****/*/****/***/**.jpg instead.

kunalchandan commented 5 years ago

Possibly relevant solution

https://stackoverflow.com/questions/34692009/download-image-from-url-using-python-urllib-but-receiving-http-error-403-forbid

This solution seems relevant, since the asker appears to be doing approximately the same thing.

In Summary

The website is blocking my requests because they do not carry the headers it expects. However, urlretrieve does not accept headers, so my download function (https://github.com/kunalchandan/Kiss_Consume/blob/92faf68e575c53403b715f2988fe9feb546f649b/kiss.py#L96-L98) needs to be redone using the approach from the linked solution.
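A minimal sketch of that approach, using urllib.request.Request with explicit headers followed by urlopen. This is not necessarily what the fix in the repository looks like: the names build_request and download, and the User-Agent value, are assumptions made for illustration.

```python
import shutil
import urllib.request

# Assumed header value; any browser-like User-Agent string could be used here.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url):
    """Wrap the URL in a Request object that carries the custom headers."""
    return urllib.request.Request(url, headers=HEADERS)


def download(url, path):
    """Fetch `url` with the headers above and write the body to `path`."""
    with urllib.request.urlopen(build_request(url)) as response, \
            open(path, "wb") as out:
        # Stream the response to disk instead of loading it all into memory.
        shutil.copyfileobj(response, out)
```

Unlike urlretrieve, this goes through a Request object, which is where the headers are attached before the request is sent.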

kunalchandan commented 5 years ago

Issue resolved as of https://github.com/kunalchandan/Kiss_Consume/commit/e52ca387890cf5a8193d5b548fb0e0ec9cc999d0