john-hu / untitled


handle non-200 cases #102

Closed john-hu closed 2 years ago

john-hu commented 2 years ago

We unlock the URL when we get an error. We should handle these cases one by one, for example:

2021-12-27 23:47:03 [scrapy.extensions.logstats] INFO: Crawled 12 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
2021-12-27 23:47:29 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.101cookbooks.com/archives/saffron-pasta-salad-recipe.html/055305273X> (referer: None)
2021-12-27 23:47:29 [peeler.scrapy_utils.spiders.base] ERROR: <twisted.python.failure.Failure scrapy.spidermiddlewares.httperror.HttpError: Ignoring non-200 response>
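One way to catch these non-200 responses explicitly is to attach an errback to each request, since HttpErrorMiddleware otherwise drops them silently after logging. A minimal sketch, assuming a Scrapy-based spider; the spider name and the unlock_url hook are placeholders for this project's internals, not confirmed APIs:

    from scrapy import Request, Spider
    from scrapy.spidermiddlewares.httperror import HttpError


    class RecipeSpider(Spider):
        name = "recipe"

        def start_requests(self):
            for url in self.start_urls:
                # Route failures (404, 500, DNS errors, ...) to a dedicated
                # errback instead of letting the middleware discard them.
                yield Request(url, callback=self.parse, errback=self.handle_error)

        def handle_error(self, failure):
            if failure.check(HttpError):
                response = failure.value.response
                self.logger.warning("Non-200 response %s for %s",
                                    response.status, response.url)
                # Hypothetical project hook: release the URL so it can be
                # retried or marked as failed instead of staying locked.
                self.unlock_url(response.url)
            else:
                self.logger.error(repr(failure))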
john-hu commented 2 years ago

Please don't forget to increase the fetched_count while handling errors.
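In the errback sketch above that would be a one-line addition; fetched_count is assumed to be the project's existing progress counter:

    def handle_error(self, failure):
        # Count the attempt even when it fails, so progress stats stay accurate.
        self.fetched_count += 1
        ...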