schnapi opened this issue 7 years ago
2017-10-27 13:36:31 [scrapy.extensions.logstats] INFO: Crawled 382 pages (at 86 pages/min), scraped 0 items (at 0 items/min)
2017-10-27 13:36:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=33> (referer: None)
2017-10-27 13:36:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=34> (referer: None)
2017-10-27 13:36:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=35> (referer: None)
2017-10-27 13:36:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=36> (referer: None)
2017-10-27 13:36:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=37> (referer: None)
2017-10-27 13:36:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=38> (referer: None)
2017-10-27 13:36:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=40> (referer: None)
2017-10-27 13:36:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=39> (referer: None)
2017-10-27 13:36:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=41> (referer: None)
2017-10-27 13:36:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=42> (referer: None)
2017-10-27 13:36:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=43> (referer: None)
2017-10-27 13:36:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=45> (referer: None)
2017-10-27 13:36:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=44> (referer: None)
2017-10-27 13:36:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=48> (referer: None)
2017-10-27 13:36:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=46> (referer: None)
2017-10-27 13:36:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipe/13423/my-chili/> (referer: http://allrecipes.com/recipes/?page=34)
2017-10-27 13:36:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://allrecipes.com/recipes/?page=47> (referer: None)
2017-10-27 13:36:58 [scrapy.core.scraper] ERROR: Spider error processing <GET http://allrecipes.com/recipe/13423/my-chili/> (referer: http://allrecipes.com/recipes/?page=34)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/mnt/c/Users/Sandi/Desktop/food2vec-master/food2vec-master/dat/RecipesScraper/RecipesScraper/spiders/allrecipes_spider.py", line 31, in parse_item
    if len(data['items']) == 0:
TypeError: list indices must be integers, not str
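The TypeError means that `data` is a list at that point, not a dict, so `data['items']` fails; the site has probably changed the shape of the JSON embedded in its pages since the spider was written. A minimal defensive sketch of that check (the JSON-LD selector and field names here are assumptions based on the traceback, not the actual spider code):

```python
import json

def parse_item(self, response):
    # The embedded JSON may be a dict with an 'items' key or a bare list,
    # depending on the page layout, so normalize before indexing.
    raw = response.xpath('//script[@type="application/ld+json"]/text()').extract_first()
    if raw is None:
        self.logger.warning('no embedded JSON on %s', response.url)
        return
    data = json.loads(raw)
    if isinstance(data, list):
        data = {'items': data}
    if len(data.get('items', [])) == 0:
        self.logger.warning('no items on %s', response.url)
        return
    # ... continue extracting recipe fields from data['items'] ...
```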
Do you still have the allrecipes file? Also, the allrecipes website has blocked my IP. Do you have any suggestions for how to handle this problem? Thank you!
Thanks @schnapi -- cc'ing @brandonmburroughs here too in case he's interested (he wrote a great scraper for it).
Let me know if the allrecipes file here works for you:
https://github.com/altosaar/food2vec/tree/master/dat
There are also preprocessing scripts here: https://github.com/altosaar/food2vec/blob/master/src/process_scraped_data.py
I'm facing a similar issue here. I wrote a scraper for allrecipes and initially got data from the website, but they have probably blacklisted my IP. Does anyone know a good workaround?
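The usual first line of defense against a ban is slowing the crawl down so it looks less like a bot. A sketch of polite-crawling settings for the project's `settings.py` (the values are illustrative guesses, not tested against allrecipes):

```python
# settings.py -- throttle the crawl to reduce the chance of an IP ban.
# Values below are illustrative; tune them for the target site.

# Identify the crawler, or rotate user agents via a downloader middleware.
USER_AGENT = 'RecipesScraper (+https://github.com/altosaar/food2vec)'

# Base delay between requests; Scrapy adds random jitter (0.5x-1.5x) by default.
DOWNLOAD_DELAY = 3

# Keep per-domain concurrency low.
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Let AutoThrottle adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60

# Respect robots.txt.
ROBOTSTXT_OBEY = True
```

If the IP is already banned, waiting it out or routing requests through a proxy pool is usually the only recourse.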
I would like to know why I am getting so many errors like this when I try to scrape allrecipes.com. Thanks!