john-hu / untitled

peeler for recipetineats #66

Open john-hu opened 2 years ago

john-hu commented 2 years ago

https://www.recipetineats.com/sitemap_index.xml

john-hu commented 2 years ago

Deployed with general.

john-hu commented 2 years ago

The data is blocked by #96 and #95. Stop this peeler first and wait for those two issues to be resolved.

john-hu commented 2 years ago

Still no data found.

john-hu commented 2 years ago

No lock, unlock, or mark-as-error messages appear in the log while the requests are being processed:

2021-12-29 21:55:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-12-29 21:55:02 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2021-12-29 21:55:02 [peeler.scrapy_utils.spiders.base] INFO: locked urls count: 30
2021-12-29 21:55:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.recipetineats.com/robots.txt> (referer: None)
2021-12-29 21:55:43 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.recipetineats.com/category/collections/dinner-tonight/> from <GET https://www.recipetineats.com/category/easy-dinner-recipes/>
2021-12-29 21:56:02 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2021-12-29 21:56:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.recipetineats.com/category/beef-mince-recipes/> from <GET https://www.recipetineats.com/category/ground-beef-recipes/>
2021-12-29 21:57:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.recipetineats.com/category/collections/dinner-tonight/> (referer: None)
2021-12-29 21:57:02 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2021-12-29 21:57:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.recipetineats.com/category/beef-mince-recipes/> (referer: None)
2021-12-29 21:58:02 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2021-12-29 21:58:20 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.recipetineats.com/hamburger-recipe/> from <GET https://www.recipetineats.com/loaded-beef-hamburgers/>
2021-12-29 21:58:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.recipetineats.com/pancake-recipe/> from <GET https://www.recipetineats.com/simple-fluffy-pancakes/>
2021-12-29 21:59:02 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-12-29 21:59:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.recipetineats.com/hamburger-recipe/> (referer: None)
2021-12-29 21:59:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.recipetineats.com/pancake-recipe/> (referer: None)
2021-12-29 22:00:02 [scrapy.extensions.logstats] INFO: Crawled 5 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
2021-12-29 22:00:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.recipetineats.com/roast-chicken/> from <GET https://www.recipetineats.com/classic-roast-chicken/>

Related to #97 and #105.
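A log excerpt like the one above can be summarized with a few regular expressions, to confirm that pages are fetched (including after 301 redirects) while the scraped-item count stays at 0. This is only a rough sketch: the `summarize` helper is hypothetical and not part of peeler, and the patterns are inferred from the message formats visible in this log rather than from any Scrapy API.

```python
import re

# Patterns inferred from the log lines pasted in this issue (an assumption,
# not a documented Scrapy format guarantee).
REDIRECT_RE = re.compile(
    r"Redirecting \((\d{3})\) to <GET ([^>\s]+)> from <GET ([^>\s]+)>"
)
CRAWLED_RE = re.compile(r"Crawled \((\d{3})\) <GET ([^>\s]+)>")
STATS_RE = re.compile(r"Crawled (\d+) pages .*scraped (\d+) items")


def summarize(lines):
    """Hypothetical helper: collect redirects, fetched URLs, and last stats."""
    redirects = {}   # original URL -> redirect target
    fetched = []     # URLs that returned HTTP 200
    pages = items = 0
    for line in lines:
        m = REDIRECT_RE.search(line)
        if m:
            redirects[m.group(3)] = m.group(2)
            continue
        m = CRAWLED_RE.search(line)
        if m:
            if m.group(1) == "200":
                fetched.append(m.group(2))
            continue
        m = STATS_RE.search(line)
        if m:
            # Keep the most recent logstats counters.
            pages, items = int(m.group(1)), int(m.group(2))
    return {
        "redirects": redirects,
        "fetched": fetched,
        "pages": pages,
        "items": items,
    }
```

Feeding it the lines above should report several fetched URLs, each redirect pair, and `items` equal to 0, which matches the symptom described here.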

john-hu commented 2 years ago

At least this page should be parsable: https://www.recipetineats.com/hamburger-recipe/
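One way to check whether a saved copy of that page is parsable outside the spider is to look for schema.org `Recipe` data in JSON-LD `<script>` tags. This is a sketch under an assumption: that the page embeds its recipe as JSON-LD, which has not been verified here. `JsonLdExtractor` and `find_recipes` are hypothetical names and are not part of peeler; only the Python standard library is used.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of each <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _is_recipe(node):
    # "@type" may be a single string or a list of strings.
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return "Recipe" in types


def find_recipes(html):
    """Return every JSON-LD object in `html` whose @type includes Recipe."""
    parser = JsonLdExtractor()
    parser.feed(html)
    recipes = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        # The top level may be an object, a list, or an object with "@graph".
        stack = [data]
        while stack:
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                if _is_recipe(node):
                    recipes.append(node)
                if "@graph" in node:
                    stack.append(node["@graph"])
    return recipes
```

If `find_recipes` returns an empty list for the downloaded HTML, the recipe is probably exposed some other way (for example microdata), and the parser would need a different extraction path.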