scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy
BSD 3-Clause "New" or "Revised" License

Spider stops crawling, keeps printing same log lines and spider process does not end #193

Closed AhsanMoavia closed 1 year ago

AhsanMoavia commented 1 year ago

After running for some time, the spider starts to output the following logs and does not crawl any further URLs. No particular exception or error is shown in the log file before these lines start printing. The behavior is also random: the lines may start right after the spider opens, or in the middle of a crawl. There are also cases where the crawl goes smoothly and completes successfully.

```
2023-03-30 11:55:49 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:56:19 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2023-03-30 11:56:19 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:56:49 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:57:19 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-03-30 11:57:19 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:57:49 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:58:19 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-03-30 11:58:19 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:58:49 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:59:19 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-03-30 11:59:19 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 11:59:49 [Dreams.prislo_rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 4, reanimated: 0, mean backoff time: 0s)
2023-03-30 12:00:19 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
```

Environment info:

- OS: Debian 10
- scrapy: 2.6.2
- scrapy-playwright: 0.0.26
- playwright: 1.25.2

elacuesta commented 1 year ago

The `scrapy.extensions.logstats` lines are generated by Scrapy itself and are completely normal. I don't know where the `Dreams.prislo_rotating_proxies.middlewares` ones come from; I'd suggest you look into that. It's impossible to do any further debugging with the information provided, as any number of things could be causing the crawl to get stuck.
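One way to narrow a stall like this down is to make hangs fail fast instead of idling forever. Below is a minimal `settings.py` sketch (not taken from the reporter's actual project) using settings documented by Scrapy and scrapy-playwright; the values chosen are illustrative assumptions:

```python
# settings.py (hypothetical sketch, not the reporter's configuration)

# Standard scrapy-playwright setup from the project README.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Abort a Playwright navigation after 30 seconds (value is in
# milliseconds) so a stuck page raises an error that shows up in the
# log instead of blocking the crawl silently.
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 30 * 1000

# Last resort: Scrapy's CloseSpider extension stops the whole crawl
# after one hour, so the process exits rather than idling indefinitely.
CLOSESPIDER_TIMEOUT = 3600

# More detail about what the download handler and middlewares are doing.
LOG_LEVEL = "DEBUG"
```

With the navigation timeout in place, a page that never finishes loading produces a visible error for that request rather than the silent "0 pages/min" pattern above, which at least points at which URLs are getting stuck.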