Closed by nmweizi 6 years ago
Hi @nmweizi, it looks like this request was generated by Scrapy (not Frontera). Could you post your spider code and crawling strategy here?
Please note that since 0.8, seed addition has been moved out of Scrapy and delegated to Frontera.
@sibiryakov It's fine, thanks.
2018-07-28 16:32:46 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: nmgkInfoCrawl)
2018-07-28 16:32:46 [scrapy.utils.log] INFO: Versions: lxml 4.2.3.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jan 4 2018, 16:40:53) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.3, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-centos-7.4.1708-Core
2018-07-28 16:32:46 [scrapy.crawler] INFO: Overridden settings: {'AJAXCRAWL_ENABLED': True, 'BOT_NAME': 'nmgkInfoCrawl', 'CONCURRENT_REQUESTS': 64, 'COOKIES_ENABLED': False, 'HTTPCACHE_IGNORE_HTTP_CODES': [403], 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'nmgkInfoCrawl.spiders', 'RETRY_TIMES': 5, 'SCHEDULER': 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler', 'SPIDER_MODULES': ['nmgkInfoCrawl.spiders']}
2018-07-28 16:32:46 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats']
2018-07-28 16:32:46 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats', 'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware']
2018-07-28 16:32:46 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware', 'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware']
2018-07-28 16:32:46 [scrapy.middleware] INFO: Enabled item pipelines: ['nmgkInfoCrawl.save_sqlite.scrapyPipeline_sqlite']
2018-07-28 16:32:46 [scrapy.core.engine] INFO: Spider opened
2018-07-28 16:32:46 [manager] INFO: --------------------------------------------------------------------------------
2018-07-28 16:32:46 [manager] INFO: Starting Frontier Manager...
2018-07-28 16:32:46 [manager] INFO: Frontier Manager Started!
2018-07-28 16:32:46 [manager] INFO: --------------------------------------------------------------------------------
2018-07-28 16:32:46 [frontera.contrib.scrapy.schedulers.FronteraScheduler] INFO: Starting frontier
2018-07-28 16:32:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-07-28 1
2018-07-28 16:12:14 [scrapy.core.scraper] ERROR: Spider error processing <GET http://www.nm.zsks.cn/18gkwb/index_5.html> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib64/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib64/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/usr/local/lib64/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib64/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib64/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python3.6/site-packages/frontera/contrib/scrapy/schedulers/frontier.py", line 112, in process_spider_output
    frontier_request = response.meta[b'frontier_request']
KeyError: b'frontier_request'
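
For anyone landing here with the same traceback, here is a rough sketch of what seems to be going on. It is not Frontera or Scrapy code: `FakeResponse`, `lookup_frontier_request` and `safe_lookup` are hypothetical stand-ins, and only the `response.meta[b'frontier_request']` lookup is taken from the traceback. The assumption, based on the "generated by Scrapy (not Frontera)" comment, is that a request which never went through the Frontera scheduler has no `b'frontier_request'` entry in its meta, so the lookup fails.

```python
# Toy stand-ins only; none of these are real Scrapy or Frontera classes.
class FakeResponse:
    def __init__(self, url, meta):
        self.url = url
        self.meta = meta  # plain dict, playing the role of response.meta


def lookup_frontier_request(response):
    # Same lookup as the failing line in the traceback.
    return response.meta[b'frontier_request']


def safe_lookup(response):
    # Hypothetical defensive variant: returns None when the key is absent.
    return response.meta.get(b'frontier_request')


# Assumption: a request handed out by the frontier carries this meta key.
scheduled = FakeResponse(
    "http://www.nm.zsks.cn/18gkwb/index_5.html",
    {b'frontier_request': "frontier-request-object"},
)

# Assumption: a request created directly by the spider does not.
direct = FakeResponse("http://www.nm.zsks.cn/18gkwb/index_5.html", {})

error = None
try:
    lookup_frontier_request(direct)
except KeyError as exc:
    error = exc

print(lookup_frontier_request(scheduled))
print(repr(error))
print(safe_lookup(direct))
```

If that reading is right, the fix is on the seeding side (letting Frontera supply the initial requests, as noted above for 0.8) rather than guarding the lookup.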