Closed: imwowzer closed this issue 5 years ago
1. Typing "scrapy crawl mySpider" in cmd gives the following:

D:\python_crawl\crawl_software>scrapy crawl mySpider
Scrapy 1.3.0 - no active project
Unknown command: crawl
Use "scrapy" to see available commands
Cause: I had not cd'd into the project root. The crawl command searches for scrapy.cfg under the directory cmd is currently in.
Traceback (most recent call last):
  File "c:\users\sunwei\appdata\local\programs\python\python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\sunwei\appdata\local\programs\python\python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\sunwei\AppData\Local\Programs\Python\Python37-32\Scripts\scrapy.exe\__main__.py", line 9, in <module>
Cause: inconsistent indentation (tab alignment) in front of yield item. Delete the whitespace before yield item back to the end of the previous line, then press Enter before yield so it gets re-indented consistently with the surrounding block.
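The exception text is truncated above, but a mis-indented yield reproduces this class of error at compile time, which is why scrapy crawl dies before the spider ever runs (a minimal demonstration, not the original spider code):

```python
# A `yield item` indented deeper than its block raises IndentationError
# as soon as the module is compiled/imported.
bad_spider = (
    "def parse(self, response):\n"
    "    item = {}\n"
    "        yield item\n"  # over-indented relative to its block
)
try:
    compile(bad_spider, "mySpider.py", "exec")
except IndentationError as exc:
    print("IndentationError:", exc.msg)
```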
Traceback (most recent call last):
  File "c:\users\sunwei\appdata\local\programs\python\python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\sunwei\appdata\local\programs\python\python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\sunwei\AppData\Local\Programs\Python\Python37-32\Scripts\scrapy.exe\__main__.py", line 9, in <module>
Cause: the sample code was copied without being adapted. The new project is named get_liepin and the class defined in items.py is GetLiepinItem, so

from Demo.items import DemoItem

has to be changed to

from get_liepin.items import GetLiepinItem
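The truncated traceback presumably ends in a ModuleNotFoundError, since the template project Demo does not exist on this machine; that failure mode is easy to reproduce (an illustration of the general error, not the original run):

```python
# Importing the item class from a project that does not exist fails
# immediately; the fix is importing from your own project's items module.
try:
    from Demo.items import DemoItem  # leftover template import
except ModuleNotFoundError as exc:
    print("No module named", exc.name)  # -> No module named Demo
# Correct form for this project:
# from get_liepin.items import GetLiepinItem
```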
2019-03-14 18:24:59 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: get_liepin)
2019-03-14 18:24:59 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 22:20:52) [MSC v.1916 32 bit (Intel)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-7-6.1.7601-SP1
2019-03-14 18:24:59 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'get_liepin', 'NEWSPIDER_MODULE': 'get_liepin.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['get_liepin.spiders']}
2019-03-14 18:24:59 [scrapy.extensions.telnet] INFO: Telnet Password: e2a8f620e579f88e
2019-03-14 18:24:59 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-03-14 18:25:00 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-03-14 18:25:00 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
Instantiating GetLiepinPipeline
2019-03-14 18:25:00 [scrapy.middleware] INFO: Enabled item pipelines:
['get_liepin.pipelines.GetLiepinPipeline']
2019-03-14 18:25:00 [scrapy.core.engine] INFO: Spider opened
2019-03-14 18:25:00 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-03-14 18:25:00 [py.warnings] WARNING: c:\users\sunwei\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\spidermiddlewares\offsite.py:61: URLWarning: allowed_domains accepts only domains, not URLs. Ignoring URL entry http://www.itcast.cn in allowed_domains.
  warnings.warn(message, URLWarning)
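The URLWarning is harmless here but easy to fix: allowed_domains takes bare domain names, and urlparse can strip the scheme and path if you start from a URL (a sketch; allowed_domains and start_urls are the real Scrapy spider attributes, the list contents are from this log):

```python
from urllib.parse import urlparse

start_urls = ["http://www.itcast.cn/channle/teacher.shtml"]
# allowed_domains must hold domains, not URLs:
allowed_domains = [urlparse(u).netloc for u in start_urls]
print(allowed_domains)  # -> ['www.itcast.cn']
```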
2019-03-14 18:25:00 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-03-14 18:25:13 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.itcast.cn/robots.txt> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2019-03-14 18:25:14 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.itcast.cn/robots.txt> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2019-03-14 18:25:15 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.itcast.cn/robots.txt> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2019-03-14 18:25:15 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://www.itcast.cn/robots.txt>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
Traceback (most recent call last):
  File "c:\users\sunwei\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2019-03-14 18:25:28 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.itcast.cn/channle/teacher.shtml> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2019-03-14 18:25:29 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.itcast.cn/channle/teacher.shtml> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2019-03-14 18:25:30 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.itcast.cn/channle/teacher.shtml> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2019-03-14 18:25:30 [scrapy.core.scraper] ERROR: Error downloading <GET http://www.itcast.cn/channle/teacher.shtml>
Traceback (most recent call last):
  File "c:\users\sunwei\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
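Every request in the log, including the robots.txt fetch, dies with ConnectionDone: Connection was closed cleanly, which means the server dropped the connection. A common cause is sites rejecting Scrapy's default user agent, though the log alone cannot confirm that; a settings.py tweak worth trying (an assumption, not a guaranteed fix):

```python
# settings.py -- hedged guesses for the ConnectionDone failures above:
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/72.0 Safari/537.36")  # browser-like UA
ROBOTSTXT_OBEY = True  # unchanged; the failing robots.txt fetch is a symptom, not the cause
RETRY_TIMES = 2        # Scrapy's default: "failed 3 times" above is 1 try + 2 retries
```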
2019-03-14 18:25:30 [scrapy.core.engine] INFO: Closing spider (finished)
Finished
2019-03-14 18:25:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 6,
'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived': 6,
'downloader/request_bytes': 1365,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 6,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2019, 3, 14, 10, 25, 30, 528038),
'log_count/DEBUG': 6,
'log_count/ERROR': 2,
'log_count/INFO': 9,
'log_count/WARNING': 1,
'retry/count': 4,
'retry/max_reached': 2,
'retry/reason_count/twisted.web._newclient.ResponseNeverReceived': 4,
"robotstxt/exception_count/<class 'twisted.web._newclient.ResponseNeverReceived'>": 1,
'robotstxt/request_count': 1,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2019, 3, 14, 10, 25, 0, 475319)}
2019-03-14 18:25:30 [scrapy.core.engine] INFO: Spider closed (finished)
C:\Users\sunwei\get_liepin>
Creating the scrapy project
The test code is placed in