Closed dengerwa closed 6 years ago
Please describe your problem clearly: what exactly did you do, how did you start the crawler, and which spider did you start? Did you install the required MongoDB, Redis, and so on? Otherwise I have no way of guessing what you are running into.
I installed everything listed in pip install -r requirements.txt except graphite, which is the only one missing. MongoDB is installed on a separate Ubuntu server. Then I ran jd.py:
E:\pywork\venv1\Scripts\python.exe E:/pywork/jd_spider-master/jd/jd/spiders/jd.py
ITEM_PIPELINES = {
    'jd.pipelines.MongoDBPipeline': 300,
    'scrapy_redis.pipelines.RedisPipeline': 300,
}
MONGODB_SERVER = "192.168.1.168"
MONGODB_PORT = 27017
MONGODB_DB = "jindong"
I have already changed the server address and port in settings.py, and manually created a jindong database.
The stack trace you posted doesn't look complete; could you post the full traceback? As for your problem, I can guess at two causes. One: you haven't installed graphite. Starting the jd_spider crawler currently depends on graphite, so without it the spider will fail to start. The fix is to strip the graphite dependency out yourself; graphite is only used for monitoring, so removing it does not affect the crawling itself. Or you can wait until I have time over the holiday to remove the dependency. The other possible cause is that your MongoDB connection is broken.
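One way to strip that dependency yourself, assuming the project wires graphite in through the STATS_CLASS setting shown in the startup log further down, is to fall back to Scrapy's default stats collector in settings.py (a sketch, not a tested patch of this repo):

```python
# jd/settings.py -- comment out the graphite-backed stats collector and
# let Scrapy use its built-in in-memory collector instead; nothing else
# in the crawl pipeline depends on it, since graphite is monitoring only.
# STATS_CLASS = 'jd.statscol.graphite.RedisGraphiteStatsCollector'
STATS_CLASS = 'scrapy.statscollectors.MemoryStatsCollector'
```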
OK, I'll give it another try.
Do the collections need to be created manually? It looks like nothing related to the keywords was created in the database.
I created that database by hand, thanks. I'll go read the pymongo docs.
/usr/bin/python3.5 /home/dengbo/pywork/jd_spider-master/jd/jd/spiders/jd.py
Traceback (most recent call last):
File "/home/dengbo/pywork/jd_spider-master/jd/jd/spiders/jd.py", line 10, in
Still can't get it working. Mongo is installed, but I can't connect to it. If you have time, could you take a look for me?
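Before digging into pymongo, it may help to confirm the MongoDB port is even reachable from the machine running the spider. A minimal stdlib check (the host and port below are the ones from the settings quoted earlier):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# port_open("192.168.1.168", 27017)   # MongoDB on the Ubuntu box
# port_open("127.0.0.1", 6379)        # Redis, which scrapy_redis also needs
```

If the MongoDB check fails, two usual suspects are mongod binding only to 127.0.0.1 (check the bindIp setting on the Ubuntu server) and a firewall blocking port 27017.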
That error has nothing to do with mongo. Can you post a screenshot of the path you run the spider from? I think you are running it from the wrong directory. Try running scrapy crawl jindong from /home/dengbo/pywork/jd_spider-master/jd.
Is this how it should be run?
root@dengbo-ThinkPad:/home/dengbo/pywork/jd_spider-master/jd# scrapy crawl jindong
2018-04-06 18:01:28 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: jd)
2018-04-06 18:01:28 [scrapy.utils.log] INFO: Overridden settings: {'DUPEFILTER_CLASS': 'scrapy_redis.dupefilter.RFPDupeFilter', 'SCHEDULER': 'scrapy_redis.scheduler.Scheduler', 'SPIDER_MODULES': ['jd.spiders'], 'BOT_NAME': 'jd', 'NEWSPIDER_MODULE': 'jd.spiders', 'CONCURRENT_REQUESTS_PER_DOMAIN': 16, 'STATS_CLASS': 'jd.statscol.graphite.RedisGraphiteStatsCollector', 'CONCURRENT_REQUESTS': 32}
2018-04-06 18:01:28 [py.warnings] WARNING: /home/dengbo/pywork/jd_spider-master/jd/jd/statscol/graphite.py:7: ScrapyDeprecationWarning: Module scrapy.log has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
  from scrapy import log
2018-04-06 18:01:28 [py.warnings] WARNING: /home/dengbo/pywork/jd_spider-master/jd/jd/statscol/graphite.py:8: ScrapyDeprecationWarning: Module scrapy.statscol is deprecated, use scrapy.statscollectors instead
  from scrapy.statscol import StatsCollector
2018-04-06 18:01:28 [jindong] WARNING: could not connect to graphite
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 439, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 494, in _connect
    raise err
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 482, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/redis/client.py", line 572, in execute_command
    connection.send_command(args)
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 563, in send_command
    self.send_packed_command(self.pack_command(args))
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 538, in send_packed_command
    self.connect()
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 442, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 439, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 494, in _connect
    raise err
  File "/usr/local/lib/python3.5/dist-packages/redis/connection.py", line 482, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 11, in
No, use the command line, not PyCharm:
cd /home/dengbo/pywork/jd_spider-master/jd
scrapy crawl jindong
One more thing: pull the latest code. The old code errors out when graphite is not installed, and you also haven't installed redis.
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

I just pulled the latest version, and the redis package is installed.
Turns out the problem was that the Redis server itself wasn't installed. No wonder I was stuck, haha. The fix:
sudo apt-get install redis-server
One more question: for PROXY_LIST = 'path/to/proxy_ip.txt', should I fill in an absolute path, or write it the way your tutorial does?
What the docs show, PROXY_LIST = 'path/to/proxy_ip.txt', is just a placeholder. If you have proxy IPs, both relative and absolute paths work, but a relative path depends on the directory you run from, so an absolute path is the safer choice for you.
May I ask: can the proxy IPs be listed without the http:// prefix?
E:\python\python.exe E:/pythowork/jd_spider-master/jd_spider-master/jd/jd/spiders/jd.py
Traceback (most recent call last):
  File "E:/pythowork/jd_spider-master/jd_spider-master/jd/jd/spiders/jd.py", line 10, in
    from jd.items import ParameterItem
What is this about?
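That final traceback is the same path problem discussed above: running jd.py directly puts only the spiders directory on sys.path, so the top-level `jd` package cannot be imported, which is why `scrapy crawl jindong` from the project root works. A small self-contained reproduction (the package layout is rebuilt in a temp dir purely for illustration):

```python
import os
import subprocess
import sys
import tempfile

def reproduce():
    """Rebuild the jd package layout and compare direct vs module execution."""
    root = tempfile.mkdtemp()
    spiders = os.path.join(root, "jd", "spiders")
    os.makedirs(spiders)
    for d in (os.path.join(root, "jd"), spiders):
        open(os.path.join(d, "__init__.py"), "w").close()
    with open(os.path.join(root, "jd", "items.py"), "w") as f:
        f.write("ParameterItem = object\n")
    with open(os.path.join(spiders, "jd.py"), "w") as f:
        f.write("from jd.items import ParameterItem\n")

    # "python .../jd/jd/spiders/jd.py": sys.path[0] is the spiders dir,
    # so the top-level "jd" package is invisible -> ImportError.
    direct = subprocess.run(
        [sys.executable, os.path.join(spiders, "jd.py")], capture_output=True)
    # Run from the project root, the package resolves; "scrapy crawl"
    # executed from that directory succeeds for the same reason.
    as_module = subprocess.run(
        [sys.executable, "-m", "jd.spiders.jd"], cwd=root, capture_output=True)
    return direct.returncode, as_module.returncode
```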