srx-2000 / spider_collection

Python crawlers. Current collection: NetEase Cloud Music song scraper, Bilibili video scraper, Zhihu Q&A scraper, wallpaper scraper, xvideos video scraper, audiobook scraper, Weibo crawler, Anjuke listings scraper + data visualization, Bilibili video cover extractor, IP proxy pool wrapper, million-user Zhihu crawler + data analysis, GitHub user crawler
MIT License

Runtime error "ModuleNotFoundError: No module named 'scrapy'" #5

Closed ericvlog closed 3 years ago

ericvlog commented 3 years ago

I modified search_parser.py as instructed, then ran python search_parser.py and got:

Traceback (most recent call last):
  File "search_parser.py", line 1, in <module>
    import scrapy
ModuleNotFoundError: No module named 'scrapy'

Also, how do I resolve this error when installing requirements.txt?

ERROR: Cannot install -r requirements.txt (line 3) and urllib3==1.26.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested urllib3==1.26.5
    requests 2.23.0 depends on urllib3!=1.25.0, !=1.25.1, <1.26 and >=1.21.1

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

srx-2000 commented 3 years ago

The first problem is simply that the Scrapy framework isn't installed. Try running the following in cmd to install it: pip install scrapy -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
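
A quick way to confirm the install took effect (run this with the same interpreter you use for the spider):

    # If this runs without ModuleNotFoundError, Scrapy is importable
    # from the Python you are using to run search_parser.py.
    import scrapy
    print(scrapy.__version__)  # e.g. 2.5.0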

The second problem is most likely because I upgraded urllib3 via a bot partway through, which introduced a version conflict. I'll push an updated version within two days to resolve it, and also fix the parsing code that has stopped working. I'll notify you with a GitHub reply once it's updated.
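
Until the fixed version lands, a possible stopgap is to relax the urllib3 pin so it falls inside the range that requests 2.23.0 accepts (the range quoted in the pip error above). A minimal requirements.txt sketch; the other entries of the file are omitted since they aren't shown here:

    requests==2.23.0
    # was: urllib3==1.26.5 -- this range matches what requests 2.23.0 allows
    urllib3>=1.21.1,!=1.25.0,!=1.25.1,<1.26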

srx-2000 commented 3 years ago

The new version is now up and fixes the earlier dependency conflict. If anything else comes up, feel free to keep reporting it here.

ericvlog commented 3 years ago

I've got it installed now. Running python search_parser.py produces the output below. I can see that cookies.txt is generated, but there's no video folder and no video_url.txt file.

C:\Users\admin\Desktop\git_spider-master\hubSpider\hubSpider\spiders>python search_parser.py
2021-08-15 21:34:56 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: hubSpider)
2021-08-15 21:34:56 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021), cryptography 3.4.7, Platform Windows-10-10.0.19041-SP0
2021-08-15 21:34:56 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-08-15 21:34:56 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'hubSpider', 'NEWSPIDER_MODULE': 'hubSpider.spiders', 'SPIDER_MODULES': ['hubSpider.spiders']}
2021-08-15 21:34:56 [scrapy.extensions.telnet] INFO: Telnet Password: 05c093c1159b3058
2021-08-15 21:34:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2021-08-15 21:34:58 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-08-15 21:34:58 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-08-15 21:34:58 [scrapy.middleware] INFO: Enabled item pipelines: []
2021-08-15 21:34:58 [scrapy.core.engine] INFO: Spider opened
2021-08-15 21:34:58 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-08-15 21:34:58 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-08-15 21:35:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.xvideos.com/?k=chinese> (referer: None)
2021-08-15 21:35:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.xvideos.com/?k=chinese&p=148> (referer: https://www.xvideos.com/?k=chinese)
2021-08-15 21:35:00 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.xvideos.com/?k=chinese&p=148> (referer: https://www.xvideos.com/?k=chinese)
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
    yield next(it)
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\admin\Desktop\git_spider-master\hubSpider\hubSpider\spiders\search_parser.py", line 28, in parse
    with open('../video/video_url.txt', mode='w', encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../video/video_url.txt'
2021-08-15 21:35:00 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-15 21:35:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 923,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 33468,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'elapsed_time_seconds': 1.976455,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 8, 15, 13, 35, 0, 648982),
 'httpcompression/response_bytes': 96291,
 'httpcompression/response_count': 2,
 'log_count/DEBUG': 2,
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'request_depth_max': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'spider_exceptions/FileNotFoundError': 1,
 'start_time': datetime.datetime(2021, 8, 15, 13, 34, 58, 672527)}
2021-08-15 21:35:00 [scrapy.core.engine] INFO: Spider closed (finished)

srx-2000 commented 3 years ago

That's caused by the missing video_url.txt under the video directory. I've pushed another update that should fix it; please try the new version. If the problem persists, you can simply create a video_url.txt file by hand in the video directory.
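
For context, the crash in the log above happens because open() with mode='w' creates a missing file but not a missing parent directory. A minimal sketch of the kind of guard that avoids it (the path comes from the traceback; the actual fix in the update may differ):

    import os

    # open(..., mode='w') raises FileNotFoundError when the parent
    # directory is missing, so create ../video first.
    video_dir = os.path.join('..', 'video')
    os.makedirs(video_dir, exist_ok=True)

    url = 'https://www.xvideos.com/?k=chinese'  # example value; the spider writes scraped URLs
    with open(os.path.join(video_dir, 'video_url.txt'), mode='w', encoding='utf-8') as f:
        f.write(url + '\n')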

ericvlog commented 3 years ago

Okay, I'll try it shortly; previously the video folder was never generated at all.

ericvlog commented 3 years ago

I downloaded the new version, changed the search term to latina, and ran python search_parser.py. video_url.txt in the video folder now contains:

https://www.xvideos.com/?k=latina
https://www.xvideos.com/?k=latina&p=1
https://www.xvideos.com/?k=latina&p=2
https://www.xvideos.com/?k=latina&p=3
https://www.xvideos.com/?k=latina&p=4
https://www.xvideos.com/?k=latina&p=5
https://www.xvideos.com/?k=latina&p=6

But video_id.txt still holds the results from your earlier japanese search; nothing new gets exported to video_id.txt? Also, my network doesn't need a proxy; I can reach Google and other overseas sites directly.

C:\Users\admin\Desktop\git_spider-master\hubSpider\hubSpider\spiders>python search_parser.py
2021-08-16 16:37:57 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: hubSpider)
2021-08-16 16:37:57 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021), cryptography 3.4.7, Platform Windows-10-10.0.19041-SP0
2021-08-16 16:37:57 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-08-16 16:37:57 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'hubSpider', 'NEWSPIDER_MODULE': 'hubSpider.spiders', 'SPIDER_MODULES': ['hubSpider.spiders']}
2021-08-16 16:37:57 [scrapy.extensions.telnet] INFO: Telnet Password: c5ab94951d536229
2021-08-16 16:37:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2021-08-16 16:37:58 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-08-16 16:37:58 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-08-16 16:37:58 [scrapy.middleware] INFO: Enabled item pipelines: []
2021-08-16 16:37:58 [scrapy.core.engine] INFO: Spider opened
2021-08-16 16:37:58 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-08-16 16:37:58 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-08-16 16:38:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.xvideos.com/?k=latina> (referer: None)
2021-08-16 16:38:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.xvideos.com/?k=latina&p=148> (referer: https://www.xvideos.com/?k=latina)
2021-08-16 16:38:00 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-16 16:38:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 920,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 33397,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'elapsed_time_seconds': 2.065536,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 8, 16, 8, 38, 0, 981410),
 'httpcompression/response_bytes': 96384,
 'httpcompression/response_count': 2,
 'log_count/DEBUG': 2,
 'log_count/INFO': 10,
 'request_depth_max': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2021, 8, 16, 8, 37, 58, 915874)}
2021-08-16 16:38:00 [scrapy.core.engine] INFO: Spider closed (finished)

srx-2000 commented 3 years ago

Hmm, the log you posted looks fine. video_id.txt is written in append mode, so you may need to manually clear the contents of that id file, or just delete the file outright.
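
In other words, the stale japanese entries are a side effect of append mode. A small illustration of the difference (file name taken from the thread; the write logic itself is assumed):

    # mode='a' appends, so IDs from earlier runs pile up in the file:
    with open('../video/video_id.txt', mode='a', encoding='utf-8') as f:
        f.write('new_video_id\n')  # old contents are kept

    # mode='w' would instead truncate the file on every run:
    with open('../video/video_id.txt', mode='w', encoding='utf-8') as f:
        f.write('new_video_id\n')  # previous contents discarded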

ericvlog commented 3 years ago

I deleted the file.... and it doesn't get regenerated when I run the search script either.

video_url.txt is fine, the search results are in it, but video_id.txt shows nothing.

https://www.xvideos.com/?k=latina
https://www.xvideos.com/?k=latina&p=1
https://www.xvideos.com/?k=latina&p=2

srx-2000 commented 3 years ago

> I deleted the file.... and it doesn't get regenerated when I run the search script either.
>
> video_url.txt is fine, the search results are in it, but video_id.txt shows nothing.
>
> https://www.xvideos.com/?k=latina
> https://www.xvideos.com/?k=latina&p=1
> https://www.xvideos.com/?k=latina&p=2

The video_id file is only generated when you run downloader.py. Once you have video_url.txt, just run downloader.py; it handles parsing and downloading in one step.
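
In outline, the intended flow looks something like this (a hypothetical sketch of the described workflow, not the actual downloader.py code):

    # search_parser.py collects search-result page URLs into video_url.txt;
    # downloader.py then reads them, extracts video IDs into video_id.txt,
    # and downloads the videos.
    with open('../video/video_url.txt', encoding='utf-8') as f:
        page_urls = [line.strip() for line in f if line.strip()]

    for page_url in page_urls:
        # parse each result page, record its video IDs, download each video
        # (the real logic lives in downloader.py)
        pass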