Hello, a question about the Sina Weibo spider

myron520 commented 4 years ago

Hello, when I run the Create_all.py file from your Sina Weibo spider, I get ModuleNotFoundError: No module named 'sqlalchemy'. How can I fix this? Thanks!

Eternity666666 commented 4 years ago

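Hello, I downloaded the code and ran it. When there are multiple user IDs, the following error appears after the first user's posts have been crawled:

Traceback (most recent call last):
  File "D:/code/Spider-master/weibo/sina_spider.py", line 223, in <module>
    main(use_proxies=False)  # do not use proxy IPs by default
  File "D:/code/Spider-master/weibo/sina_spider.py", line 216, in main
    getmain(resmain, uid, wb_data, conn, mainurl, user_agents, cookies,conf,use_proxies)
  File "D:/code/Spider-master/weibo/sina_spider.py", line 148, in getmain
    pagenums=pages[0]
IndexError: list index out of range

Why does this happen? From what I found online there are two possible causes: the index is out of range, or the list is empty. But the first user is crawled successfully, so why would the list be empty for the next one? Is something causing the page fetch to fail? The interval between requests is not short, and I also set the Connection header to close so that it would not fail because of too many open connections. How can I solve this? Thanks!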

starFalll commented 4 years ago

@hoho-yin This is most likely because the dependencies are not installed. Check whether you have run pip3 install -r requirements.txt.
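One way to confirm that the dependency is missing from the interpreter that actually runs the script is to ask Python whether it can locate the module. This is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and only `sqlalchemy` is taken from the error message above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot locate."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["sqlalchemy"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip3 install -r requirements.txt")
    else:
        print("All required modules are importable.")
```

Run it with the same `python3` that runs Create_all.py. If the module is reported missing even after installing, `pip3` and `python3` may belong to different Python installations.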

starFalll commented 4 years ago

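@Eternity666666 The second user probably has no page_number information, which caused the index to go out of range. This has now been fixed: if page_number is missing, the spider reports an error and skips that user.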
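The kind of guard described here can be sketched as follows. This is not the repository's actual fix: the regular expression, the function names `extract_page_count` and `crawl_users`, and the sample HTML are assumptions for illustration. Only `pages` and `pagenums` come from the traceback.

```python
import re

# Hypothetical pattern; the real spider may locate the page count differently.
PAGE_COUNT_RE = re.compile(r'name="mp"[^>]*value="(\d+)"')


def extract_page_count(html):
    """Return the page count found in html, or None if it is absent."""
    pages = PAGE_COUNT_RE.findall(html)
    if not pages:
        # pages[0] would raise IndexError here, so signal "missing" instead.
        return None
    return int(pages[0])


def crawl_users(html_by_uid):
    """Map each uid to its page count, skipping users that have none."""
    results = {}
    for uid, html in html_by_uid.items():
        pagenums = extract_page_count(html)
        if pagenums is None:
            print(f"uid {uid}: no page count found, skipping")
            continue
        results[uid] = pagenums
    return results
```

Returning `None` instead of indexing an empty list keeps one user's missing data from stopping the whole loop, so the remaining user IDs are still processed.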

starFalll / Spider

Hello, a question about the Sina Weibo spider #4