Closed. lovevantt closed this issue 1 year ago.
The code I ran is the one provided: douban_top_250_books.py
I have the same problem. Has anyone solved it?
Sorry, I haven't had time to work on it yet.
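Cause: the request to Douban failed, so request_douban returned None. (This depends on your version of the script: the author's code returns the string 'None', in which case compare with != 'None' instead.) Fix:
html = request_douban(url)
if html is not None:
    soup = BeautifulSoup(html, 'lxml')
    save_to_excel(soup)
else:
    print('request_douban returned None')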
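The check above can also be sketched in a form that runs without network access. This is only an illustration: `process_page`, `fetch`, and `handle` are hypothetical names that do not appear in the repository. `fetch` stands in for request_douban and `handle` for the parse-and-save step. The sketch treats both None and the string 'None' as a failed request, since either may be returned depending on the version of the script.

```python
from typing import Callable, Optional


def process_page(url: str,
                 fetch: Callable[[str], Optional[str]],
                 handle: Callable[[str], None]) -> bool:
    """Fetch one page and hand it to `handle` only if the fetch succeeded.

    Hypothetical helper: `fetch` plays the role of request_douban and
    `handle` the role of BeautifulSoup parsing plus save_to_excel.
    """
    html = fetch(url)
    # Some versions of the script return the string 'None' instead of None.
    if html is None or html == 'None':
        print(f'request failed, skipping {url}')
        return False
    handle(html)
    return True


# Usage with stand-in functions (no real HTTP request is made).
seen = []
ok = process_page('https://example.com/ok', lambda u: '<html></html>', seen.append)
failed = process_page('https://example.com/blocked', lambda u: None, seen.append)
```

Returning a boolean lets the calling loop count or retry the pages that were skipped instead of crashing on the first failed request.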
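However, this does not actually solve the problem, because the request still fails. The likely cause is that Douban identifies the request as coming from a crawler and blocks it. Adding headers makes the request look like it comes from a browser rather than a crawler. Rewrite the request_douban function: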
import requests

def request_douban(url):
    headers = {
        # Pretend to be a browser
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36',
    }
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        return None
    # A non-200 response also yields None, so callers must check for it
    return None
Just add a request header.
Add this; otherwise the request identifies itself as python-requests and gets blocked outright:
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 OPR/66.0.3515.115'}
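To see why the header matters, a request can be prepared without being sent and its User-Agent inspected. This is a minimal sketch, assuming the requests library is installed. The URL and the variable names are assumptions made for illustration, and the sketch does not show how Douban reacts to either value.

```python
import requests

# Assumed URL for illustration only; nothing is sent to it in this sketch.
url = 'https://book.douban.com/top250'
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 OPR/66.0.3515.115',
}

session = requests.Session()

# prepare_request builds the outgoing request (merging the session defaults)
# without sending anything over the network.
default_req = session.prepare_request(requests.Request('GET', url))
custom_req = session.prepare_request(
    requests.Request('GET', url, headers=browser_headers))

default_ua = default_req.headers['User-Agent']  # the library's own identifier
custom_ua = custom_req.headers['User-Agent']    # the browser-like string above

print(default_ua)
print(custom_ua)
```

Without the explicit header, the first value begins with `python-requests/`, which is easy for a site to recognise as a script rather than a browser.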
The error is:
Traceback (most recent call last):
  File "D:/coding/Python/PyCharm/test1/test2.py", line 127, in <module>
    main(i)
  File "D:/coding/Python/PyCharm/test1/test2.py", line 119, in main
    soup = BeautifulSoup(html, 'lxml')
  File "C:\Programs\Python\Python38-32\lib\site-packages\bs4\__init__.py", line 287, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()
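The last frame shows where the failure comes from: bs4 calls len() on the markup it was given, and here that markup is None. A minimal sketch that reproduces the same TypeError without bs4 or any network access follows. `markup_length` is a hypothetical stand-in for that check, not actual bs4 code.

```python
def markup_length(markup):
    # Stand-in for the len(markup) call seen in the traceback.
    return len(markup)


try:
    # Same situation as passing a None result from request_douban to BeautifulSoup.
    markup_length(None)
except TypeError as exc:
    message = str(exc)
    print(message)
```

So the error is a symptom of the failed request, not a problem in the parsing step itself.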