lesywix / douban_group_spy

A crawler for Douban groups

403 error after crawling too many pages #13

moji111 closed this issue 2 years ago

moji111 commented 2 years ago

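It started after roughly 200 posts had been crawled. From then on, every page the crawler fetched was the "abnormal requests have been sent from your IP" page. Is this happening because the program doesn't use a proxy pool?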

lesywix commented 2 years ago

There is no proxy pool. Do you have the exact error message?
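Since the project has no proxy pool, here is a minimal sketch of what round-robin proxy rotation could look like. ProxyPool and the proxy addresses below are hypothetical placeholders, not part of douban_group_spy; the returned dict follows the {"http": ..., "https": ...} shape that the requests library's proxies= argument accepts.

```python
from itertools import cycle


class ProxyPool:
    """Hypothetical round-robin proxy pool (not part of douban_group_spy)."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy pool must contain at least one proxy")
        # cycle() repeats the list forever: a, b, a, b, ...
        self._cycle = cycle(list(proxy_urls))

    def next_proxies(self):
        """Return the next proxy in the dict shape used by requests' proxies= argument."""
        url = next(self._cycle)
        return {"http": url, "https": url}


# Placeholder addresses for illustration only.
pool = ProxyPool(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
for _ in range(3):
    print(pool.next_proxies()["https"])
```

Each request could then be sent with requests.get(url, proxies=pool.next_proxies()). Round robin is the simplest policy; a practical pool would also need to drop proxies that themselves start returning 403.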

moji111 commented 2 years ago

2021-10-09 15:02:03 INFO start crawling group: 719888
2021-10-09 15:02:11 INFO getting: https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0, status: 403
2021-10-09 15:02:11 INFO Rate limit, switching host
2021-10-09 15:02:12 INFO getting group: https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0, status: 403
2021-10-09 15:02:12 WARNING Fail to getting: https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0, status: 403
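In the log above, the URL that returns 403 is not the group page itself but https://sec.douban.com/b, with the requested page percent-encoded in the r query parameter. A minimal sketch of recognising that block page and recovering the original URL, assuming (from this log only) that the host is always sec.douban.com and the parameter is always named r; blocked_original_url is a hypothetical helper, not part of the project:

```python
from urllib.parse import urlparse, parse_qs

# Assumption based on the log in this thread: blocked requests land on this host.
BLOCK_HOST = "sec.douban.com"


def blocked_original_url(final_url):
    """Return the originally requested URL if final_url is the block page, else None."""
    parsed = urlparse(final_url)
    if parsed.hostname != BLOCK_HOST:
        return None
    # parse_qs percent-decodes values, so "r" comes back as a normal URL.
    values = parse_qs(parsed.query).get("r")
    return values[0] if values else None


url = ("https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com"
       "%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0")
print(blocked_original_url(url))
# prints https://www.douban.com/group/719888/discussion?start=0
```

With requests, the value to check would be response.url (the final URL after redirects) together with response.status_code == 403. Seeing this page on every request is a sign that the IP itself is being blocked, rather than any single page failing.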

This error occurred when running: python crawler_main.py -g 719888 --pages 1 --sleep 30
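The command above already pauses via --sleep 30, yet the very first page is blocked. A fixed delay is one option; another common approach is exponential backoff after a 403, where each retry waits longer than the last. The sketch below is a generic illustration and not how douban_group_spy handles --sleep; backoff_delays and its parameters are hypothetical, with base=30 simply mirroring the value in the command.

```python
import random


def backoff_delays(base, factor=2.0, cap=600.0, attempts=5, jitter=0.0, rng=None):
    """Yield retry wait times in seconds (hypothetical helper).

    delay_n = base * factor**n, optionally scaled by a random factor in
    [1 - jitter, 1 + jitter], and never more than cap.
    """
    if rng is None:
        rng = random.Random()
    for n in range(attempts):
        delay = base * factor ** n
        if jitter:
            # Randomising the wait avoids a perfectly regular request pattern.
            delay *= rng.uniform(1 - jitter, 1 + jitter)
        yield min(cap, delay)


print(list(backoff_delays(30)))
# prints [30.0, 60.0, 120.0, 240.0, 480.0]
```

A retry loop would call time.sleep(delay) for each value until a request succeeds. Backoff lowers request pressure, but it will not necessarily lift a flag that has already been placed on the IP.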

moji111 commented 2 years ago

[image] Opening the link brings up this page.

lesywix commented 2 years ago

It looks like your IP has been flagged by Douban.