Closed (moji111 closed this issue 2 years ago)
There is no proxy pool. Do you have the specific error message?
2021-10-09 15:02:03 INFO start crawling group: 719888
2021-10-09 15:02:11 INFO getting: https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0, status: 403
2021-10-09 15:02:11 INFO Rate limit, switching host
2021-10-09 15:02:12 INFO getting group: https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0, status: 403
2021-10-09 15:02:12 WARNING Fail to getting: https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0, status: 403
This error appeared when I ran the command python crawler_main.py -g 719888 --pages 1 --sleep 30
Opening the link brings up this page~
Your IP has probably been flagged by Douban.
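It started after I had crawled a little over 200 posts. Since then, every page I fetch is the "abnormal requests were sent from your IP" page. Is this because the program has no proxy pool configured?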
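For anyone hitting the same block, here is a minimal sketch of how a crawler could recognize the sec.douban.com redirect seen in the log and pick the next proxy from a pool. It is not code from this project: the names blocked_target, next_proxy and PROXIES are hypothetical, the proxy addresses are placeholders, and whether rotating proxies actually gets around Douban's block is an assumption that has not been confirmed in this thread.

```python
from itertools import cycle
from urllib.parse import parse_qs, urlsplit

# Host that the blocked requests in the log were redirected to.
BLOCK_HOST = "sec.douban.com"

# Placeholder proxy addresses (assumption: replace with proxies you control).
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
_proxy_cycle = cycle(PROXIES)


def blocked_target(url, status):
    """Return the originally requested URL if the response looks like the
    block redirect from the log (status 403 on sec.douban.com with an
    `r` query parameter), otherwise return None."""
    parts = urlsplit(url)
    if status != 403 or parts.hostname != BLOCK_HOST:
        return None
    # parse_qs percent-decodes the value of the `r` parameter.
    targets = parse_qs(parts.query).get("r")
    return targets[0] if targets else None


def next_proxy():
    """Return the next proxy address, cycling through PROXIES in order."""
    return next(_proxy_cycle)


if __name__ == "__main__":
    url = ("https://sec.douban.com/b?r=https%3A%2F%2Fwww.douban.com"
           "%2Fgroup%2F719888%2Fdiscussion%3Fstart%3D0")
    target = blocked_target(url, 403)
    if target is not None:
        print("blocked while fetching:", target)
        print("retry through:", next_proxy())
```

Checking the final URL rather than only the status code separates this redirect from an ordinary 403, so the crawler can pause or change its exit IP instead of retrying the same request right away.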