wanzixin / SinaWeibo-LocationSignIn-spider

Crawls Sina Weibo mobile-site POIs, and the Weibo posts under each POI, on a per-city basis

Timeout problem #2

[Open] Erenjager417 opened this issue 4 years ago

Erenjager417 commented 4 years ago

Hello, when fetching proxies I get the error below. How should I solve it?

```
---------------- IP used for fetching proxies: {'http': '223.241.119.42:47972'} --------------------
Traceback (most recent call last):
  File "D:\Program Files\python36\lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "D:\Program Files\python36\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "D:\Program Files\python36\lib\http\client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "D:\Program Files\python36\lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "D:\Program Files\python36\lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "D:\Program Files\python36\lib\http\client.py", line 964, in send
    self.connect()
  File "D:\Program Files\python36\lib\http\client.py", line 1392, in connect
    super().connect()
  File "D:\Program Files\python36\lib\http\client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "D:\Program Files\python36\lib\socket.py", line 724, in create_connection
    raise err
  File "D:\Program Files\python36\lib\socket.py", line 713, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 67, in get
    context=context,
  File "D:\Program Files\python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "D:\Program Files\python36\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "D:\Program Files\python36\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "D:\Program Files\python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "D:\Program Files\python36\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "D:\Program Files\python36\lib\urllib\request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error timed out>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/Pycharm/weiboqiandao/crawler.py", line 255, in <module>
    main()
  File "E:/Pycharm/weiboqiandao/crawler.py", line 229, in main
    ippool = build_ippool()
  File "E:\Pycharm\weiboqiandao\buildip.py", line 82, in build_ippool
    results = p.get_proxy(page)
  File "E:\Pycharm\weiboqiandao\buildip.py", line 37, in get_proxy
    res = requests.get(url, proxies=proxy_ip, headers={'User-Agent': UserAgent(use_cache_server=False).random})
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\fake.py", line 69, in __init__
    self.load()
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\fake.py", line 78, in load
    verify_ssl=self.verify_ssl,
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 250, in load_cached
    update(path, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 245, in update
    write(path, load(use_cache_server=use_cache_server, verify_ssl=verify_ssl))
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 178, in load
    raise exc
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 154, in load
    for item in get_browsers(verify_ssl=verify_ssl):
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 97, in get_browsers
    html = get(settings.BROWSERS_STATS_PAGE, verify_ssl=verify_ssl)
  File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
```

wanzixin commented 4 years ago

The problem is most likely the `socket.timeout: timed out`; see whether you can change the default timeout.
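As a sketch of what changing the default could look like: Python's global default socket timeout governs any socket created without an explicit `timeout` (http.client and urllib fall back to it). The value `30` below is illustrative, not taken from this repo:

```python
import socket

# Any socket opened without an explicit timeout (e.g. via urllib /
# http.client) will now wait up to 30 s before raising socket.timeout.
socket.setdefaulttimeout(30)  # illustrative value
```

Note that call sites which pass their own `timeout=` (such as the `requests.get(..., timeout=3)` check discussed below) ignore the global default and have to be changed individually.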

Erenjager417 commented 4 years ago

How exactly do I change the default timeout? Is it the `timeout` in this line?

```python
try:
    if requests.get('https://www.baidu.com', proxies={'http': proxy}, timeout=3).status_code == 200:
        print('这是第 {} 个代理, '.format(index) + proxy + ' is useful')
```

wanzixin commented 4 years ago

Yes, turn `timeout` up a bit and see whether that solves the problem.
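For reference, a minimal sketch combining both points: the proxy check with a larger `timeout`, plus a `fallback` user-agent string so that `UserAgent()` degrades gracefully instead of raising the `FakeUserAgentError` shown at the top of the trace. The helper name `check_proxy`, the 10-second timeout, and the fallback UA string are illustrative assumptions, not code from this repo:

```python
import requests
from fake_useragent import UserAgent

# fake_useragent 0.1.x accepts a fallback string: when its online data
# source is unreachable ("Maximum amount of retries reached"), .random
# returns the fallback instead of raising FakeUserAgentError.
ua = UserAgent(use_cache_server=False,
               fallback='Mozilla/5.0 (Windows NT 10.0; Win64; x64)')

def check_proxy(proxy, index):
    """Return True if the proxy reaches baidu.com within the timeout."""
    try:
        # timeout raised from 3 to 10 seconds (illustrative value)
        resp = requests.get('https://www.baidu.com',
                            proxies={'http': proxy},
                            headers={'User-Agent': ua.random},
                            timeout=10)
        if resp.status_code == 200:
            # mirrors the original print ("this is proxy #{index} ... is useful")
            print('Proxy {}: {} is useful'.format(index, proxy))
            return True
    except requests.exceptions.RequestException:
        pass  # timed-out or dead proxy: skip it rather than crash
    return False
```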