Also, without this change, I got these errors after crawling a few submissions:
Traceback (most recent call last):
File "/Users/trg/Library/Python/3.7/bin/harwest", line 10, in <module>
sys.exit(main())
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/harwest.py", line 106, in main
args.func(args)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/harwest.py", line 77, in codeforces
CodeforcesWorkflow(configs).run(start_page_index=args.start_page)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/workflow.py", line 83, in run
response.append(self.__add_submission(submission))
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/workflow.py", line 26, in __add_submission
contest_id=submission['contest_id'], submission_id=submission_id)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/client.py", line 30, in get_submission_code
sub_soup = self.__get_content_soup(sub_url)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/client.py", line 17, in __get_content_soup
return BeautifulSoup(self.__get_url_content(url), 'lxml')
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/client.py", line 14, in __get_url_content
return requests.get(url, verify=False).content
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='codeforces.com', port=443):
Max retries exceeded with url: /contest/932/submission/72052331
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x1116c9860>:
Failed to establish a new connection: [Errno 60] Operation timed out'))
It looks like it failed while trying to establish a new HTTPS connection.
The changes look great, @ngthanhtrung23! I wasn't aware of the Sessions feature in requests, thanks for bringing it up! Merging it and making a release right away.
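For anyone reading along, here is a minimal sketch of the Sessions idea, not the actual harwest code. The helper names `build_session` and `get_url_content`, the retry counts, and the timeout are assumptions chosen for illustration. A single `requests.Session` keeps connections to the host alive between requests, and mounting an `HTTPAdapter` with a urllib3 `Retry` policy lets a transient connection failure be retried instead of aborting the crawl.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries=3, backoff_factor=0.5):
    """Return a Session that reuses connections and retries transient failures.

    The default values here are illustrative assumptions, not harwest settings.
    """
    retry = Retry(
        total=total_retries,
        connect=total_retries,           # retry failed connection attempts
        backoff_factor=backoff_factor,   # wait longer between successive retries
        status_forcelist=(502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Route both schemes through the retrying adapter.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def get_url_content(session, url, timeout=30):
    """Fetch a page body through the shared session (hypothetical helper)."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content
```

With this shape, the crawler would create one session up front and pass it to every fetch, rather than calling `requests.get` (which sets up a fresh connection) for each submission page.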
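This change would improve performance, since a shared requests Session reuses connections instead of opening a new one for every request.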