nileshsah / harwest-tool

A one-shot tool to harvest submissions from different online judges (OJs) into a single VCS-managed repository http://bit.ly/harwest
MIT License

Reuse request session in CF client. #3

Closed: ngthanhtrung23 closed this 3 years ago

ngthanhtrung23 commented 3 years ago

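Reusing a single requests.Session across requests should improve performance: the session's connection pool keeps the connection to codeforces.com alive, so each submission page no longer needs a fresh TCP/TLS handshake.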
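A minimal sketch of what the change could look like, assuming the client only needs raw page bytes. The class name CodeforcesClient and the submission_url() helper are hypothetical. get_url_content() loosely mirrors the __get_url_content() frame in the traceback below, but the real harwest internals may differ.

```python
import requests


class CodeforcesClient:
    """Hypothetical sketch of a Codeforces client that shares one requests.Session.

    The class name and submission_url() are illustrative only; get_url_content()
    loosely mirrors the __get_url_content() call seen in the traceback.
    """

    BASE_URL = "https://codeforces.com"

    def __init__(self):
        # Created once and reused: the Session's connection pool keeps the
        # connection to codeforces.com open between consecutive requests.
        self.session = requests.Session()

    def get_url_content(self, url):
        # Before: requests.get(url, verify=False).content, which builds and
        # closes a throwaway Session (and therefore a new connection) per call.
        # verify=False is kept from the original call; it disables TLS
        # certificate verification.
        return self.session.get(url, verify=False).content

    def submission_url(self, contest_id, submission_id):
        # URL shape taken from the error message: /contest/932/submission/72052331
        return "%s/contest/%s/submission/%s" % (self.BASE_URL, contest_id, submission_id)

    def close(self):
        # Release the pooled connections once crawling is finished.
        self.session.close()
```

A crawler would create one client, call get_url_content() for every submission page, and call close() once at the end, instead of paying for a new connection per page.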

Also, without this change, I got these errors after crawling a few submissions:

Traceback (most recent call last):
  File "/Users/trg/Library/Python/3.7/bin/harwest", line 10, in <module>
    sys.exit(main())
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/harwest.py", line 106, in main
    args.func(args)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/harwest.py", line 77, in codeforces
    CodeforcesWorkflow(configs).run(start_page_index=args.start_page)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/workflow.py", line 83, in run
    response.append(self.__add_submission(submission))
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/workflow.py", line 26, in __add_submission
    contest_id=submission['contest_id'], submission_id=submission_id)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/client.py", line 30, in get_submission_code
    sub_soup = self.__get_content_soup(sub_url)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/client.py", line 17, in __get_content_soup
    return BeautifulSoup(self.__get_url_content(url), 'lxml')
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/harwest/lib/codeforces/client.py", line 14, in __get_url_content
    return requests.get(url, verify=False).content
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/trg/Library/Python/3.7/lib/python/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='codeforces.com', port=443):
Max retries exceeded with url: /contest/932/submission/72052331
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x1116c9860>:
Failed to establish a new connection: [Errno 60] Operation timed out'))

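It looks like it failed while trying to establish a new HTTPS connection. Since requests.get() creates a fresh session for every call (see the session.request frame inside requests/api.py above), each submission page is fetched over a brand-new connection.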
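To see the difference locally, here is a small experiment, a sketch under stated assumptions rather than part of the PR: it starts a throwaway HTTP/1.1 server on 127.0.0.1 and counts how many TCP connections the client opens. CountingHandler and count_connections are hypothetical names, and plain local HTTP stands in for codeforces.com's HTTPS. It also assumes no HTTP proxy is configured in the environment.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """One handler instance is created per TCP connection, so setup() counts connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length is required so the client can reuse the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def count_connections(fetch, n=5):
    """Call fetch(url) n times against a local server; return the TCP connections opened."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    server.block_on_close = False  # don't wait on idle keep-alive threads at close
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        for _ in range(n):
            fetch(url)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


# requests.get() builds and closes a new Session per call: one connection each.
one_shot = count_connections(requests.get)

# A shared Session returns the connection to its pool and reuses it.
with requests.Session() as session:
    pooled = count_connections(session.get)

print(one_shot, pooled)
```

With five fetches, the one-shot variant should open five connections and the shared session only one; against a remote host, every extra connection is another chance to hit a connect timeout like the one above.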

nileshsah commented 3 years ago

The changes look great @ngthanhtrung23! I wasn't aware of the Session feature in requests, thanks for bringing it up! Merging it and making a release at once.