A known limitation since starting to use pygithub for fetching data: when iterating through paginated results via e.g. for count, fork in enumerate(repo.get_forks(), 1), the individual HTTP request is not retried upon a transient error, and it is not easy to retry the request corresponding to one specific page (out of many) cleanly from the calling program.
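For orientation, here is a minimal sketch of the pattern in question (the function name mirrors the traceback below; the body is illustrative and not the actual fetch.py code). The key point is that repo.get_forks() returns a lazy PaginatedList: each new page is fetched with its own HTTP request while the for loop is already running, so a transient failure on any one page aborts the entire iteration and loses the progress made so far.

import logging

log = logging.getLogger("fetch")

def get_forks_over_time(repo):
    fork_creation_times = []
    # Nothing has been fetched yet for most pages at this point:
    # PaginatedList fetches lazily, one page (one HTTP request) at a time.
    for count, fork in enumerate(repo.get_forks(), 1):
        # Every per_page items the iterator issues the next page request.
        # If that single request fails transiently, the exception
        # propagates right here, out of the loop.
        fork_creation_times.append(fork.created_at)
        if count % 200 == 0:
            log.info("%s forks fetched", count)
    return fork_creation_times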
Example of a boring transient error affecting just one of many HTTP requests and taking down the entire action run:
...
231004-23:07:04.767 INFO:MainThread: 8000 forks fetched
231004-23:07:11.723 INFO:MainThread: 8200 forks fetched
231004-23:07:18.807 INFO:MainThread: 8400 forks fetched
...
Traceback (most recent call last):
  File "//fetch.py", line 596, in <module>
    main()
  File "//fetch.py", line 111, in main
    fetch_and_write_fork_ts(repo, args.fork_ts_outpath)
  File "//fetch.py", line 225, in fetch_and_write_fork_ts
    dfforkcsv = get_forks_over_time(repo)
  File "//fetch.py", line 434, in get_forks_over_time
    for count, fork in enumerate(repo.get_forks(), 1):
  File "/usr/local/lib/python3.10/site-packages/github/PaginatedList.py", line 56, in __iter__
    newElements = self._grow()
  File "/usr/local/lib/python3.10/site-packages/github/PaginatedList.py", line 67, in _grow
    newElements = self._fetchNextPage()
  File "/usr/local/lib/python3.10/site-packages/github/PaginatedList.py", line 199, in _fetchNextPage
    headers, data = self.__requester.requestJsonAndCheck(
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 354, in requestJsonAndCheck
    *self.requestJson(
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 454, in requestJson
    return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 528, in __requestEncode
    status, responseHeaders, output = self.__requestRaw(
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 555, in __requestRaw
    response = cnx.getresponse()
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 127, in getresponse
    r = verb(
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
...
Retrying this naively at a higher level would mean fetching all forks again. Of course, this is Python and all kinds of workarounds are possible; they would just take more time to build and test.
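For the record, a rough sketch of one such workaround (untested): drive the pagination manually via PaginatedList.get_page(), so that the single HTTP request behind one page can be retried without re-fetching all previous pages. The helper name, the retry/backoff policy, and the assumption that an empty page marks the end of the listing are mine, not part of the fetch script.

import logging
import time

import requests

log = logging.getLogger("fetch")


def iter_with_page_retries(paginated_list, max_attempts=4):
    # Yield items from a PyGithub PaginatedList, retrying each page request
    # a few times with exponential backoff before giving up for good.
    page = 0
    while True:
        for attempt in range(1, max_attempts + 1):
            try:
                items = paginated_list.get_page(page)
                break
            except requests.exceptions.ConnectionError as exc:
                if attempt == max_attempts:
                    raise
                wait_seconds = 2 ** attempt
                log.info("page %s failed with %r, retrying in %s s", page, exc, wait_seconds)
                time.sleep(wait_seconds)
        if not items:
            # Assumption: an empty page means the listing is exhausted.
            return
        yield from items
        page += 1

The fork loop would then read for count, fork in enumerate(iter_with_page_retries(repo.get_forks()), 1). Depending on the pygithub version, the retry argument of the Github() constructor (which configures retries on the underlying HTTP session via urllib3's Retry) may be a cleaner knob to try first, though whether it covers this particular ConnectionError has not been verified here.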