openmainframeproject / software-discovery-tool

Software Discovery Tool
Apache License 2.0

Build timeout on package_build.py for Fedora #191

Open pleia2 opened 2 months ago

pleia2 commented 2 months ago

While generating the sources for Fedora, I got a build timeout while it was building Fedora 40. Full traceback below (the 404 errors are unrelated and a known issue).

Essentially, it looks like by traversing dozens of directories for three different versions, we're hitting some sort of rate limiting so it hangs until it finally fails without generating Fedora_40_List.json.

We should evaluate the logic we're using in the script, do some research into whether there's a better way of collecting this data (a different source or API?), and perhaps reach out to a contact at Fedora to get their thoughts.

software-discovery-tool/bin $ ./package_build.py fedora
Extracting fedora data ... 
404 Directory 1 not found
404 Directory 5 not found
Saved!
filename: Fedora_38_List.json
404 Directory 1 not found
404 Directory 5 not found
404 Directory 7 not found
Saved!
filename: Fedora_39_List.json
404 Directory 1 not found
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1017, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1100, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1371, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 756, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 700, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 337, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='dl.fedoraproject.org', port=443): Read timed out. (read timeout=None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/elizabeth/git/software-discovery-tool/bin/./package_build.py", line 372, in <module>
    fedora()
  File "/home/elizabeth/git/software-discovery-tool/bin/./package_build.py", line 139, in fedora
    req = requests.get(link)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 544, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 657, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='dl.fedoraproject.org', port=443): Read timed out. (read timeout=None)

Paul-Annay commented 2 months ago

Hey @pleia2, hope you're doing well. I tried to reproduce this issue by running only package_build.py in a separate directory, isolated from the actual code base. (I was temporarily working on a Windows PC and, curious about this issue, hurriedly tried to reproduce it without setting up the entire repo with all its dependencies.) The build took a long time (I did not track exactly how long, but at least 6-7 minutes), yet it did eventually generate the Fedora_40_List.json file. Here's what I got:

Extracting fedora data ... 
404 Directory 1 not found
404 Directory 5 not found
Saved!
filename: Fedora_38_List.json
404 Directory 1 not found
404 Directory 5 not found
404 Directory 7 not found
Saved!
filename: Fedora_39_List.json
404 Directory 1 not found
404 Directory 5 not found
404 Directory 7 not found
Saved!
filename: Fedora_40_List.json
Thanks for using SDT!

Below is a snapshot of the directory structure I was using (inside a Python virtual environment):

[image: screenshot of the directory structure]

I'm looking into ways to make the build faster. In the meantime, could you tell me whether you did anything specific to trigger this error? I'll also run this again on my actual PC, where the repository is set up properly, to see whether the error reproduces there. Thanks!

rachejazz commented 2 months ago

I'm guessing the script lacks timeout parameters on its requests, as well as error handling for timeouts. Also, we'd need to make it async, which would make processing each file faster!
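A minimal sketch of the timeout and error-handling idea, assuming the script keeps using `requests`. The traceback above reports `read timeout=None`, which is consistent with the `requests.get(link)` call setting no timeout. The helper name `fetch_with_timeout`, its defaults, and the retry policy are hypothetical and not taken from package_build.py.

```python
import time

import requests


def fetch_with_timeout(url, timeout=(5, 30), attempts=3, backoff=2.0,
                       get=requests.get, sleep=time.sleep):
    """Fetch url without blocking forever on a hung connection.

    timeout is (connect seconds, read seconds), as accepted by requests.
    Returns the response, or None if every attempt timed out or could
    not connect. `get` and `sleep` are injectable so the helper can be
    exercised without touching the network.
    """
    for attempt in range(1, attempts + 1):
        try:
            return get(url, timeout=timeout)
        except (requests.exceptions.Timeout,
                requests.exceptions.ConnectionError) as exc:
            print(f"attempt {attempt}/{attempts} failed for {url}: {exc}")
            if attempt < attempts:
                # Wait a little longer after each failure before retrying.
                sleep(backoff * attempt)
    return None
```

A caller could then skip or report a directory when the helper returns `None` instead of letting the whole build abort with a traceback.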

glitcher007 commented 1 month ago

Hey @pleia2, is this issue still open?

Paul-Annay commented 1 month ago

@glitcher007 Yes, it's still open. I got busy with other things and couldn't keep track of it, though I initially looked into making some changes. You're more than welcome to work on it or share some insights. Maybe we could come up with something together.

pleia2 commented 1 month ago

Hi @glitcher007 Thanks for asking, it is! The big thing about this one is that it doesn't always happen, but when it does it blocks the installation from continuing, so it's important that we work to find a better way to gather this data. It's taking a long time because it's doing a bunch of requests as it traverses these directories, and I suspect that looks like some sort of attack to the Fedora servers, so it stops allowing access. We don't want that either :smile:
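One way to make the traversal gentler on the servers is to enforce a minimum gap between consecutive requests. The sketch below is only an illustration: the `Throttle` class and the interval passed to it are hypothetical, and the interval Fedora's infrastructure would actually consider acceptable is not known here.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each directory request would cap the request rate at one per `min_interval` seconds, at the cost of a longer build.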

glitcher007 commented 1 month ago

Hi @pleia2, as I was going through the code I found the same problem. I think caching the data would be an option here, but I'll look into what other changes could solve it.
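A rough sketch of the caching idea, assuming each directory listing can be stored as text on disk and reused until it is older than some maximum age. The function name `cached_fetch`, the cache layout, and the one-day default are all hypothetical. The `fetch` argument stands in for whatever function actually downloads a URL.

```python
import hashlib
import time
from pathlib import Path


def cached_fetch(url, fetch, cache_dir, max_age=24 * 3600, now=time.time):
    """Return the text for url, reusing a fresh on-disk copy when one exists.

    fetch(url) must return the body as a string. A cached copy is used
    only if its file modification time is less than max_age seconds old.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL maps to a safe, fixed-length file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt"
    path = cache_dir / name
    if path.exists() and now() - path.stat().st_mtime < max_age:
        return path.read_text(encoding="utf-8")
    text = fetch(url)
    path.write_text(text, encoding="utf-8")
    return text
```

With a cache like this, a rerun after a failed build would only need to request the directories that were not fetched the first time.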

hbarsaiyan commented 1 month ago

I was thinking if we can iterate over a mirrorlist to use some other mirror if the connection gets timed out. Something like https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-40&arch=s390x&country=global. I am still figuring out how the mirrorlist can be filtered for other distros. As a last resort, we can manually add some sources in a list.
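A small sketch of that fallback, under the assumption that the mirrorlist response is plain text with one mirror base URL per line and `#` comment lines. Both helper names are hypothetical, and `fetch` stands in for whatever function downloads a URL and raises on failure.

```python
def parse_mirrorlist(text):
    """Return mirror base URLs, skipping blank lines and '#' comments."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


def fetch_from_mirrors(mirrors, relative_path, fetch):
    """Try each mirror in order; return (url, body) from the first success.

    Returns (None, None) if every mirror fails. OSError is caught because
    both the built-in TimeoutError and requests' exceptions derive from it.
    """
    for base in mirrors:
        url = base.rstrip("/") + "/" + relative_path.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:
            print(f"mirror failed, trying next: {url} ({exc})")
    return None, None
```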

pleia2 commented 1 month ago

> I was thinking if we can iterate over a mirrorlist to use some other mirror if the connection gets timed out. Something like https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-40&arch=s390x&country=global. I am still figuring out how the mirrorlist can be filtered for other distros. As a last resort, we can manually add some sources in a list.

My big question about this approach is whether it's what the Fedora community would prefer. If they find our method of traversing directories to be a problem, just jumping to another mirror could be seen as abusive behavior.

@sharkcz Do you have any thoughts?