nexB / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and by nexB for https://www.aboutcode.org/. Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

purl2vcs: out of memory error #489

Open JonoYang opened 2 days ago

JonoYang commented 2 days ago

The purldb webserver experienced an error that killed the gunicorn worker handling this request:

web-1  | [2024-07-02 21:31:42 +0000] [9] [CRITICAL] WORKER TIMEOUT (pid:10)
web-1  | [2024-07-02 21:31:42 +0000] [10] [ERROR] Error handling request /api/collect/index_packages/
web-1  | Traceback (most recent call last):
web-1  |   File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 135, in handle
web-1  |     self.handle_request(listener, req, client, addr)
web-1  |   File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
web-1  |     respiter = self.wsgi(environ, resp.start_response)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/wsgi.py", line 124, in __call__
web-1  |     response = self.get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 140, in get_response
web-1  |     response = self._middleware_chain(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
web-1  |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
web-1  |     return func(*args, **kwds)
web-1  |            ^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/views/decorators/csrf.py", line 65, in _view_wrapper
web-1  |     return view_func(request, *args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/viewsets.py", line 124, in view
web-1  |     return self.dispatch(request, *args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch
web-1  |     response = handler(request, *args, **kwargs)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/app/packagedb/api.py", line 973, in index_packages
web-1  |     get_source_package_and_add_to_package_set(package)
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 141, in get_source_package_and_add_to_package_set
web-1  |     source_purl = get_source_repo(package=package)
web-1  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 198, in get_source_repo
web-1  |     repo_urls = list(get_repo_urls(package))
web-1  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 225, in get_repo_urls
web-1  |     source_urls = get_source_urls_from_package_data_and_resources(
web-1  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 244, in get_source_urls_from_package_data_and_resources
web-1  |     metadata_urls = list(get_urls_from_package_data(package))
web-1  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 345, in get_urls_from_package_data
web-1  |     found_urls.extend(get_urls_from_text(text=homepage_text))
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 36, in get_urls_from_text
web-1  |     for url in get_urls_from_location(location=lines)["urls"]:
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/scancode/api.py", line 134, in get_urls
web-1  |     for urls, line_num in found_urls:
web-1  |   File "/usr/local/lib/python3.11/site-packages/scancode/api.py", line 130, in <genexpr>
web-1  |     found_urls = ((u, ln) for (u, ln) in find_urls(location) if u)
web-1  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 257, in find_urls
web-1  |     for _key, url, _line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 78, in unique_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 576, in junk_urls_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 553, in junk_url_hosts_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 425, in canonical_url_cleaner
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 108, in re_filt
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 360, in user_pass_cleaning_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 336, in scheme_adder
web-1  |     yield key, match, line, line_number
web-1  |   File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
web-1  |     sys.exit(1)
web-1  | SystemExit: 1
web-1  | [2024-07-02 21:31:42 +0000] [10] [INFO] Worker exiting (pid: 10)
web-1  | [2024-07-02 21:31:43 +0000] [9] [ERROR] Worker (pid:10) was sent SIGKILL! Perhaps out of memory?

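My initial guess is that a regex explosion (catastrophic backtracking) may be happening when parsing URLs out of the package homepage text. The worker first hit the gunicorn WORKER TIMEOUT while still iterating over URL matches, and only afterwards was sent SIGKILL, so a runaway URL scan seems as plausible as an actual out-of-memory condition.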
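If the guess above is right, one possible mitigation would be to bound how much text the URL scan is allowed to process. The sketch below is only an illustration of that idea, not the purl2vcs or cluecode implementation: `find_urls_bounded`, `SIMPLE_URL_RE` and both limits are hypothetical names and values, and the pattern is deliberately much simpler than the real URL finder.

```python
import re

# Hypothetical, deliberately simple URL pattern (an assumption for this
# sketch; it is NOT the pattern used by cluecode.finder).
SIMPLE_URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Assumed limits; real values would need tuning against actual package data.
MAX_LINE_LENGTH = 2000
MAX_LINES = 10000


def find_urls_bounded(text, max_line_length=MAX_LINE_LENGTH, max_lines=MAX_LINES):
    """
    Yield URLs found in ``text`` while bounding the work done:
    only the first ``max_lines`` lines are scanned, and only the first
    ``max_line_length`` characters of each line are searched.
    """
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line_number > max_lines:
            break
        # Truncate very long lines so a single huge line cannot dominate.
        for match in SIMPLE_URL_RE.finditer(line[:max_line_length]):
            # Drop trailing punctuation that is usually not part of the URL.
            yield match.group(0).rstrip(".,;)")


if __name__ == "__main__":
    homepage_text = "Source at https://github.com/nexB/purldb.\nDocs: https://purldb.readthedocs.io/"
    for url in find_urls_bounded(homepage_text):
        print(url)
```

Truncating input trades completeness for a predictable upper bound on work per request; whether that trade is acceptable here depends on how long real homepage fields get.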