stevearc / pypicloud

S3-backed pypi server implementation
MIT License

PyPI Cloud fails to serve request when fallback is unavailable even if cache can be used #298

Closed (tiandrey closed this issue 2 years ago)

tiandrey commented 2 years ago

Our pypicloud installation uses the following configuration (excerpt):

pypi.fallback = cache
pypi.cache_update = everyone
pypi.always_show_upstream = True
pypi.fallback_base_url = https://pypi.org
pypi.use_json_scraper = True

Recently pypi.org was having problems and served redirect loops (a 301 whose location header points back at the requested URL), like this:

$ curl -i https://pypi.org/pypi/SQLAlchemy/json
HTTP/2 301 
access-control-allow-headers: Content-Type, If-Match, If-Modified-Since, If-None-Match, If-Unmodified-Since
access-control-allow-methods: GET
access-control-allow-origin: *
access-control-expose-headers: X-PyPI-Last-Serial
access-control-max-age: 86400
cache-control: max-age=900, public
content-security-policy: base-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ fastly-insights.com *.fastly-insights.com *.ethicalads.io https://api.pwnedpasswords.com/ https://2p66nmmycsj3.statuspage.io/; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self'; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.ingress.cmh1.psfhosted.org/ www.google-analytics.com *.fastly-insights.com *.ethicalads.io; script-src 'self' www.googletagmanager.com www.google-analytics.com *.fastly-insights.com *.ethicalads.io 'sha256-U3hKDidudIaxBDEzwGJApJgPEf2mWk6cfMWghrAa6i0='; style-src 'self' fonts.googleapis.com *.ethicalads.io 'sha256-2YHqZokjiizkHi1Zt+6ar0XJ0OeEy/egBnlm+MDMtrM=' 'sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU='; worker-src *.fastly-insights.com
content-type: text/plain; charset=UTF-8
location: https://pypi.org/pypi/SQLAlchemy/json
referrer-policy: origin-when-cross-origin
server: nginx/1.13.9
accept-ranges: bytes
date: Fri, 10 Jun 2022 12:55:12 GMT
x-served-by: cache-iad-kiad7000058-IAD, cache-bma1683-BMA
x-cache: HIT, HIT
x-cache-hits: 147, 1
x-timer: S1654865713.716638,VS0,VE1
vary: Accept-Encoding
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-frame-options: deny
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-permitted-cross-domain-policies: none
content-length: 118

301 Moved Permanently

The resource has been moved to /pypi/SQLAlchemy/json; you should be redirected automatically.
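
The loop is visible in the response above: the location header is the same URL that was requested. As a rough illustration, such a self-redirect could be detected from a single response like this. The helper name `is_self_redirect` and the `REDIRECT_CODES` set are hypothetical and not part of pypicloud or requests.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (an assumption, not taken from pypicloud).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_self_redirect(request_url, status_code, location):
    """Return True if a redirect response points back at the URL that was requested."""
    if status_code not in REDIRECT_CODES or not location:
        return False
    # The Location header may be relative, so resolve it against the request URL first.
    return urljoin(request_url, location) == request_url
```

With the values from the curl output, `is_self_redirect("https://pypi.org/pypi/SQLAlchemy/json", 301, "https://pypi.org/pypi/SQLAlchemy/json")` is true, which means following the redirect can never terminate.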

This caused our installation to respond with HTTP 500 and the short message "Exceeded 30 redirects." The stack trace follows:

ERROR 2022-06-08 16:53:47,920 [pypicloud.views] Exceeded 30 redirects.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/util.py", line 190, in __getitem__
    value = super(TimedCache, self).__getitem__(key)
KeyError: 'sqlalchemy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pyramid/tweens.py", line 41, in excview_tween
    response = handler(request)
  File "/usr/local/lib/python3.5/dist-packages/pyramid/router.py", line 148, in handle_request
    registry, request, context, context_iface, view_name
  File "/usr/local/lib/python3.5/dist-packages/pyramid/view.py", line 683, in _call_view
    response = view_callable(context, request)
  File "/usr/local/lib/python3.5/dist-packages/pyramid/config/views.py", line 188, in attr_view
    return view(context, request)
  File "/usr/local/lib/python3.5/dist-packages/pyramid/config/views.py", line 214, in predicate_wrapper
    return view(context, request)
  File "/usr/local/lib/python3.5/dist-packages/pyramid/viewderivers.py", line 436, in rendered_view
    result = view(context, request)
  File "/usr/local/lib/python3.5/dist-packages/pyramid_duh/view.py", line 181, in slash_redirect
    return fxn(*args)
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/views/simple.py", line 106, in package_versions
    return _package_versions(context, request)
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/views/simple.py", line 90, in _package_versions
    return _simple_cache_always_show(context, request)
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/views/simple.py", line 299, in _simple_cache_always_show
    pkgs = get_fallback_packages(request, context.name, False)
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/views/simple.py", line 155, in get_fallback_packages
    releases = request.locator.get_releases(package_name)
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/locator.py", line 23, in get_releases
    return self._cache[project_name]
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/util.py", line 193, in __getitem__
    value = self._factory(key)
  File "/usr/local/lib/python3.5/dist-packages/pypicloud/locator.py", line 27, in _get_releases
    response = requests.get(url)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 630, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 630, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 111, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
[pid: 2470809|app: 0|req: 86506/8530741] 127.0.0.1 () {52 vars in 670 bytes} [Wed Jun  8 16:53:44 2022] GET /pypi/sqlalchemy/ => generated 166 bytes in 3205 msecs (HTTP/1.0 500) 2 headers in 99 bytes (2 switches on core 0)

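I think the problem is that SimpleJsonLocator._get_releases doesn't catch all exceptions: the requests.get call sits outside the try block, and the handler only catches requests.HTTPError, so requests.exceptions.TooManyRedirects propagates up to the view: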

    def _get_releases(self, project_name):
        url = "%s/pypi/%s/json" % (self.base_index, project_name)
        response = requests.get(url) # <-- exception is raised here
        try:
            response.raise_for_status()
        except requests.HTTPError as e:
            LOG.warning("Error fetching '%s' from upstream: %s", project_name, e)
            return []
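
A minimal sketch of one way to handle this, assuming the intent is to treat any upstream failure like the HTTP error case. It is not necessarily what pypicloud does, and the helper name `fetch_upstream`, its `get` parameter, and the `timeout` default are hypothetical. The idea is to move the request into the try block and catch `requests.RequestException`, the common base class of `HTTPError`, `TooManyRedirects`, `ConnectionError`, and `Timeout`.

```python
import logging

import requests

LOG = logging.getLogger(__name__)


def fetch_upstream(base_index, project_name, get=requests.get, timeout=10):
    """Fetch the upstream JSON page; return the response, or None on any upstream failure.

    Hypothetical helper for illustration only. `get` is injectable so the
    function can be exercised without network access.
    """
    url = "%s/pypi/%s/json" % (base_index, project_name)
    try:
        # The request itself is inside the try block, so errors raised while
        # connecting or following redirects are handled as well.
        response = get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as e:
        # Covers HTTPError, TooManyRedirects, ConnectionError and Timeout.
        LOG.warning("Error fetching '%s' from upstream: %s", project_name, e)
        return None
    return response
```

A caller could then return `[]` when the result is `None`, as the quoted code already does for HTTP errors, so that the view can continue with whatever is in the cache.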

tiandrey commented 2 years ago

https://github.com/stevearc/pypicloud/pull/299 seems to fix this issue for me.