pulp / pulpcore

Pulp 3 pulpcore package https://pypi.org/project/pulpcore/
GNU General Public License v2.0
289 stars 113 forks source link

Intermittent 403 errors when using S3 storage backend #3077

Open vonsch opened 2 years ago

vonsch commented 2 years ago

Version

  "versions": [
    {
      "component": "core",
      "version": "3.20.0",
      "package": "pulpcore"
    },
    {
      "component": "rpm",
      "version": "3.17.9",
      "package": "pulp-rpm"
    }
  ],

We reproduced this issue with both pulpcore 3.17.3 and 3.20.0 with redis caching enabled.

Describe the bug We use S3 storage as pulpcore backend (DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'). However time-to-time our pulp clients facing issue that downloads of artifacts ends up with 403 errors, like:

  - '[   74.382330] cloud-init[1320]: Errors during downloading metadata for repository ''redacted'':'
  - '[   74.383010] cloud-init[1320]:   - Status code: 403 for http://xxxxxxx/repodata/repomd.xml (IP: 10.14.124.15)'

After extensive debugging, root cause seems to be the way how returned AWS S3 pre-signed URLs are cached. Following happens, based on my observation:

  1. pulpcore calls django_storage S3 method to get pre-signed URL for the S3 object, which will be returned to the client. By default, this pre-signed URL is valid for 3600 seconds per django documentation (https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html , check AWS_QUERYSTRING_EXPIRE option)
  2. django_storage (and underlying boto3) caches the presigned URL for 1 hour (3600) and returns the same URL for every request.
  3. right before presigned URL expiration in boto3 (let's say after 3599 seconds), it might happen that pulpcore code receives this nearly-expired presigned URL from django/boto3 and caches it for another 10 minutes in redis. And during this period (cached already-expired presigned URL), every link returned from pulp to clients returns 403:
    #curl -L -vvv 'https://pulp_url/centos/8-stream/baseos/x86_64/AppStream/repodata/repomd.xml'
    ...
    <Error><Code>AccessDenied</Code><Message>Request has expired</Message><X-Amz-Expires>60</X-Amz-Expires><Expires>2022-08-15T14:35:49Z</Expires><ServerTime>2022-08-15T14:39:36Z</ServerTime>
    ...

To Reproduce

  1. Setup pulpcore with S3 backend
  2. Enable redis caching
  3. Decrease TTL of pre-signed URLs to just 60 seconds (you can also leave this setting to default, but you will then have to wait 59 minutes before pre-signed URL expires)

settings.py snippet:

# S3 based storage
MEDIA_ROOT = ''
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_ACCESS_KEY_ID = 'xxxxx'
AWS_SECRET_ACCESS_KEY = 'xxxxx'
AWS_STORAGE_BUCKET_NAME = 'xxxxx'
AWS_DEFAULT_ACL = "@none None"
S3_USE_SIGV4 = True
AWS_S3_SIGNATURE_VERSION = "s3v4"
AWS_S3_ADDRESSING_STYLE = "path"
AWS_S3_REGION_NAME = "us-east-1"

# Configure django to return just 60-sec presigned URLs
AWS_QUERYSTRING_EXPIRE = 60

# Enable usage of Redis
CACHE_ENABLED = False
REDIS_HOST = "localhost"
REDIS_PORT = 6379

Start to query pulp for some artifact stored in S3, and after one minute, it will start to return expired presigned URLs.

Expected behavior Pulp doesn't return invalid presigned URLs

Additional context Please let me know if you need more information.

ipanova commented 1 year ago

In addition to the storage pre-signed urls expiration we have also added the ContentRedirectingContentGuard https://github.com/pulp/pulpcore/issues/3238 url expiration. Whenever the cache is enabled, it is always a risk cached data can get outdated at any moment. One way how to handle this situation is to have is reasonable timings where caching_time < url_validity However with some url expiration time we have no control over, like the AWS_QUERYSTRING_EXPIRE. Maybe we can document this sort of aspect for the setups that have cache enabled.

@gerrod3 got any ideas if it is possible to ensure what's stored in cache is not expired?

mdellweg commented 1 year ago

Another idea: Is it possible to resign urls after fetching from caches? Or do the whole redirect after caching altogether?

gerrod3 commented 1 year ago

Taking a look at the url generation code for azure and [0] [1], I think we can add some more custom logic to the cache to handle redirect urls with expiration time. The cache api currently just sets all cached requests to the same default expiration time [2]. We can take the expiration time from the storage objects at url formation time and pass it along with the redirect response so that cache knows to use a different expiration time. For the redirect guard we know the format of the url and can pass the expiration time over to the cache to ensure it is properly removed.

As for resigning urls, it would be possible if we store all the requirements to perform the signing, but it would require a database lookup since we need to retrieve the artifact to get its storage to form the url. [3]

[0] https://github.com/jschneier/django-storages/blob/master/storages/backends/azure_storage.py#L306 [1] https://github.com/jschneier/django-storages/blob/master/storages/backends/s3boto3.py#L574 [2] https://github.com/pulp/pulpcore/blob/main/pulpcore/cache/cache.py#L406 [3] https://github.com/pulp/pulpcore/blob/main/pulpcore/content/handler.py#L826-L831

bmx0r commented 7 months ago

I saw https://docs.pulpproject.org/pulpcore/configuration/settings.html#cache-settings Could we imagien to set this to a value lower than AWS_QUERYSTRING_EXPIRE to solve this issue?