Open vonsch opened 2 years ago
In addition to the expiration of storage pre-signed URLs, we have also added URL expiration for the ContentRedirectingContentGuard (https://github.com/pulp/pulpcore/issues/3238).
Whenever the cache is enabled, there is always a risk that cached data becomes outdated.
One way to handle this is to pick timings where caching_time < url_validity.
However, some URL expiration times are outside our control, like AWS_QUERYSTRING_EXPIRE. Maybe we can document this aspect for setups that have the cache enabled.
@gerrod3 got any ideas if it is possible to ensure what's stored in cache is not expired?
Another idea: is it possible to re-sign URLs after fetching them from the cache? Or to do the whole redirect after the cache lookup altogether?
Taking a look at the URL generation code for Azure and S3 [0] [1], I think we can add some custom logic to the cache to handle redirect URLs with an expiration time. The cache API currently sets all cached requests to the same default expiration time [2]. We can take the expiration time from the storage objects at URL formation time and pass it along with the redirect response so that the cache knows to use a different expiration time. For the redirect guard we know the format of the URL and can pass the expiration time over to the cache to ensure the entry is removed in time.
As for re-signing URLs, it would be possible if we stored everything required to perform the signing, but it would require a database lookup, since we need to retrieve the artifact to get its storage in order to form the URL. [3]
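A rough sketch of the per-entry expiration idea, working from the redirect URL itself rather than from the storage objects. It assumes the signed URL carries its expiry in the query string (`X-Amz-Date` plus `X-Amz-Expires` for S3 SigV4, `Expires` for the older S3 signature style, `se` for an Azure SAS token). The function names, the `default_ttl` argument, and the 30-second safety margin are hypothetical and not part of the pulpcore cache API.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def url_expiry(url):
    """Return the epoch second at which a signed URL expires, or None if unknown."""
    query = parse_qs(urlsplit(url).query)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # S3 SigV4: signing time plus a validity period in seconds.
        signed_at = datetime.strptime(
            query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        return signed_at.timestamp() + int(query["X-Amz-Expires"][0])
    if "Expires" in query:
        # Older S3 signature style: absolute epoch seconds.
        return float(query["Expires"][0])
    if "se" in query:
        # Azure SAS: absolute expiry as an ISO 8601 UTC timestamp.
        expires_at = datetime.strptime(
            query["se"][0], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        return expires_at.timestamp()
    return None


def redirect_cache_ttl(url, default_ttl, now, safety_margin=30):
    """Pick a cache TTL that never outlives the signed URL (hypothetical helper)."""
    expiry = url_expiry(url)
    if expiry is None:
        return default_ttl
    remaining = int(expiry - now) - safety_margin
    # A result of 0 means "do not cache this redirect at all".
    return max(0, min(default_ttl, remaining))
```

The caller would pass the returned value as the expiration for that single cache entry instead of the global default, and skip caching entirely when it is 0.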
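To make the re-signing variant concrete, here is a toy sketch that caches the storage key for a request path and signs a fresh URL on every cache hit. Everything in it is hypothetical: `ResigningRedirectCache`, the `sign` callable (standing in for the storage backend's URL generation), and `lookup_key` (standing in for the database lookup of the artifact). It also assumes a single storage backend, so the signer does not have to be looked up per artifact.

```python
import time
from typing import Callable, Dict, Tuple


class ResigningRedirectCache:
    """Caches storage keys, not signed URLs, so every response gets a fresh URL."""

    def __init__(self, sign: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._sign = sign      # assumption: one signer for the whole deployment
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, path: str, lookup_key: Callable[[str], str]) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is None or entry[1] <= now:
            # Cache miss or stale entry: do the expensive lookup once.
            key = lookup_key(path)
            self._entries[path] = (key, now + self._ttl)
        else:
            key = entry[0]
        # Sign on every request, so the URL's validity starts now.
        return self._sign(key)
```

The trade-off is that signing happens on every request, but the cached entry can never hand out a URL that has already expired.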
[0] https://github.com/jschneier/django-storages/blob/master/storages/backends/azure_storage.py#L306
[1] https://github.com/jschneier/django-storages/blob/master/storages/backends/s3boto3.py#L574
[2] https://github.com/pulp/pulpcore/blob/main/pulpcore/cache/cache.py#L406
[3] https://github.com/pulp/pulpcore/blob/main/pulpcore/content/handler.py#L826-L831
I saw https://docs.pulpproject.org/pulpcore/configuration/settings.html#cache-settings. Could we set the cache expiration there to a value lower than AWS_QUERYSTRING_EXPIRE to solve this issue?
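A minimal sketch of that idea, assuming the cache lifetime is controlled by the `EXPIRES_TTL` key of `CACHE_SETTINGS` as described on the linked settings page. The numbers are illustrative, and `worst_case_remaining_validity` is a hypothetical helper that only spells out the arithmetic: a redirect cached right after signing can still be served just before the cache entry expires, when the URL has `url_validity - cache_ttl` seconds left.

```python
# settings.py fragment (illustrative values, not a recommendation)
CACHE_ENABLED = True
CACHE_SETTINGS = {"EXPIRES_TTL": 600}   # seconds a response stays in the cache
AWS_QUERYSTRING_EXPIRE = 3600           # seconds a pre-signed S3 URL stays valid


def worst_case_remaining_validity(cache_ttl, url_validity):
    """Seconds of validity left on a cached URL at the moment the entry expires."""
    return url_validity - cache_ttl


# With the values above, the oldest cached URL still has 3000 seconds left.
remaining = worst_case_remaining_validity(
    CACHE_SETTINGS["EXPIRES_TTL"], AWS_QUERYSTRING_EXPIRE
)
```

A negative result, for example a 600-second cache TTL combined with a 60-second URL validity, means the cache will hand out URLs that have already expired. A small positive result can still be too tight if clients do not start the download immediately.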
Version
We reproduced this issue with both pulpcore 3.17.3 and 3.20.0 with redis caching enabled.
Describe the bug
We use S3 storage as the pulpcore backend (DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'). However, from time to time our pulp clients hit an issue where artifact downloads end with 403 errors, like:
After extensive debugging, the root cause seems to be the way the returned AWS S3 pre-signed URLs are cached. The following happens, based on my observation:
3600 seconds per the django-storages documentation (https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html , check the AWS_QUERYSTRING_EXPIRE option)
To Reproduce
settings.py snippet:
Start to query pulp for some artifact stored in S3, and after one minute, it will start to return expired presigned URLs.
Expected behavior
Pulp doesn't return invalid presigned URLs.
Additional context
Please let me know if you need more information.