Open kristoferlundgren opened 2 years ago
Thank you for writing up this issue in such detail.
So far, I've been unable to reproduce this bug using AWS. In my configuration, I put a text file in my S3 bucket and ran curl against it in a loop.
I saw that the cache files were correctly populated in the /var/cache/nginx/s3_proxy directory. I also monitored the instance for outbound connections via netstat and saw outbound connections only every minute or so.
On my container, the contents of the cache directory look like:
root@88822b1c11cd:/var/cache/nginx/s3_proxy# find /var/cache/nginx/s3_proxy/
/var/cache/nginx/s3_proxy/
/var/cache/nginx/s3_proxy/1
/var/cache/nginx/s3_proxy/1/93
/var/cache/nginx/s3_proxy/1/93/b620bfa0e09b3cc11521660acb6e2931
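The check described above (requesting the same object in a loop, then looking at the cache directory) could be scripted roughly as follows. This is only a sketch: probe_cache is a hypothetical helper name, and the URL in the usage comment is a placeholder rather than anything taken from this thread.

```shell
# Hypothetical helper: request the same URL several times and print curl's
# measured total time for each request, so a cache hit shows up as a
# faster repeat request.
probe_cache() {
    url=$1
    count=${2:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        # -s silences progress output, -o /dev/null discards the body,
        # -w prints the total request time reported by curl.
        curl -s -o /dev/null -w "request $i: %{time_total}s\n" "$url"
        i=$((i + 1))
    done
}

# Example usage (placeholder URL, assuming the gateway listens on port 80):
#   probe_cache http://localhost/some-object.txt 10
#   find /var/cache/nginx/s3_proxy/ -type f
```

If the cache is working, the find command should list at least one file after the first request.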
I'll go and try to see if I can reproduce the issue on Google Cloud Storage.
I just ran the same configuration against Google Cloud Storage and I was able to reproduce the behavior.
I found the source of the issue. Google Cloud Storage diverges from the AWS S3 behavior by setting Cache-Control: private, max-age=0 by default for all objects. You need to edit the metadata for your object on Google Cloud Storage and change the value of Cache-Control to public in order to enable caching with the gateway. See the Cloud Storage Documentation for more information.
There may be a way to configure NGINX to ignore the header sent by Google Cloud Storage by using the proxy_ignore_headers directive to ignore the Cache-Control header.
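To see whether a backend sends a Cache-Control value of this kind, the response headers can be inspected directly. The following is a simplified sketch, not the gateway's own logic: cache_verdict is a hypothetical helper that looks only at Cache-Control and ignores other fields that can also prevent caching (for example Expires or Set-Cookie).

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and report
# whether the Cache-Control value would stop NGINX from caching the
# response when the header is not ignored.
cache_verdict() {
    # Take the first Cache-Control line, keep its value, drop the CR of a
    # CRLF line ending, and lowercase it for matching.
    value=$(grep -i '^cache-control:' | head -n 1 | cut -d: -f2- \
        | tr -d '\r' | tr '[:upper:]' '[:lower:]')
    case "$value" in
        *private*|*no-cache*|*no-store*) echo "blocked" ;;
        *max-age=0|*max-age=0,*) echo "blocked" ;;
        *) echo "cacheable" ;;
    esac
}

# Example usage (placeholder URL for an object you are allowed to read):
#   curl -sI https://storage.googleapis.com/<bucket>/<object> | cache_verdict
```

A result of "blocked" suggests that either the object metadata needs changing or the header needs to be ignored with proxy_ignore_headers.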
Many thanks for tracking down the root cause of this issue.
As you (@dekobon) suggested, I added proxy_ignore_headers Cache-Control; to the http {} part of /etc/nginx/nginx.conf and ran nginx -s reload inside the container. And voilà, it works!
Files are now cached, as expected.
I now have some choices.
/etc/nginx/nginx.conf
into the container.
proxy_ignore_headers Cache-Control;
as part of the config. Preferably configurable with an environment variable.
I would like to first ask for no. 3. What are your thoughts?
Again, thanks!
I think asking for number three is reasonable. We may need a generalized way to accomplish this because we also need to solve for #65.
I've made some updates to the container so that you can now layer in additional NGINX configuration. See the documentation.
Also, I added a feature that allows you to strip out headers from the client response. For Google Cloud Storage you will want to do:
HEADER_PREFIXES_TO_STRIP=x-goog-;x-guploader-uploadid
Please let me know if this solution works for you. If it does, I'll mark this issue as closed.
Trying the new feature by adding the Cache-Control header:
HEADER_PREFIXES_TO_STRIP="x-goog-;x-guploader-uploadid;Cache-Control"
Resulted in the error:
HEADER_PREFIXES_TO_STRIP must not contain uppercase characters
(as documented)
Second try (lowercase Cache-Control):
HEADER_PREFIXES_TO_STRIP="x-goog-;x-guploader-uploadid;cache-control"
Downloaded some files and then checked the cache directory. It was empty, i.e. the cache is still disabled.
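Since the variable rejects uppercase characters, one way to avoid the first error is to lowercase the list before passing it to the container. The helper name below is made up for illustration. Note that the setting is described above as stripping headers from the client response, so it would not be expected to change whether NGINX caches the upstream response, which matches the empty cache directory in the second try.

```shell
# Hypothetical helper: lowercase a semicolon-separated list of header
# prefixes so that it passes the "no uppercase characters" check.
lower_prefixes() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

HEADER_PREFIXES_TO_STRIP=$(lower_prefixes "x-goog-;x-guploader-uploadid;Cache-Control")
echo "$HEADER_PREFIXES_TO_STRIP"
```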
Third try: (stripping x-goog headers and mounting nginx http config file)
docker run --rm -ti -p 80:80 -e S3_SERVER=storage.googleapis.com -e S3_ACCESS_KEY_ID="<key>" -e S3_SECRET_KEY="<secret>" -e HEADER_PREFIXES_TO_STRIP="x-goog-;x-guploader-uploadid" --env-file s3.env -v $(pwd)/cache.conf:/etc/nginx/conf.d/cache.conf nginxinc/nginx-s3-gateway:latest
Where the $(pwd)/cache.conf file contains:
proxy_ignore_headers Cache-Control;
Downloaded some files and then checked the cache directory. Cache directory has content.
I.e. Cache is working! :)
I would have preferred an environment variable solution, but this config works as well. Many thanks for the assessment and quick remediation of this issue, and also for reporting and fixing #65.
Before closing this issue, I believe the need for proxy_ignore_headers Cache-Control; ought to be documented to aid usage when S3 backends (e.g. Google Cloud Storage) emit caching preferences.
I agree it should be documented. Also, we may want to add an environment variable that allows for ignoring cache control, but I wanted to get the extensibility part done ASAP because we've gotten a lot of requests for similar things and the number of environment variables is starting to add up.
I'll leave this issue open until we can add a setting.
I made the stupid mistake of exec'ing into the wrong running container with the same name, so I didn't find any cache. Check whether this might also be the reason in your case.
I've just experienced this issue, and in addition to ignoring the Cache-Control header, I also had to ignore the Expires header for it to work:
proxy_ignore_headers Cache-Control;
proxy_ignore_headers Expires;
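The proxy_ignore_headers directive accepts several header fields at once, so the two lines can also be written as a single directive. As a sketch, the snippet below writes such a file to the current directory; the name cache.conf and the mount target /etc/nginx/conf.d/cache.conf are taken from the docker run command earlier in this thread, and whether ignoring Expires is needed for a given backend is an assumption to verify.

```shell
# Write a cache.conf in the current directory. The quoted 'EOF' keeps the
# shell from expanding anything inside the here-document.
cat > cache.conf <<'EOF'
# Ignore upstream caching hints so the gateway's own cache settings apply.
proxy_ignore_headers Cache-Control Expires;
EOF

# Mount it when starting the container, for example by adding:
#   -v $(pwd)/cache.conf:/etc/nginx/conf.d/cache.conf
```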
@dekobon @4141done You two seem to be the current maintainers. I really appreciate your effort to keep the project alive!
From reading various discussions on the subject of caching in this GitHub project, there seems to be a general request to have more control of the ingress and egress cache configuration. Mounting my own cache.conf, replacing the default, still seems like a hack. Is there a more intuitive way to manage cache configuration, or can one be developed with a reasonable effort?
Describe the bug
Using the latest Docker image, no data is being cached.
To Reproduce
Steps to reproduce the behavior:
docker run --rm -ti -p 80:80 -e S3_SERVER=storage.googleapis.com -e S3_ACCESS_KEY_ID="<key>" -e S3_SECRET_KEY="<secret>" --env-file s3.env nginxinc/nginx-s3-gateway:latest-20221026
s3.env file:
I can successfully browse the S3 bucket directory structure and download objects without any issue. However, when downloading the same object multiple times, I cannot see any performance increase from a cache hit.
docker exec -ti <container> bash
Run the following command:
ls -la /var/cache/nginx/s3_proxy/
The cache directory is empty. I also looked for any disk usage increase with the command
du -sh /*
but no cached data is being stored in the container.
Expected behavior
According to the documentation, data should be cached when accessed multiple times and not reloaded from the remote S3 bucket at each access.
Your environment