nginxinc / nginx-s3-gateway

NGINX S3 Caching Gateway
Apache License 2.0

Path style breaks fetching INDEX_PAGE #210

Open HighOnMikey opened 7 months ago

HighOnMikey commented 7 months ago

loadContent will always attempt fetching the INDEX_PAGE by building the URI path using s3uri().

When S3_STYLE = 'path', s3uri() prepends the S3 bucket name to the URI path. However, the index page fetch is made against nginx itself rather than against S3, so the resulting URI path is always incorrect for the requested index.

To Reproduce

Set the following variables:

ALLOW_DIRECTORY_LIST=true
PROVIDE_INDEX_PAGE=true
S3_STYLE=path

Expected behavior

Incorrect URI currently being built: http://127.0.0.1/<bucket_name>/<request_path>/<INDEX_PAGE>

Expected URI: http://127.0.0.1/<request_path>/<INDEX_PAGE>
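The difference between the two URIs can be sketched as below. This is a simplified model, not the gateway's code: `buildS3Path`, the bucket name `my-bucket`, and the request path `/docs/` are hypothetical and only mimic how a path-style bucket prefix ends up in the path.

```javascript
// Hypothetical, simplified model of path-style URI building.
// It is NOT the project's s3uri(); it only mimics the bucket prefixing.
function buildS3Path(style, bucket, requestPath, indexPage) {
    const basePath = style === 'path' ? `/${bucket}` : '';
    return `${basePath}${requestPath}${indexPage}`;
}

const requestPath = '/docs/';   // assumed example request
const indexPage = 'index.html'; // assumed INDEX_PAGE value

// Path currently sent to the local nginx listener for the index lookup:
const built = buildS3Path('path', 'my-bucket', requestPath, indexPage);

// Path the local listener actually needs (no bucket prefix):
const expected = `${requestPath}${indexPage}`;

console.log(built);    // /my-bucket/docs/index.html
console.log(expected); // /docs/index.html
```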

Environments Tested

Workaround

I would submit a PR, but I'm not sure which approach to building the URI path the project would prefer. Here is a quick fix that demonstrates the issue:

From

const uri = s3uri(r);

To

let uri;
if (S3_STYLE === 'path') {
    uri = `${r.uri}${INDEX_PAGE}`;
} else {
    uri = s3uri(r);
}

4141done commented 7 months ago

Thank you for the report, @HighOnMikey. I'm a new maintainer on this project, so let me take some time to understand that specific area more deeply before I reply.

4141done commented 7 months ago

Hi @HighOnMikey thank you for your patience. I took a look at this and want to be sure that I understand the issue. It's possible I'm misunderstanding some part of your report:

  1. S3 path style requests expect the request in the form https://s3.region-code.amazonaws.com/bucket-name/key-name
  2. The PROVIDE_INDEX_PAGE configuration option is intended to serve a page called index.html from the requested directory. For example, if I have a bucket called test-bucket, then a request to <s3 gateway host>/ would be expected to be proxied to S3 as https://s3.region-code.amazonaws.com/test-bucket/index.html

I think that's the shared understanding we need. When testing the code I didn't see any indication that it's not doing that, but I could be missing a key part of your report, so I just want to be sure.

Here are some logs from my local testing requesting /foo with the configuration you specified (I added some additional debug prints):

2024/02/15 22:00:15 [info] 75#75: *35 js: Using path style uri : /bucket-1
2024/02/15 22:00:15 [info] 75#75: *35 js: s3uri: Internal: true | uri: / | uriPath: / | basePath: /bucket-1
2024/02/15 22:00:15 [info] 75#75: *35 js: S3 Request URI: GET /bucket-1/index.html
2024/02/15 22:00:15 [info] 75#75: *37 js: Using path style uri : /bucket-1
2024/02/15 22:00:15 [info] 75#75: *37 js: s3uri: Internal: false | uri: /bucket-1/index.html | uriPath: /bucket-1/index.html | basePath: /bucket-1
2024/02/15 22:00:15 [info] 75#75: *37 js: S3 Request URI: GET /bucket-1/bucket-1/index.html
127.0.0.1 - - [15/Feb/2024:22:00:15 +0000] "GET /bucket-1/index.html HTTP/1.1" 404 146 "-" "-" "-" "/bucket-1/bucket-1/index.html" "/bucket-1/index.html"
2024/02/15 22:00:15 [info] 75#75: *35 js: Using path style uri : /bucket-1
2024/02/15 22:00:15 [info] 75#75: *35 js: s3uri: Internal: true | uri: / | uriPath: / | basePath: /bucket-1
2024/02/15 22:00:15 [info] 75#75: *35 js: S3 Request URI: GET /bucket-1?delimiter=%2F
192.168.65.1 - - [15/Feb/2024:22:00:15 +0000] "GET / HTTP/1.1" 200 1100 "-" "curl/8.4.0" "-" "/bucket-1?delimiter=%2F" "/"

I think I see the error here: when we redirect to /bucket-name/index.html, the helper prepends the bucket name again, as this log line shows: 2024/02/15 22:00:15 [info] 75#75: *37 js: S3 Request URI: GET /bucket-1/bucket-1/index.html
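The double prefix in those logs can be reproduced with a tiny model. This is only a sketch: `prependBasePath` is a hypothetical stand-in for the helper, reduced to the single prefixing step visible in the log lines, and the two calls mirror requests *35 and *37.

```javascript
// Hypothetical stand-in for the URI helper: it only models the
// "basePath + uriPath" step seen in the debug logs.
function prependBasePath(basePath, uriPath) {
    return `${basePath}${uriPath}`;
}

const basePath = '/bucket-1';

// Request *35: the index lookup for "/" is built with the bucket prefix.
const firstPass = prependBasePath(basePath, '/index.html');

// Request *37: that path comes back to nginx as an ordinary request,
// so the same prefix is applied a second time.
const secondPass = prependBasePath(basePath, firstPass);

console.log(firstPass);  // /bucket-1/index.html
console.log(secondPass); // /bucket-1/bucket-1/index.html
```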

Is this the bug you're referring to or have I found a different bug? 😓

HighOnMikey commented 6 months ago

First of all, sorry for the delayed response!

> I think that's the shared understanding we need.

Correct. The core functionality of path style requests works as intended. The bug is limited to attempting to find an index page when PROVIDE_INDEX_PAGE is enabled and S3_STYLE=path.

> Is this the bug you're referring to or have I found a different bug? 😓

Yes, this is directly related to the bug. The loadContent function sends a test request through the local nginx proxy to check whether an index page exists, but it builds that request with the s3uri function, which is designed to build a URI for requesting directly from the S3 API.
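To illustrate why the loopback request needs a bucket-free path, here is a rough sketch. Both function names are hypothetical, `proxyToS3` is only a stand-in for what the gateway does when it forwards a path-style request, and the trailing-slash handling is an assumption. It is not meant to describe how the project will fix this.

```javascript
// Hypothetical helper: path for the test request sent to the gateway itself.
// No bucket prefix here, because the gateway adds it when proxying.
function localIndexPath(requestUri, indexPage) {
    // Assumption: treat the request URI as a directory and ensure one trailing slash.
    const dir = requestUri.endsWith('/') ? requestUri : `${requestUri}/`;
    return `${dir}${indexPage}`;
}

// Hypothetical stand-in for the gateway's path-style proxying step.
function proxyToS3(bucket, localPath) {
    return `/${bucket}${localPath}`;
}

const probe = localIndexPath('/docs', 'index.html');
const upstream = proxyToS3('bucket-1', probe);

console.log(probe);    // /docs/index.html
console.log(upstream); // /bucket-1/docs/index.html
```

With the bucket added only at the proxying step, the prefix appears exactly once in the path that reaches S3.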

4141done commented 6 months ago

Thanks for confirming and for your patience. I'm going to try to put together a fix in the coming days. I'll tag you for a test/review when it's up.

4141done commented 4 months ago

Hey @HighOnMikey, sorry again for the slowness on this. Can you take a look at https://github.com/nginxinc/nginx-s3-gateway/pull/230 and see if it solves your issue?

Let me know if you need a container image built for you to test.