nginxinc / nginx-s3-gateway

NGINX S3 Caching Gateway
Apache License 2.0

400 Bad Request Response when File contains spaces. #218

Open NicoGartmann opened 8 months ago

NicoGartmann commented 8 months ago

Describe the bug

We have an HTML file with spaces in its name in an S3 bucket. When we try to access it, NGINX returns a 400 Bad Request. In the logs of the nginx-s3-gateway pod, we can see that the request is sent to NGINX encoded. However, it looks like the encoding is removed again at proxy_pass.

To Reproduce

Steps to reproduce the behavior:

  1. Put an HTML file whose name contains spaces in an AWS S3 bucket.
  2. Try to access it through nginx-s3-gateway

Expected behavior

We expect the file to be delivered.

Your environment

Additional context

2024/02/27 10:43:45 [info] 76#76: *168 client sent invalid request while reading client request line, client: 10.0.0.0, server: , request: "GET /scorm/5f1b1eb1/Lernmodul 5 - Fußleisten anbringen.html HTTP/1.0"
10.0.0.0 - - [27/Feb/2024:10:43:45 +0000] "GET /scorm/5f1b1eb1/Lernmodul 5 - Fu\xC3\x9Fleisten anbringen.html HTTP/1.0" 400 150 "-" "-" 0 0.000 - - - -
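
Both log lines show the request line reaching NGINX with literal spaces and raw UTF-8 bytes in the path, and the error is raised while the request line is being read. The sketch below illustrates why such a line is rejected, assuming a simplified model of the HTTP/1.x request line (method, space, request-target, space, version). The regular expression and the `is_valid_request_line` helper are hypothetical and are not NGINX's actual parser.

```python
import re

# Simplified model of the HTTP/1.x request line:
#   method SP request-target SP HTTP-version
# The request-target is limited to visible ASCII characters (no spaces).
# Assumption: this is an illustration only, not how NGINX parses requests.
REQUEST_LINE = re.compile(r"[A-Z]+ [\x21-\x7E]+ HTTP/\d\.\d")


def is_valid_request_line(line: str) -> bool:
    """Return True if the whole line fits the simplified grammar."""
    return REQUEST_LINE.fullmatch(line) is not None


# Request line as it appears in the log above (already decoded).
raw = "GET /scorm/5f1b1eb1/Lernmodul 5 - Fußleisten anbringen.html HTTP/1.0"

# The same request with the path percent-encoded.
encoded = (
    "GET /scorm/5f1b1eb1/"
    "Lernmodul%205%20-%20Fu%C3%9Fleisten%20anbringen.html HTTP/1.0"
)

print(is_valid_request_line(raw))
print(is_valid_request_line(encoded))
```

Under this model, the decoded line fails because the extra spaces make the request-target ambiguous, while the percent-encoded line passes.
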

4141done commented 8 months ago

Thank you for your report 👍 I've received it and will take a look soon.

4141done commented 7 months ago

Hello, thank you for your patience. I added some integration tests using the file name and have not been able to reproduce your issue locally. I'm going to try setting it up more formally in S3, but let me show you what I'm checking in case I'm missing anything:

Setup: File called Lernmodul 5 - Fußleisten anbringen.html at the root of my bucket.

Test request: curl localhost:8989/Lernmodul%205%20-%20Fu%C3%9Fleisten%20anbringen.html
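
For reference, the encoded path in the test request can be derived from the file name with standard percent-encoding. This is a minimal sketch using Python's `urllib.parse.quote`; the `key` and `encoded_path` variable names are only for illustration.

```python
from urllib.parse import quote, unquote

# File name exactly as stored in the bucket (from the setup above).
key = "Lernmodul 5 - Fußleisten anbringen.html"

# quote() percent-encodes the UTF-8 bytes of every character that is not
# unreserved; "/" is kept by default, so key prefixes would survive too.
encoded_path = "/" + quote(key)
print(encoded_path)

# Decoding the path gives back the original key.
assert unquote(encoded_path) == "/" + key
```

The space becomes `%20` and the `ß` becomes `%C3%9F`, its two UTF-8 bytes.
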

I set up an echo server in place of S3 to see what the gateway was sending to it:

curl localhost:8989/Lernmodul%205%20-%20Fu%C3%9Fleisten%20anbringen.html
{
  "path": "/Lernmodul%205%20-%20Fu%C3%9Fleisten%20anbringen.html",
  "headers": {
    "x-amz-date": "20240314T222457Z",
    "x-amz-content-sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "authorization": "AWS4-HMAC-SHA256 Credential=AKIAIOSFODNN7EXAMPLE/20240314/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=c9f38a25d24ac1d3ca7b0204a15425132758d74048b6bb63daa858454708727b",
    "host": "bucket-1.echoer"
  },
  "method": "GET",
  "body": "",
  "fresh": false,
  "hostname": "bucket-1.echoer",
  "ip": "::ffff:172.18.0.5",
  "ips": [],
  "protocol": "http",
  "query": {},
  "subdomains": [],
  "xhr": false,
  "os": {
    "hostname": "5b8cee3fdf36"
  },
  "connection": {}
}

So from this basic test it does not look to me like the gateway itself is losing the encoding. When run against the test minio server the file content comes back as expected.

Please let me know if you see anything that could bring me closer to reproducing your issue. If you haven't already, you may want to try turning on DEBUG=true when running the gateway to get some more detailed information about the path before it's passed to S3.