oxyno-zeta / s3-proxy

S3 Reverse Proxy with GET, PUT and DELETE methods and authentication (OpenID Connect and Basic Auth)
https://oxyno-zeta.github.io/s3-proxy/
Apache License 2.0
295 stars 33 forks source link

Proxy pre-compressed data as-is #358

Closed aukaost closed 1 year ago

aukaost commented 1 year ago

Hi, Thanks for this project! If you take a file which is gzipped and has Content-Encoding set to gzip on the object's meta data it seems to be served uncompressed. Is it possible to proxy those objects as-is?

Thanks!

oxyno-zeta commented 1 year ago

Hello @aukaost ,

Happy to see that this project is interesting for multiple people ;)

I'm afraid that this coming from your browser when you download the file. I've already seen this when I used the real S3 interface. Did you try to manually change the content-type in S3 and test ? If this is working, it means that this is coming from your browser and you could use the header override feature to avoid this behavior. If not, we will need to check more in details ;)

Oxyno-zeta

aukaost commented 1 year ago

I'll add some more detail :) Our assets are all pre-compressed in the bucket and we do want the browser to do its thing via Accept-Encoding and transparently decompress the assets in the browser. This is all working as intended when running using curl, e.g. curl https://<bucket>.s3.us-west-1.amazonaws.com/object.txt -H "Accept-Encoding: gzip" will return a compressed object with a Content-Encoding: gzip header. If I request the same asset through s3-proxy, the result is a decompressed object with no Content-Encoding header. So it seems that the result from the request between s3-proxy and s3 gets decompressed and then served uncompressed. We could enable compression on s3-proxy but that would mean decompressing and re-compressing without need.

oxyno-zeta commented 1 year ago

Interesting...

I will need a bit of time to debug this. I wonder if this isn't coming from the AWS client itself.

But thanks for all those details !

aukaost commented 1 year ago

I'll also have a look, just wanted to check if I was missing anything before I do!

aukaost commented 1 year ago

https://github.com/aws/aws-sdk-go/issues/1292#issuecomment-303550435 https://github.com/oxyno-zeta/s3-proxy/blob/master/pkg/s3-proxy/s3client/s3Context.go#L226

    obj, err := s3ctx.svcClient.GetObjectWithContext(ctx, s3Input,
        func(r *request.Request) {
            // Comment out to have transport decode the object.
            r.HTTPRequest.Header.Add("Accept-Encoding", "gzip")
        })

Tested s3-proxy with this modification and that seems to do the job. I suppose the proper way would be to forward the client request's Accept-Encoding to GetObjectWithContext.

➜  ~ curl http://127.0.0.1:8080/hello.txt -vs | gunzip
*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /hello.txt HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: no-cache, no-store, no-transform, must-revalidate, private, max-age=0
< Content-Encoding: gzip
< Content-Length: 35
< Content-Type: text/plain
< Etag: "2ab3bdddd9d55d30c05b22c08658c436"
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Last-Modified: Tue, 02 May 2023 17:22:07 GMT
< Pragma: no-cache
< X-Accel-Expires: 0
< Date: Wed, 03 May 2023 08:13:14 GMT
<
{ [35 bytes data]
* Connection #0 to host 127.0.0.1 left intact
hello%
oxyno-zeta commented 1 year ago

Thanks for your input and tests !

Now that we know how to patch it, I need to think about a way to add configuration to allow people sending whatever they want :) . Thanks a lot for this new feature "spec" :smile: .

oxyno-zeta commented 1 year ago

@aukaost I spend few hours to develop the feature and I was on the test phase and can't reproduce your problem...

With this example:

echo "Fake" > file.txt
gzip file.txt
aws s3 cp file.txt.gz s3://TARGET/file.txt.gz

Making those requests:

curl http://localhost:8080/mm/file.txt.gz -vs | gunzip
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /mm/file.txt.gz HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: must-revalidate, max-age=0
< Content-Length: 34
< Content-Type: text/plain
< Etag: "6d1220d5af136cb2cc4fa8bc51dde518"
< Last-Modified: Wed, 03 May 2023 21:07:50 GMT
< Vary: Origin
< Date: Wed, 03 May 2023 21:13:47 GMT
< 
{ [34 bytes data]
* Connection #0 to host localhost left intact
Fake
 curl http://localhost:8080/mm/file.txt.gz -vs | cat 
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /mm/file-gz.txt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: must-revalidate, max-age=0
< Content-Length: 34
< Content-Type: text/plain
< Etag: "6d1220d5af136cb2cc4fa8bc51dde518"
< Last-Modified: Wed, 03 May 2023 21:17:33 GMT
< Vary: Origin
< Date: Wed, 03 May 2023 21:17:50 GMT
< 
{ [34 bytes data]
* Connection #0 to host localhost left intact
��Rdfile.txtsK�N����.% 

So... Seems that everything is intact.

Am I missing something ?

Can you give me a procedure, a configuration and the version you are using please ?

aukaost commented 1 year ago

image You need to update the s3 object's metadata to include Content-Encoding: gzip. This way s3 will return this header and will make the s3client decompress the object transparently.

I've been testing with 4.8.0, the config is very basic:

targets:
  sourcemaps:
    mount:
      path:
        - /
    bucket:
      name: <bucket>
      region: us-west-1
server:
  listenAddr: ""
  port: 80
oxyno-zeta commented 1 year ago

Ahhh that explains why I wasn't able to reproduce it.

Now I am ! And I'm able to make it work as desired too. But I'm not able to reproduce it in tests due to the fact that I'm using a fake S3 in-memory... I will check how difficult will it be to migrate to minio for tests.

In all cases, I will probably release early next week.

aukaost commented 1 year ago

That's really awesome, thanks for that :)

oxyno-zeta commented 1 year ago

I've tested to switch to Minio for tests but it will required more work than expected...

I've decided to release as is the code.

4.9.0 is released with your needed patch ! :tada:

Have a good rest of your day,

Oxyno-zeta

aukaost commented 1 year ago

Works like a charm, thanks a lot!

oxyno-zeta commented 1 year ago

@aukaost I'm very happy to see that and you are welcome!

Do not hesitate if you see a bug or a new feature idea.