Closed · aukaost closed this issue 1 year ago
Hello @aukaost ,
Happy to see that this project is interesting to multiple people ;)
I'm afraid this is coming from your browser when you download the file. I've already seen this when using the real S3 interface. Did you try manually changing the content-type in S3 and testing? If that works, it means this is coming from your browser, and you could use the header override feature to avoid this behavior. If not, we will need to look into it in more detail ;)
Oxyno-zeta
I'll add some more detail :)
Our assets are all pre-compressed in the bucket, and we do want the browser to do its thing via Accept-Encoding and transparently decompress the assets. This all works as intended with curl, e.g.
curl https://<bucket>.s3.us-west-1.amazonaws.com/object.txt -H "Accept-Encoding: gzip"
returns a compressed object with a Content-Encoding: gzip header.
If I request the same asset through s3-proxy, the result is a decompressed object with no Content-Encoding header.
So it seems that the response from the request between s3-proxy and S3 gets decompressed and is then served uncompressed.
We could enable compression on s3-proxy, but that would mean decompressing and re-compressing needlessly.
Interesting...
I will need a bit of time to debug this. I wonder if this isn't coming from the AWS client itself.
But thanks for all those details!
I'll also have a look, just wanted to check if I was missing anything before I do!
https://github.com/aws/aws-sdk-go/issues/1292#issuecomment-303550435 https://github.com/oxyno-zeta/s3-proxy/blob/master/pkg/s3-proxy/s3client/s3Context.go#L226
obj, err := s3ctx.svcClient.GetObjectWithContext(ctx, s3Input,
    func(r *request.Request) {
        // Setting Accept-Encoding explicitly stops the Go HTTP transport
        // from transparently decompressing the object; remove this line to
        // let the transport decode it again.
        r.HTTPRequest.Header.Add("Accept-Encoding", "gzip")
    })
Tested s3-proxy with this modification and that seems to do the job.
I suppose the proper way would be to forward the client request's Accept-Encoding to GetObjectWithContext.
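The transport behavior suspected above can be reproduced without S3 at all. Here is a minimal, self-contained Go sketch (the server and helper names are mine, not from s3-proxy) showing that the default net/http transport transparently decompresses a gzip response and strips the Content-Encoding header, unless Accept-Encoding is set manually on the request:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newGzipServer serves a gzip-compressed "hello" with Content-Encoding: gzip,
// mimicking a pre-compressed S3 object.
func newGzipServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var buf bytes.Buffer
		gz := gzip.NewWriter(&buf)
		gz.Write([]byte("hello"))
		gz.Close()
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Set("Content-Type", "text/plain")
		w.Write(buf.Bytes())
	}))
}

// fetch GETs url and returns the raw body plus the Content-Encoding header.
// When manualGzip is true, Accept-Encoding is set explicitly, which disables
// the transport's transparent decompression.
func fetch(url string, manualGzip bool) ([]byte, string, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, "", err
	}
	if manualGzip {
		req.Header.Set("Accept-Encoding", "gzip")
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return body, resp.Header.Get("Content-Encoding"), err
}

func main() {
	srv := newGzipServer()
	defer srv.Close()

	// Default transport: body arrives decompressed, header stripped.
	body, enc, _ := fetch(srv.URL, false)
	fmt.Printf("default: encoding=%q body=%q\n", enc, body)

	// Manual Accept-Encoding: body stays compressed, header preserved.
	body, enc, _ = fetch(srv.URL, true)
	fmt.Printf("manual:  encoding=%q compressed=%v\n", enc, string(body) != "hello")
}
```

This mirrors the s3-proxy situation: the AWS SDK's HTTP client decompresses the object on the way in unless Accept-Encoding is set explicitly, as in the snippet above.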
➜ ~ curl http://127.0.0.1:8080/hello.txt -vs | gunzip
* Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /hello.txt HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: no-cache, no-store, no-transform, must-revalidate, private, max-age=0
< Content-Encoding: gzip
< Content-Length: 35
< Content-Type: text/plain
< Etag: "2ab3bdddd9d55d30c05b22c08658c436"
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Last-Modified: Tue, 02 May 2023 17:22:07 GMT
< Pragma: no-cache
< X-Accel-Expires: 0
< Date: Wed, 03 May 2023 08:13:14 GMT
<
{ [35 bytes data]
* Connection #0 to host 127.0.0.1 left intact
hello%
Thanks for your input and tests!
Now that we know how to patch it, I need to think about a way to add configuration allowing people to send whatever they want :) . Thanks a lot for this new feature "spec" :smile: .
@aukaost I spent a few hours developing the feature, and during the test phase I can't reproduce your problem...
With this example:
echo "Fake" > file.txt
gzip file.txt
aws s3 cp file.txt.gz s3://TARGET/file.txt.gz
Making those requests:
curl http://localhost:8080/mm/file.txt.gz -vs | gunzip
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /mm/file.txt.gz HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: must-revalidate, max-age=0
< Content-Length: 34
< Content-Type: text/plain
< Etag: "6d1220d5af136cb2cc4fa8bc51dde518"
< Last-Modified: Wed, 03 May 2023 21:07:50 GMT
< Vary: Origin
< Date: Wed, 03 May 2023 21:13:47 GMT
<
{ [34 bytes data]
* Connection #0 to host localhost left intact
Fake
curl http://localhost:8080/mm/file.txt.gz -vs | cat
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /mm/file-gz.txt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: must-revalidate, max-age=0
< Content-Length: 34
< Content-Type: text/plain
< Etag: "6d1220d5af136cb2cc4fa8bc51dde518"
< Last-Modified: Wed, 03 May 2023 21:17:33 GMT
< Vary: Origin
< Date: Wed, 03 May 2023 21:17:50 GMT
<
{ [34 bytes data]
* Connection #0 to host localhost left intact
��Rdfile.txtsK�N����.%
So... it seems that everything is intact.
Am I missing something?
Can you give me your procedure, your configuration, and the version you are using, please?
You need to update the S3 object's metadata to include Content-Encoding: gzip.
This way S3 will return that header, which is what makes the S3 client decompress the object transparently.
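For reference, that metadata can be set with the AWS CLI's --content-encoding flag (the bucket and key below reuse the placeholder names from the earlier example):

```shell
# Upload the pre-compressed object with Content-Encoding metadata, so that
# S3 serves it back with the header.
aws s3 cp file.txt.gz s3://TARGET/file.txt.gz --content-encoding gzip

# Or patch an already-uploaded object in place by copying it onto itself
# with replaced metadata.
aws s3 cp s3://TARGET/file.txt.gz s3://TARGET/file.txt.gz \
  --content-encoding gzip --metadata-directive REPLACE
```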
I've been testing with 4.8.0, the config is very basic:
targets:
  sourcemaps:
    mount:
      path:
        - /
    bucket:
      name: <bucket>
      region: us-west-1
server:
  listenAddr: ""
  port: 80
Ahhh that explains why I wasn't able to reproduce it.
Now I am! And I'm able to make it work as desired too. But I'm not able to reproduce it in tests because I'm using a fake in-memory S3... I will check how difficult it will be to migrate to Minio for tests.
In all cases, I will probably release early next week.
That's really awesome, thanks for that :)
I've tried switching to Minio for tests, but it would require more work than expected...
I've decided to release the code as is.
4.9.0 is released with the patch you needed! :tada:
Have a good rest of your day,
Oxyno-zeta
Works like a charm, thanks a lot!
@aukaost I'm very happy to see that and you are welcome!
Do not hesitate if you see a bug or a new feature idea.
Hi, thanks for this project! If you take a file which is gzipped and has Content-Encoding set to gzip in the object's metadata, it seems to be served uncompressed. Is it possible to proxy those objects as-is?
Thanks!