logstash-plugins / logstash-output-google_cloud_storage

Use `gzip` for `Content-Encoding` instead of `Content-Type` #13

Closed: eestrada closed this issue 6 years ago

eestrada commented 7 years ago

Before describing the enhancement I am asking for, let me give a little context on why I am asking for it.

Currently, when gzip is turned on in the pipeline configuration, the Content-Type changes from text/plain to application/gzip and the .gz file extension is appended to the name of the file uploaded to GCS. This is surely fine for many use cases, but it assumes that the client can properly handle decoding gzip files, which isn't always true.

With the current setup, any client downloading the content must add an explicit decompression step, whereas if only the Content-Encoding were changed, decompression would be implicit. Changing only the Content-Encoding keeps the benefit of reduced storage and transmission costs, while letting receiving clients treat the content as the plain-text .log file it really is. This is much more in line with how we use the data. It also allows decompression to happen server side if the client indicates it can't handle gzip encoding, further increasing flexibility in serving the content.
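To make the difference concrete, here is a minimal Ruby sketch using only the standard library. It is not the plugin's code: `StoredObject` and `client_view` are hypothetical names, and `client_view` only imitates what a typical HTTP client does, namely undoing a `Content-Encoding` while leaving the payload untouched when gzip appears only in the `Content-Type`.

```ruby
require "zlib"

# Hypothetical stand-in for an uploaded object: a name, the two headers
# under discussion, and the stored bytes.
StoredObject = Struct.new(:name, :content_type, :content_encoding, :body)

# Imitates a generic HTTP client: a gzip Content-Encoding is decoded
# transparently; anything else is handed to the caller as-is.
def client_view(obj)
  if obj.content_encoding == "gzip"
    Zlib.gunzip(obj.body).force_encoding(Encoding::UTF_8)
  else
    obj.body
  end
end

log = "line one\nline two\n"
compressed = Zlib.gzip(log)

# Current behavior: gzip is declared through the type and the file extension.
current = StoredObject.new("app.log.gz", "application/gzip", nil, compressed)

# Requested behavior: same stored bytes, but gzip is declared as an encoding.
proposed = StoredObject.new("app.log", "text/plain", "gzip", compressed)

puts client_view(current) == log   # false: the caller still has to gunzip
puts client_view(proposed) == log  # true: the caller sees the plain log text
```

In both cases the same compressed bytes are stored, so the storage savings are identical; only the metadata, and therefore the work left to the client, differs.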

Since changing the semantics of the `gzip` setting at this point would surely break existing deployments that depend on the current behavior, I suggest adding a separate setting to control this. For instance, something along the lines of `gzip_content_encoding` with a default value of `false` should work.
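As a rough illustration of how the two settings could coexist, the Ruby sketch below maps them to upload attributes. The helper `upload_attributes` is hypothetical and not part of the plugin. Letting `gzip_content_encoding` take precedence when both settings are enabled is an assumption made for the example, not something decided in this issue.

```ruby
# Hypothetical helper: derives the object name and headers from the
# existing `gzip` setting and the proposed `gzip_content_encoding` setting.
def upload_attributes(base_name, gzip: false, gzip_content_encoding: false)
  if gzip_content_encoding
    # Proposed mode: bytes are gzipped, but the object is still presented
    # as plain text. Assumed to win if both settings are true.
    { name: base_name, content_type: "text/plain", content_encoding: "gzip" }
  elsif gzip
    # Existing mode, unchanged: .gz suffix and a gzip content type.
    { name: "#{base_name}.gz", content_type: "application/gzip", content_encoding: nil }
  else
    # No compression at all.
    { name: base_name, content_type: "text/plain", content_encoding: nil }
  end
end

p upload_attributes("app.log", gzip: true)
p upload_attributes("app.log", gzip_content_encoding: true)
```

Because both settings default to `false` and the `gzip` branch is untouched, existing deployments would keep their current behavior.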

I am happy to add this feature and create a pull request for it, but I thought I should open an issue first to see whether something like this would even be considered for merging or not.

josephlewis42 commented 6 years ago

Pushed to rubygems as 3.3.0