Before describing the enhancement I am asking for, let me give some context for why I am asking for it.
Currently, turning on gzip in the pipeline configuration changes the Content-Type from text/plain to application/gzip and appends the .gz file extension to the filename uploaded to GCS. This is fine for many use cases, but it assumes that the client can decode gzip files, which isn't always true.
With the current setup, any client downloading the content must add an explicit step to decompress the data, whereas if only the Content-Encoding were changed, the decompression would be implicit. Changing only the Content-Encoding keeps the benefit of reduced storage and transmission costs, while letting receiving clients treat the content as a plain text .log file (which is what it really is). This is much more in line with how we use the data. It also allows decompression to happen server side if the client indicates it can't handle gzip encoding, which further increases flexibility in serving the content.
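To make the difference concrete, here is a minimal sketch in Python that simulates both ways of storing the same log data. It is not the plugin's code: the function names (`store_current`, `store_proposed`, `serve`, `client_receive`), the dictionary layout, and the sample log line are all hypothetical and only model the behavior described above.

```python
import gzip

# Made-up sample data standing in for a log file.
LOG_TEXT = b"example log line\n"


def store_current(name, data):
    """Current behavior: the stored object is declared to be a gzip file."""
    return {
        "name": name + ".gz",
        "headers": {"Content-Type": "application/gzip"},
        "body": gzip.compress(data),
    }


def store_proposed(name, data):
    """Proposed behavior: a text file that is merely gzip-encoded."""
    return {
        "name": name,
        "headers": {"Content-Type": "text/plain", "Content-Encoding": "gzip"},
        "body": gzip.compress(data),
    }


def serve(obj, accept_gzip):
    """Simulated server: decodes for clients that do not accept gzip."""
    headers = dict(obj["headers"])
    body = obj["body"]
    if headers.get("Content-Encoding") == "gzip" and not accept_gzip:
        body = gzip.decompress(body)
        del headers["Content-Encoding"]
    return headers, body


def client_receive(headers, body):
    """Simulated HTTP client: transparently undoes Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return headers["Content-Type"], body
```

With the current behavior, `client_receive(*serve(store_current("app.log", LOG_TEXT), True))` still returns compressed bytes that the caller has to decompress itself. With the proposed behavior, the same call on `store_proposed(...)` returns the original text, whether or not the client accepts gzip.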
Since changing the semantics of the gzip setting at this point would surely break existing deployments that depend on the current behavior, I suggest adding another setting to control this. For instance, something along the lines of gzip_content_encoding with a default value of false should work.
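As a rough illustration of how the two settings could sit side by side, the sketch below maps them to the attributes of the uploaded object. The function `upload_attributes` and the attribute names are hypothetical, and rejecting the combination of both settings is only one possible choice that I am assuming here, not something already decided.

```python
def upload_attributes(filename, gzip=False, gzip_content_encoding=False):
    """Return the object name and headers implied by the two settings."""
    if gzip and gzip_content_encoding:
        # Assumption: the two modes are treated as mutually exclusive.
        raise ValueError("enable either gzip or gzip_content_encoding, not both")
    if gzip_content_encoding:
        # Proposed mode: same name and type, only the encoding changes.
        return {
            "name": filename,
            "content_type": "text/plain",
            "content_encoding": "gzip",
        }
    if gzip:
        # Existing mode, unchanged: .gz suffix and application/gzip.
        return {
            "name": filename + ".gz",
            "content_type": "application/gzip",
            "content_encoding": None,
        }
    # Neither setting enabled: plain, uncompressed upload.
    return {
        "name": filename,
        "content_type": "text/plain",
        "content_encoding": None,
    }
```

Because `gzip_content_encoding` defaults to false, a deployment that only sets gzip gets exactly the attributes it gets today.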
I am happy to add this feature and create a pull request for it, but I thought I should open an issue first to see whether something like this would even be considered for merging or not.