vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add support for compression levels to file sink #14689

Open bruceg opened 1 year ago

bruceg commented 1 year ago

Use Cases

The file sink has configuration for the algorithm used to compress the data before writing to the files, but not for the associated parameters of the algorithm, notably the compression "level". Users of the file sink want to choose the trade-off between CPU time and file size, which can only be accomplished by making the compression level configurable.

Attempted Solutions

A user contribution was provided to solve this issue in #14349, but we decided to go with a different approach.

Proposal

The sink batch buffer handling already supports compression level configuration. We can use that structure to configure the file sink as well, ensuring a unified UX across these components. Note that the batch buffer code is missing the Zstandard compression that is already supported here, so this depends on the completion of #2302.
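For illustration only, if the file sink adopted the same structure, its configuration might look roughly like the sketch below. This is hypothetical: the file sink does not accept the map form at the time of this issue, and the sink name, input ID, and path are placeholders.

```yaml
sinks:
  archive_files:                  # placeholder sink name
    type: file
    inputs: ["my_source"]         # placeholder input ID
    path: "/var/log/vector/out-%Y-%m-%d.log.gz"  # placeholder path
    encoding:
      codec: json
    # Hypothetical map form, mirroring the batch buffer compression config.
    # Not supported by the file sink as of this issue.
    compression:
      algorithm: "gzip"
      level: 5
```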

andrey-mazo commented 1 month ago

> The file sink has configuration for the algorithm used to compress the data before writing to the files, but not for the associated parameters of the algorithm, notably the compression "level". Users of the file sink want to choose the trade-off between CPU time and file size, which can only be accomplished by making the compression level configurable.

I just wanted to add that, as far as I understand, this is the case for other sinks as well, like S3. And yeah, it'd be great to be able to set the compression level for the S3 sink too.

jszwedko commented 1 month ago

This isn't documented but it is actually possible to configure the compression level on the aws_s3 sink via:

compression:
  algorithm: "gzip"
  level: 5

It's difficult to document these sorts of options that can be multiple types (in this case string and map) on the website, which is why it is currently missing.
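As a rough sketch of the two forms (string and map) inside a complete sink definition: the sink names, input ID, bucket, and region below are placeholders, and the map form is the undocumented one described above, so treat this as an assumption to verify against your Vector version.

```yaml
sinks:
  # String form: only the algorithm is chosen; the level is left at its default.
  s3_default_level:               # placeholder sink name
    type: aws_s3
    inputs: ["my_source"]         # placeholder input ID
    bucket: "my-example-bucket"   # placeholder bucket
    region: "us-east-1"           # placeholder region
    compression: "gzip"
    encoding:
      codec: json

  # Map form: algorithm plus an explicit level.
  s3_explicit_level:              # placeholder sink name
    type: aws_s3
    inputs: ["my_source"]
    bucket: "my-example-bucket"
    region: "us-east-1"
    compression:
      algorithm: "gzip"
      level: 5
    encoding:
      codec: json
```

Running `vector validate` against the config file is a reasonable way to check that the map form is accepted before deploying it.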

The file sink does not support this yet, though, since it uses its own configuration struct for the compression field.

andrey-mazo commented 1 month ago

> This isn't documented but it is actually possible to configure the compression level on the aws_s3 sink via:

Jesse, oh, wow, thank you for pointing this out! You saved my day! I think I actually tried something similar, but apparently botched up the syntax.