The following sensors, observed on our clusters, show that the data our users store compresses very well (x2.5 to x3 with the lz4 codec):
{project="nbs", cluster="mycluster", service="service", host="cluster", sensor="*ompressedBytesWritten", type="ssd"}
We could save a lot of space by storing compressed blobs in blobstorage. However, a naive implementation that simply compresses whole blobs will not work: if we compress a 4MiB blob and store only the compressed form, we have to read and decompress the whole blob whenever a user reads a single 4KiB block from it. That's why we will need to do something like this:
Compaction should split each 4MiB blob into 103 chunks of at most 40KiB each (102 full chunks plus a 16KiB tail)
We should try to compress each chunk: if the compression ratio is better than, say, x3, the chunk is stored in compressed form, otherwise it is stored uncompressed
We should store chunk offsets in TBlobMeta so that, upon receiving a read request, we can find and read only the chunks required to process it
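The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual compaction code: `ChunkMeta`, `pack_blob` and `read_range` are hypothetical names, the real TBlobMeta layout may look different, and `zlib` from the Python standard library stands in for lz4.

```python
import os
import zlib
from dataclasses import dataclass

BLOB_SIZE = 4 * 1024 * 1024   # 4MiB blob, as in the text
CHUNK_SIZE = 40 * 1024        # 40KiB chunk, as in the text
MIN_RATIO = 3.0               # the "say, x3" threshold from the text


@dataclass
class ChunkMeta:
    """Hypothetical per-chunk record; the real TBlobMeta layout may differ."""
    stored_offset: int  # where the chunk starts inside the stored blob
    stored_size: int    # how many bytes it occupies there
    compressed: bool    # whether the stored bytes are compressed


def pack_blob(blob, chunk_size=CHUNK_SIZE, min_ratio=MIN_RATIO):
    """Split the blob into chunks and compress only those that compress well."""
    stored = bytearray()
    metas = []
    for start in range(0, len(blob), chunk_size):
        raw = blob[start:start + chunk_size]
        packed = zlib.compress(raw)  # assumption: zlib used instead of lz4
        use_packed = len(raw) > min_ratio * len(packed)
        data = packed if use_packed else raw
        metas.append(ChunkMeta(len(stored), len(data), use_packed))
        stored += data
    return bytes(stored), metas


def read_range(stored, metas, offset, length, chunk_size=CHUNK_SIZE):
    """Serve [offset, offset + length) touching only the overlapping chunks."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    out = bytearray()
    for meta in metas[first:last + 1]:
        data = stored[meta.stored_offset:meta.stored_offset + meta.stored_size]
        out += zlib.decompress(data) if meta.compressed else data
    skip = offset - first * chunk_size  # position of the request in the first chunk
    return bytes(out[skip:skip + length])


# Usage: the first half of the blob is zeros (compressible),
# the second half is random data (practically incompressible).
blob = bytes(BLOB_SIZE // 2) + os.urandom(BLOB_SIZE // 2)
stored, metas = pack_blob(blob)
block = read_range(stored, metas, offset=1_000_000, length=4096)
```

Because every chunk covers a fixed range of the uncompressed blob, the chunk index for a read is just `offset // chunk_size`. Only the position and size of each chunk in the stored blob have to be recorded in the metadata.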
Chunk size and the minimum compression ratio should be configurable via TStorageServiceConfig. In the future, more complex logic may be implemented: for example, we could track read request sizes and adjust the chunk size and minimum compression ratio dynamically.
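To make the effect of these two settings concrete, here is a small sketch of how they might be represented, and of how the chunk size bounds the work done per read. `CompressionConfig` and its field names are hypothetical (the text does not name the TStorageServiceConfig fields), and the validation rules are assumptions.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB


@dataclass(frozen=True)
class CompressionConfig:
    """Hypothetical settings; the real TStorageServiceConfig fields may differ."""
    chunk_size: int = 40 * KiB
    min_compression_ratio: float = 3.0

    def __post_init__(self):
        # Assumed sanity checks, not taken from the text.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if self.min_compression_ratio < 1.0:
            raise ValueError("min_compression_ratio must be at least 1")


def chunks_per_blob(cfg, blob_size=4 * MiB):
    """Number of chunks in one blob (the last chunk may be shorter)."""
    return -(-blob_size // cfg.chunk_size)  # ceiling division


def bytes_decompressed(cfg, offset, length):
    """Upper bound on uncompressed bytes processed to serve one read."""
    first = offset // cfg.chunk_size
    last = (offset + length - 1) // cfg.chunk_size
    return (last - first + 1) * cfg.chunk_size


default = CompressionConfig()
whole_blob = CompressionConfig(chunk_size=4 * MiB)  # the naive scheme

print(chunks_per_blob(default))                       # 103
print(bytes_decompressed(default, 0, 4 * KiB))        # 40960
print(bytes_decompressed(default, 38 * KiB, 4 * KiB)) # 81920, the read spans two chunks
print(bytes_decompressed(whole_blob, 0, 4 * KiB))     # 4194304
```

Smaller chunks reduce the amount of data decompressed per small read but usually compress worse and need more metadata entries. That trade-off is why tuning the chunk size to the observed read sizes may pay off.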