koverstreet / bcachefs

Other
643 stars 71 forks source link

[Proposal] Change zstd compression levels mapping #643

Open LRFLEW opened 6 months ago

LRFLEW commented 6 months ago

I noticed that bcachefs is mapping its compression levels (1-15) to zstd's compression levels (1-22) using floor(level * 3 / 2). This PR proposes to change this mapping to be direct, and use only zstd compression levels 1-15 (see later comments about the new mapping). My reasoning for this change is as follows:

If keeping access to these higher compression levels is a priority, then perhaps a non-linear mapping would be better suited. One option would be to map the lower levels directly to the zstd levels, and then have the higher levels skip every other compression level (clamped to <=22). This would provide a compromise between providing more low compression levels and maintaining access to the max compression level, all while addressing the above issues. Let me know if you'd rather this solution, and I can update the PR with it.

boomshroom commented 6 months ago

Since background_compression happens, well, in the background, it's not as important for it to be fast and it makes sense to let the system use spare background cycles getting the maximum possible compression for very infrequently accessed files. As such, being able to access level 22 for this purpose would be nice and I'd appreciate not losing it.

If we wanted a piecewise-linear mapping with a slope of 1 at (0,0) and a slope of 2 at (15,22), then they would meet at exactly (8,8), giving one extra point to the lower range.

LRFLEW commented 6 months ago

I noticed that piecewise-linear mapping while experimenting with it, and I do think that's a good option. The only issue I have with it is that levels 20-22 might want to be considered separate.

Levels >= 20, labeled --ultra, should be used with caution, as they require more memory. (source)

Because of that, I think level 19, the maximum non-ultra level, should be included in the mapping. As such, I would propose using the mapping points (0,0) and (14,19), giving an intersection point of (9,9). Then the top level 15 would be mapped to the top ultra level 22. The only possible issue I see is that it adds additional complexity to the mapping.

LRFLEW commented 5 months ago

I rebuilt the PR branch to resolve the conflicts from the force pushes to the master branch. I also went ahead and implemented my alternative mapping, but made it a separate commit so that they could be reviewed separately if there's still some debate on this.

I was looking up recommendations for zstd compression levels, and saw this in the AWS documentation:

We recommend using the default level of 3 or a level in the range from 6 to 9 for a reasonable tradeoff between compression speed and compressed data size. (source)

This seems like good evidence to me that having ZStd levels 1-9 available is a good idea. With the non-linear mapping I have, these nine levels are all directly mapped, and the higher levels are available using larger compression levels, up to 14 as the max non-ultra compression level and 15 as the max ultra compression level.