google / brotli

Brotli compression format

Custom dictionaries below level 5 #1148

Open rachel-bousfield opened 3 months ago

rachel-bousfield commented 3 months ago

Custom dictionaries can be attached during compression and decompression using C APIs like BrotliEncoderAttachPreparedDictionary. However, it appears they aren't used below brotli quality level 5. This causes a kind of silent failure: the dictionary is ignored without any error, so the user may not notice that it brings no improvement.

There are a few possible solutions to this issue:

  1. The attach and/or prepare methods could document the behavior.
  2. These and BrotliEncoderPrepareDictionary could fail when an incompatible level is applied.
  3. The dictionary format could include the minimum brotli level for compatibility & API-simplification reasons.
  4. Decide this is a bug and implement the feature for lower levels (not backwards compatible).
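
As a rough illustration of option 2, a guard along these lines could reject a dictionary when the requested quality is too low, instead of silently ignoring it. This is only a sketch: `check_dictionary_quality`, `DictCheckResult`, and `MIN_DICTIONARY_QUALITY` are hypothetical names that do not exist in the brotli API, and the threshold of 5 is taken from the behavior described in this report, not from brotli's documentation.

```c
#include <stddef.h>

/* Assumption from this report: dictionaries only take effect at quality 5
 * and above. This constant is hypothetical, not a brotli definition. */
#define MIN_DICTIONARY_QUALITY 5
/* Brotli quality levels range from 0 to 11. */
#define MIN_QUALITY 0
#define MAX_QUALITY 11

/* Hypothetical result codes for the guard below. */
typedef enum {
  DICT_CHECK_OK = 0,
  DICT_CHECK_INVALID_QUALITY,
  DICT_CHECK_QUALITY_TOO_LOW
} DictCheckResult;

/* Hypothetical guard: report whether a custom dictionary would actually be
 * used at the given quality, so the caller can fail loudly before attaching. */
static DictCheckResult check_dictionary_quality(int quality) {
  if (quality < MIN_QUALITY || quality > MAX_QUALITY) {
    return DICT_CHECK_INVALID_QUALITY;
  }
  if (quality < MIN_DICTIONARY_QUALITY) {
    return DICT_CHECK_QUALITY_TOO_LOW;
  }
  return DICT_CHECK_OK;
}

/* Human-readable message for each result, e.g. for a CLI warning. */
static const char* dict_check_message(DictCheckResult result) {
  switch (result) {
    case DICT_CHECK_OK:
      return "dictionary can be used at this quality";
    case DICT_CHECK_INVALID_QUALITY:
      return "quality is outside the 0-11 range";
    case DICT_CHECK_QUALITY_TOO_LOW:
      return "dictionary would be ignored: quality is below 5";
  }
  return NULL; /* unreachable for valid enum values */
}
```

A caller could run this check before calling BrotliEncoderAttachPreparedDictionary and return an error, or print the message as a warning, when the result is not `DICT_CHECK_OK`.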

Here's an example in case the issue isn't clear:

brotli -0 -D dictionary.lz -o dict input    # "input" is a placeholder for the file being compressed
brotli -0 -o no-dict input
diff dict no-dict    # would use <() but this doesn't work either

eustas commented 3 months ago

Thanks for reporting. I'll check and take action when I get spare cycles. Agreed that at least the CLI should let users know if the dictionary is ignored.

rachel-bousfield commented 3 months ago

I'd also look into how the C API could be improved. That's actually how I first discovered this issue. I was rather confused about why my dictionary wasn't working :)

pmeenan commented 2 months ago

FWIW, Zstandard allows for dictionary use down to level 0 (though, practically, you don't see a big benefit until levels 2-3).

I don't know enough about brotli's actual encoding to know if it makes sense, but it could be useful to allow for dictionaries to work at the lower levels as well if that allows for lower CPU use but still decent savings.