Open kornelski opened 10 years ago
I have to check how symbols that share the same Huffman code length are stored in the file. I seem to recall that it's by increasing value, with no relation to their actual frequency (apart from having the same code length, which comes from the fact that they have similar frequencies). For example, codes of length 5, 4 entries: 7 9 10 12. But if 12 appears a bit more frequently than 10 and unfortunately has a code ending with a lot of ones, like 00111, it could be a smart move to change the recording order to 7 9 12 10. This time 12 would get the less "dangerous" code 00110 and 10 would get 00111.
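To make the reordering idea concrete, here is a minimal sketch of canonical code assignment, where codes of each length are handed out consecutively in the order the symbols are listed. The function name `assign_codes` and the tiny table (one 3-bit code plus the four 5-bit entries from the example) are made up for illustration and are not taken from any real file.

```python
def assign_codes(counts, symbols):
    """Assign canonical Huffman codes.

    counts[i] is the number of codes of length i + 1; symbols lists the
    symbols in stored order, shortest codes first. Returns a dict mapping
    each symbol to its code as a bit string.
    """
    codes = {}
    code = 0
    idx = 0
    for length in range(1, len(counts) + 1):
        for _ in range(counts[length - 1]):
            codes[symbols[idx]] = format(code, f"0{length}b")
            code += 1
            idx += 1
        code <<= 1  # moving to the next length appends a zero bit
    return codes


# Hypothetical table: one 3-bit code (symbol 0) and four 5-bit codes.
counts = [0, 0, 1, 0, 4]

original = assign_codes(counts, [0, 7, 9, 10, 12])
swapped = assign_codes(counts, [0, 7, 9, 12, 10])

print(original[10], original[12])  # 00110 00111
print(swapped[10], swapped[12])    # 00111 00110
```

Only the stored order of the two symbols changes; the code lengths, and therefore the total bit count, stay the same.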
I'm pretty sure the histogram of pairs will not work, since there is often raw binary data between two Huffman codes.
0xFF bytes in Huffman data need special coding that adds another byte of overhead. If this overhead could be reliably minimized, then the approximation of scan data size described in #15 would be more effective.
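As a rough illustration of that overhead, the sketch below packs a sequence of bit strings into bytes and counts how many stuffed bytes the result would need. The helper names (`pack_bits`, `stuff`, `stuffing_overhead`) are hypothetical, and it assumes the usual JPEG conventions: the final byte is padded with 1 bits and every 0xFF in the scan data is followed by a 0x00.

```python
def pack_bits(bitstrings):
    """Concatenate bit strings and pack them into bytes, padding with 1 bits."""
    bits = "".join(bitstrings)
    bits += "1" * (-len(bits) % 8)  # pad the last byte with ones
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def stuff(scan):
    """Insert a 0x00 after every 0xFF byte."""
    return scan.replace(b"\xff", b"\xff\x00")


def stuffing_overhead(scan):
    """Number of extra bytes that stuffing adds to the scan data."""
    return scan.count(0xFF)


# A code ending in ones (00111) followed by more one bits yields a 0xFF byte.
scan = pack_bits(["00111", "111", "11111"])
print(scan.hex())               # 3fff
print(stuffing_overhead(scan))  # 1
print(stuff(scan).hex())        # 3fff00
```

Whether a given code assignment produces a 0xFF depends on byte alignment and on the raw bits that follow each code, so a count like this only measures one concrete bitstream.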
Here's an algorithm off the top of my head:
And/Or:
(code[symbol[n-1]], code[symbol[n]])
Likely it can be improved.