Closed filipmaric closed 5 years ago
Need to double-check myself
Seconded! My intuition behind this is that the numbers at any level must be larger or equal to those at the next level. This can be proved using an exchange argument. For example if we exchange nodes T with the node containing the sum 16, we are sure to get a full binary tree with a better overall cost.
Double-checked. @filipmaric is correct that 17 should be paired with 21, not 22. The correct number of bits for the message is 643.
(The minimum cost 649 that @filipmaric reports cannot be correct, because that's more than the cost 646 of the code in the text!)
I also implemented the Huffman-code algorithm, and also got 649 for the optimal length. My per letter bit lengths are:
{
'n': 3, 'h': 4, 'w': 4, 'x': 5,
'l': 6, 'u': 6, 'o': 4, 'y': 5,
'f': 5, 'v': 5, 'r': 5, 't': 3,
'c': 6, 'a': 6, 'g': 6, 'z': 7,
'd': 7, 'i': 4, 'e': 3, 's': 3
}
My generated tree (in bracketed notation):
(((n,(h,w)),(((x,(l,u)),o),((y,f),(v,r)))),((t,(((c,a),(g,(z,d))),i)),(e,s)))
I think it's almost the same as the picture above, just 'v' and 'y' are replaced.
I used the message string:
'thissentencecontainsthreeasthreecstwodstwentysixesfivefsthreegseighthsthirteenistwolssixteennsnineossixrstwentysevensstwentytwotstwousfivevseightwsfourxsfiveysandonlyonez'
@erbenpeter You have exactly the same code lengths as me, so I must be calculating the encoded message length incorrectly. Working...
Yup, I can't do arithmetic; 649 is correct. So there were two errors here, one in constructing the code and the other in computing its cost. Both fixed now (locally).
Yes, the cost of this tree is 649.
X | L | U | O | F | V | Y | R | N | H | W | T | E | S | I | A | C | G | D | Z | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
freq | 4 | 2 | 2 | 9 | 5 | 5 | 5 | 6 | 16 | 8 | 8 | 22 | 26 | 27 | 13 | 3 | 3 | 3 | 2 | 1 | |
depth | 5 | 6 | 6 | 4 | 5 | 5 | 5 | 5 | 3 | 4 | 4 | 3 | 3 | 3 | 4 | 6 | 6 | 6 | 7 | 7 | Total |
bits | 20 | 12 | 12 | 36 | 25 | 25 | 25 | 30 | 48 | 32 | 32 | 66 | 78 | 81 | 52 | 18 | 18 | 18 | 14 | 7 | 649 |
In the original calculation in the text there was an error. The code for O was 10001, but in the table you counted its depth as 4 instead of 5, so you lost 9 :)
Filip
Chapter number or note title: 4.4. Huffman codes
Page number: 165
Error description: I have implemented Huffman code encoder and it reported that the minimal number of bits is 649 instead of 646. In the tree shown on the figure 17 should be paired with 21 instead of 22 (letter T), and this error propagates further to the top of the tree. Depth of the letter O is 5 instead of 4, so the table also contains an error wrt. the given tree.