jeffgerickson / algorithms

Bug-tracking for Jeff's algorithms book, notes, etc.
7.92k stars 1.02k forks source link

Possible error in the Huffman code example #86

Closed filipmaric closed 5 years ago

filipmaric commented 5 years ago

Chapter number or note title: 4.4. Huffman codes

Page number: 165

Error description: I have implemented Huffman code encoder and it reported that the minimal number of bits is 649 instead of 646. In the tree shown on the figure 17 should be paired with 21 instead of 22 (letter T), and this error propagates further to the top of the tree. Depth of the letter O is 5 instead of 4, so the table also contains an error wrt. the given tree.

jeffgerickson commented 5 years ago

Need to double-check myself

julianoch commented 5 years ago

Seconded! My intuition behind this is that the numbers at any level must be larger or equal to those at the next level. This can be proved using an exchange argument. For example if we exchange nodes T with the node containing the sum 16, we are sure to get a full binary tree with a better overall cost.

jeffgerickson commented 5 years ago

Double-checked. @filipmaric is correct that 17 should be paired with 21, not 22. The correct number of bits for the message is 643.

04-greedy pdf

(The minimum cost 649 that @filipmaric reports cannot be correct, because that's more than the cost 646 of the code in the text!)

erbenpeter commented 5 years ago

I also implemented the Huffman-code algorithm, and also got 649 for the optimal length. My per letter bit lengths are:

{
  'n': 3, 'h': 4, 'w': 4, 'x': 5, 
  'l': 6, 'u': 6, 'o': 4, 'y': 5, 
  'f': 5, 'v': 5, 'r': 5, 't': 3, 
  'c': 6, 'a': 6, 'g': 6, 'z': 7, 
  'd': 7, 'i': 4, 'e': 3, 's': 3
}

My generated tree (in bracketed notation):

(((n,(h,w)),(((x,(l,u)),o),((y,f),(v,r)))),((t,(((c,a),(g,(z,d))),i)),(e,s)))

I think it's almost the same as the picture above, just 'v' and 'y' are replaced.

I used the message string:

'thissentencecontainsthreeasthreecstwodstwentysixesfivefsthreegseighthsthirteenistwolssixteennsnineossixrstwentysevensstwentytwotstwousfivevseightwsfourxsfiveysandonlyonez'
jeffgerickson commented 5 years ago

@erbenpeter You have exactly the same code lengths as me, so I must be calculating the encoded message length incorrectly. Working...

jeffgerickson commented 5 years ago

Yup, I can't do arithmetic; 649 is correct. So there were two errors here, one in constructing the code and the other in computing its cost. Both fixed now (locally).

filipmaric commented 5 years ago

Yes, the cost of this tree is 649.

  X L U O F V Y R N H W T E S I A C G D Z  
freq 4 2 2 9 5 5 5 6 16 8 8 22 26 27 13 3 3 3 2 1  
depth 5 6 6 4 5 5 5 5 3 4 4 3 3 3 4 6 6 6 7 7 Total
bits 20 12 12 36 25 25 25 30 48 32 32 66 78 81 52 18 18 18 14 7 649

In the original calculation in the text there was an error. The code for O was 10001, but in the table you counted its depth as 4 instead of 5, so you lost 9 :)

Filip