hankya / snappy-go

Automatically exported from code.google.com/p/snappy-go
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Encode appears to allocate ~10K more than it needs on a 64K block #5

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'm sorry if this is flat-out wrong or just too unimportant to fix; I'm new 
here.

Encode allocates MaxEncodedLen(len(src)) bytes for dst. However, the (very 
well-explained) scenario motivating the formula in MaxEncodedLen(srcLen) 
involves an encoder emitting 5-byte-long copy elements each copying four bytes. 
But snappy-go can't encounter that scenario because it emits at most 
3-byte-long copy elements, each copying at least four bytes, regardless of 
block length. 

Given the restrictions on what snappy-go emits, the worst case I can see is a 
3-byte-long copy element that copies four bytes, followed by a 2-byte-long 
literal element with length 61, followed by the literal bytes, repeated for the 
whole block: then it takes 66 bytes to encode every 65 bytes. Add three bytes 
for the length, and 64K encodes to 65536*(66/65)+3=66547 bytes, 10393 fewer 
than the 76940 bytes actually allocated. Maybe someone can find a worse case.

If snappy-go did, in the future, emit 5-byte copy elements, presumably it still 
wouldn't do so in a 64K block, because offsets within a 64K block always fit in 
two bytes. 

(If MaxEncodedLen is meant to represent the worst possible blowup with the 
Snappy encoding, rather than the worst possible blowup with snappy-go or other 
currently-existing encoders, note that the encoding appears to allow 
5-byte-long copy elements to only copy one byte, so the absolute most 
pessimistic upper bound may be len(src)*5 plus space for the preamble.)

Original issue reported on code.google.com by twotwo...@gmail.com on 6 Aug 2014 at 6:18