xiaoyin0208 / lz4

Automatically exported from code.google.com/p/lz4

Buffer size estimation and memory safety #8

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Currently, users have to estimate the worst-case compressed size by 
following the guidance in the comment on LZ4_compress. Moreover, it's not 
possible to use a smaller buffer (e.g. when we expect the data to be at least 
50% compressible and want to save memory in the common case).

It would be nice to have:
1. A function for converting input size to worst-case output size (e.g. see 
compressBound() in zlib)
2. A function for safe compression with a specified output buffer size that's 
guaranteed to never write past the end, returning a negative number or zero to 
indicate an error
3. (related) A function for safe decompression with a specified input buffer 
size that's guaranteed to never read past the end of the input buffer.

Original issue reported on code.google.com by arseny.k...@gmail.com on 11 Feb 2012 at 7:06
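
To make the contracts requested in points 2 and 3 concrete, here is a minimal sketch. It uses a toy run-length encoder as a stand-in for a real compressor; it is not the LZ4 format and not LZ4's API. The names `toy_compress_limited` and `toy_decompress_safe` are hypothetical, chosen only for illustration. The point is the bounds check before every write and every read.

```c
/* Toy run-length codec, used only to illustrate bounded I/O contracts.
   NOT the LZ4 format; all names here are hypothetical.
   Encoded stream: a sequence of (count, byte) pairs, count in 1..255. */

/* Returns the number of bytes written to dst, or 0 if dst_capacity is too
   small (0 is also returned for empty input). Never writes at or beyond
   dst + dst_capacity. */
static int toy_compress_limited(const unsigned char *src, int src_size,
                                unsigned char *dst, int dst_capacity)
{
    int ip = 0, op = 0;
    if (src_size < 0 || dst_capacity < 0) return 0;
    while (ip < src_size) {
        unsigned char value = src[ip];
        int run = 1;
        while (ip + run < src_size && run < 255 && src[ip + run] == value)
            run++;
        if (dst_capacity - op < 2) return 0;   /* check before each write */
        dst[op++] = (unsigned char)run;
        dst[op++] = value;
        ip += run;
    }
    return op;
}

/* Returns the decoded size, or a negative value if the input is truncated
   or malformed, or if dst_capacity is too small. Never reads at or beyond
   src + src_size and never writes at or beyond dst + dst_capacity. */
static int toy_decompress_safe(const unsigned char *src, int src_size,
                               unsigned char *dst, int dst_capacity)
{
    int ip = 0, op = 0;
    if (src_size < 0 || dst_capacity < 0) return -1;
    while (ip < src_size) {
        int run, i;
        if (src_size - ip < 2) return -1;        /* truncated pair */
        run = src[ip++];
        if (run == 0) return -1;                 /* malformed count */
        if (dst_capacity - op < run) return -1;  /* output would overflow */
        for (i = 0; i < run; i++) dst[op++] = src[ip];
        ip++;
    }
    return op;
}
```

With the 7-byte input `aaaabbb`, the encoder needs 4 bytes: it returns 0 when given a 3-byte buffer and leaves the byte just past that buffer untouched, and returns 4 when given a 4-byte buffer.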

GoogleCodeExporter commented 8 years ago
Good points. I will look into these.

Original comment by yann.col...@gmail.com on 11 Feb 2012 at 9:32

GoogleCodeExporter commented 8 years ago
Regarding point 1 :

Please find attached a proposed release candidate, rc55, which features a new 
function, LZ4_compressBound().
LZ4Demo and bench.c have both been modified to use it.

Regarding point 3 :

I'm not sure I understood your question correctly, but if I did, maybe the 
function LZ4_uncompress_unknownOutputSize() already answers your need.

Regarding point 2 :

This request is a bit more complex, and will probably require writing a new 
compression function. The new function is likely to be slower as a result of 
the extra checks.

Original comment by yann.col...@gmail.com on 13 Feb 2012 at 9:07

GoogleCodeExporter commented 8 years ago
LZ4_compressBound() has been added to r55.

Original comment by yann.col...@gmail.com on 16 Feb 2012 at 7:11
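
For readers unfamiliar with this kind of helper, a worst-case bound function can be sketched as below. This is an illustration under assumptions, not the r55 source: the name `sketch_compress_bound` and the constant `SKETCH_MAX_INPUT_SIZE` are hypothetical, and the expression `size + size/255 + 16` is assumed from the bound documented in later versions of lz4.h, so the exact formula shipped in r55 may differ.

```c
/* Hypothetical upper limit on input size, assumed from later lz4.h
   versions; it keeps the arithmetic below from overflowing a 32-bit int. */
#define SKETCH_MAX_INPUT_SIZE 0x7E000000

/* Sketch of a worst-case compressed-size bound (NOT the r55 code).
   Returns 0 if input_size is negative or too large to be supported. */
static int sketch_compress_bound(int input_size)
{
    if (input_size < 0 || input_size > SKETCH_MAX_INPUT_SIZE) return 0;
    /* Incompressible data costs roughly one extra byte per 255 input
       bytes, plus a small constant margin. */
    return input_size + (input_size / 255) + 16;
}
```

Under this assumed formula, a 64 KB input (65536 bytes) needs an output buffer of 65536 + 257 + 16 = 65809 bytes to be safe in the worst case.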

GoogleCodeExporter commented 8 years ago
Thanks!

Regarding point 3 - yes, the functionality is indeed available, I missed it.

Original comment by arseny.k...@gmail.com on 20 Feb 2012 at 6:55

GoogleCodeExporter commented 8 years ago
Good !

Regarding Point 2 :
The issue is complex enough to require writing a new function.
Since keeping LZ4 lean and simple is also an objective, I'm a bit puzzled: 
I want to avoid multiplying the number of variants to maintain.

I'm wondering which kinds of usage are targeted by proposition 2. 
Specifically, I'm wondering whether it belongs to a larger family of functions 
that deal with more complex memory allocation strategies, such as limited 
output buffers, compression of consecutive small input segments, decoding on 
the fly without reading the entire block, and so on.
In that case, it might require a new source file dedicated to this family of 
advanced functions. And of course, it is necessary to understand which use 
cases it would help.

Original comment by yann.col...@gmail.com on 20 Feb 2012 at 9:02

GoogleCodeExporter commented 8 years ago
My original use case for it was to try to use a small buffer to preserve memory 
(I know the average compression ratio for the application), and as a fallback 
to allocate a full-sized buffer and to run the compression again.

I don't have the need for this currently though, since I now split the data 
into chunks that are small enough to always fit in memory - so if this is 
complicated, it's probably best to wait for other users with the same problem.

Original comment by arseny.k...@gmail.com on 22 Feb 2012 at 6:08
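
The use case described above (try a small buffer first, then fall back to a full-sized buffer and compress again) can be sketched as follows. Everything here is hypothetical: `compress_with_fallback` and `toy_store_limited` are illustration names, and the stand-in "compressor" just copies its input. The sketch assumes a bounded compression function that returns 0 when the output does not fit, which is exactly what point 2 asks for.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed shape of a bounded compressor: returns bytes written, 0 on failure. */
typedef int (*limited_compress_fn)(const unsigned char *src, int src_size,
                                   unsigned char *dst, int dst_capacity);

/* Stand-in "compressor" for illustration only: stores the input verbatim,
   or returns 0 if it does not fit. */
static int toy_store_limited(const unsigned char *src, int src_size,
                             unsigned char *dst, int dst_capacity)
{
    if (src_size <= 0 || dst_capacity < src_size) return 0;
    memcpy(dst, src, (size_t)src_size);
    return src_size;
}

/* Tries small_capacity first; if compression fails, retries with
   worst_case_capacity. Returns a malloc'ed buffer the caller must free,
   or NULL on failure. *used_fallback is set to 1 if the retry was needed. */
static unsigned char *compress_with_fallback(limited_compress_fn compress,
        const unsigned char *src, int src_size,
        int small_capacity, int worst_case_capacity,
        int *out_size, int *used_fallback)
{
    unsigned char *dst;
    int written;

    *out_size = 0;
    *used_fallback = 0;
    if (worst_case_capacity <= 0) return NULL;

    /* Optimistic attempt: cheap in memory when the data compresses well. */
    if (small_capacity > 0 && small_capacity < worst_case_capacity) {
        dst = (unsigned char *)malloc((size_t)small_capacity);
        if (dst != NULL) {
            written = compress(src, src_size, dst, small_capacity);
            if (written > 0) { *out_size = written; return dst; }
            free(dst);
        }
    }

    /* Fallback: full-sized buffer, run the compression again. */
    *used_fallback = 1;
    dst = (unsigned char *)malloc((size_t)worst_case_capacity);
    if (dst == NULL) return NULL;
    written = compress(src, src_size, dst, worst_case_capacity);
    if (written <= 0) { free(dst); return NULL; }
    *out_size = written;
    return dst;
}
```

The trade-off is that the fallback path pays for compression twice, so this only makes sense when the optimistic size is rarely exceeded.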

GoogleCodeExporter commented 8 years ago
OK, noted.

Indeed, a new function will be required to answer this need.
As there are already 2 compression functions, it would increase the total 
number of combinations to 4. It starts to become unwieldy. Either the source 
code becomes more difficult to maintain, or it requires a completely different 
layout, which may in turn make it more difficult to debug. Either way, it is a 
pretty major change, so better be careful.

Anyway, I'll keep your suggestion in mind. Maybe some day it will become an 
important feature, or I could find a nice trick to make it painless to add to 
the source.

Regards

Original comment by yann.col...@gmail.com on 24 Feb 2012 at 10:18

GoogleCodeExporter commented 8 years ago
Point 1 : Added LZ4_compressBound() in r55
Point 2 : delayed for future implementation
Point 3 : Answered by LZ4_uncompress_unknownOutputSize()

Original comment by yann.col...@gmail.com on 24 Feb 2012 at 10:19