Closed xakepp35 closed 2 years ago
Here's a function (not very well tested) for computing the decompressed size of a block:
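import "encoding/binary"

// minMatch is the minimum match length in the LZ4 block format.
const minMatch = 4

// UncompressedSize walks the sequences of an LZ4 block and returns its
// decompressed size, or -1 if the block is malformed.
func UncompressedSize(src []byte) (size int64) {
	for len(src) > 0 {
		// Token: the high nibble is the literal length, the low nibble the match length.
		b := int64(src[0])
		src = src[1:]

		lLen := b >> 4
		if lLen == 0xF {
			for {
				if len(src) == 0 {
					return -1
				}
				add := int64(src[0])
				lLen += add
				if lLen < 0 {
					return -1
				}
				src = src[1:]
				if add != 0xFF {
					break
				}
			}
		}
		// Skip the literal bytes themselves before looking for the offset.
		if lLen > int64(len(src)) {
			return -1
		}
		src = src[lLen:]
		size += lLen

		switch len(src) {
		case 0: // The last sequence holds literals only.
			return size
		case 1: // No space for a 16-bit offset.
			return -1
		}

		offset := int64(binary.LittleEndian.Uint16(src))
		if offset == 0 {
			return -1
		}
		src = src[2:]

		mLen := b & 0xF
		if mLen == 0xF {
			for {
				if len(src) == 0 {
					return -1
				}
				add := int64(src[0])
				mLen += add
				if mLen < 0 {
					return -1
				}
				src = src[1:]
				if add != 0xFF {
					break
				}
			}
		}
		mLen += minMatch
		size += mLen // The match contributes mLen bytes of output.
	}
	// A block must end with a literals-only sequence.
	return -1
}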
I've been thinking of submitting this as a PR, but haven't got round to it yet. In particular, it doesn't validate offsets and doesn't handle preset dictionaries ("linked blocks").
When using LZ4 block compression, there is no easy way to get the size of the uncompressed data; it is left to the user of the format to handle this in whatever way suits them. The typical approach is to prefix the compressed data with the uncompressed size.
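A minimal sketch of the size-prefix approach, assuming a 4-byte little-endian header. The header layout and the names headerSize, packBlock and unpackHeader are hypothetical and not part of the lz4 package or the LZ4 block format; the compressed bytes are treated as opaque.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerSize is the length of the hypothetical size prefix (an assumption,
// not something defined by the LZ4 block format).
const headerSize = 4

// packBlock prepends the uncompressed size to an already-compressed block.
func packBlock(uncompressedSize uint32, compressed []byte) []byte {
	out := make([]byte, headerSize+len(compressed))
	binary.LittleEndian.PutUint32(out, uncompressedSize)
	copy(out[headerSize:], compressed)
	return out
}

// unpackHeader splits a packed block into the stored uncompressed size and
// the compressed payload.
func unpackHeader(packed []byte) (uint32, []byte, error) {
	if len(packed) < headerSize {
		return 0, nil, errors.New("packed block shorter than size header")
	}
	return binary.LittleEndian.Uint32(packed), packed[headerSize:], nil
}

func main() {
	compressed := []byte{0x10, 'a'} // stand-in for real compressed bytes
	packed := packBlock(1, compressed)

	size, payload, err := unpackHeader(packed)
	if err != nil {
		panic(err)
	}
	dst := make([]byte, size) // destination sized exactly, no guessing
	fmt.Println(len(dst), len(payload))
}
```

The reader can then allocate exactly the stored number of bytes before handing the payload to the block decompressor. The stored size should still be treated as untrusted input and capped before allocating.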
I have attempted to do what @greatroar proposes, but it makes decompression roughly twice as slow.
In general though, it is better to use the LZ4 frame format than the raw block one.
I'm not proposing to do this before decompressing. It might be a useful freestanding function for some applications, but I must admit I wouldn't use it myself.
Please use a custom format if needed. LZ4 will only support the standard LZ4 format as per the reference implementation.
In the example we have the following weird code block:
What is 10*len(data)? Why is it not 9, and not 11? I am tight on memory; will it fail for a factor of 9? For 8? For 7? Where is the API to determine the exact size of the data, which is required for decompression?!