mw99 / DataCompression

Swift libcompression wrapper as an extension for the Data type (GZIP, ZLIB, LZFSE, LZMA, LZ4, deflate, RFC-1950, RFC-1951, RFC-1952)
Apache License 2.0

Problem using with Python lz4framed library #5

bvelasquez closed this issue 6 years ago

bvelasquez commented 6 years ago

After compressing in Swift with:

let json = try? JSONSerialization.data(withJSONObject: sendData)
deflated = json?.compress(withAlgorithm: Data.CompressionAlgorithm.LZ4)

decompressing in Python with:

inflated = lz4framed.decompress(data)

fails with:

Error: _lz4framed.Lz4FramedError, ('ERROR_frameType_unknown', 13)

Perhaps different lz4 versions?

Python 3.6 lz4framed: https://github.com/Iotic-Labs/py-lz4framed

bvelasquez commented 6 years ago

I also tried the standard Python lz4 library:

LZ4F_getFrameInfo failed with code: ERROR_frameType_unknown

https://pypi.python.org/pypi/lz4

Similar frame type error.
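
Both libraries reject the data while reading the frame header, which typically means the first four bytes are not the magic number they expect. A quick way to check this on the Python side is sketched below. It assumes the standard LZ4 frame magic number 0x184D2204 (stored little-endian), which comes from the LZ4 frame format specification and not from this thread; the helper names are hypothetical.

```python
# Magic number that opens a standard LZ4 frame (0x184D2204, little-endian).
# Assumption: taken from the LZ4 frame format specification, not from this thread.
LZ4_FRAME_MAGIC = bytes.fromhex("04224d18")


def looks_like_lz4_frame(data: bytes) -> bool:
    """Return True if data starts with the standard LZ4 frame magic number."""
    return data[:4] == LZ4_FRAME_MAGIC


def describe_prefix(data: bytes) -> str:
    """Hypothetical debugging helper: show the first four bytes as hex."""
    kind = "LZ4 frame" if looks_like_lz4_frame(data) else "not an LZ4 frame"
    return f"{data[:4].hex()} ({kind})"
```

Calling describe_prefix(data) on the bytes received from the Swift side, before handing them to lz4framed.decompress, should show whether the payload starts like an LZ4 frame at all.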

mw99 commented 6 years ago

That looks strange. I would at least expect the official Python lz4 implementation to work. Thanks for testing that as well. I will take a closer look when I have time.

bvelasquez commented 6 years ago

@mw99 I noticed this in the Apple Documentation:

"The encoded format we produce and consume is compatible with the open source version, except that we add a very simple frame to the raw stream to allow some additional validation and functionality."

Not sure if that "very simple frame" is what the problem happens to be.

Thanks for looking into this when you have time.

Link: https://developer.apple.com/documentation/compression/data_compression

mw99 commented 6 years ago

Yes, that sounds very suspicious. I will take a closer look when I have more time. Thanks.

bvelasquez commented 6 years ago

@mw99 Did you get a chance to look into this?

mw99 commented 6 years ago

So today I took a closer look and it's very strange...

Apple does put a header and a footer around the stream. The footer is always 0x62763424 (4 bytes). The header has two variations.

  1. VARIANT 1: 0x62763431 + uint32 + uint32 (12 bytes)
  2. VARIANT 2: 0x6276342d (4 bytes)

VARIANT 1 is followed by what seems to be a header, but it is not an LZ4 header... great. VARIANT 2 means no compression: the data follows as uncompressed plaintext. Apple picks one variant seemingly at random, and I have no idea why they would do that. It makes no sense.
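
The layout described above can be sketched as a small Python splitter that peels off the header and footer. This is only an illustration of the observations in this comment: the function name is hypothetical, the two uint32 fields are assumed to be little-endian (not verified here), and the sketch assumes a single header and a single footer per buffer. The body is returned uninterpreted.

```python
import struct

# Magic values as observed in this thread.
HEADER_VARIANT_1 = bytes.fromhex("62763431")  # followed by two uint32 fields
HEADER_VARIANT_2 = bytes.fromhex("6276342d")  # reported as "no compression"
FOOTER = bytes.fromhex("62763424")


def split_apple_lz4(data: bytes):
    """Hypothetical helper: return (variant, fields, body).

    Assumption: the two uint32 fields of VARIANT 1 are little-endian.
    The body (everything between header and footer) is not interpreted.
    """
    if len(data) < 8 or data[-4:] != FOOTER:
        raise ValueError("missing footer 0x62763424")
    magic = data[:4]
    if magic == HEADER_VARIANT_1:
        if len(data) < 16:
            raise ValueError("truncated VARIANT 1 header")
        fields = struct.unpack("<II", data[4:12])
        return 1, fields, data[12:-4]
    if magic == HEADER_VARIANT_2:
        return 2, (), data[4:-4]
    raise ValueError(f"unknown header {magic.hex()}")
```

Printing the variant and the two fields for a few inputs of known size is one way to find out what the uint32 values actually stand for.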

VARIANT 1 looks like this if you compress some ASCII zeros:

00000000  1f 30 01 00 ff d3 f0 5e  30 30 30 30 30 30 30 30  |.0.....^00000000|
00000010  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30  |0000000000000000|

None of the LZ4 decoders I tried (Python lz4, the lz4 reference implementation) recognizes 0x1f300100. Judging by the data of VARIANT 1, even very simple data does not get compressed. To be honest, I would recommend not using LZ4 at all, because Apple's implementation seems to be crap.
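
One other possible reading, not verified against Apple's encoder: the bytes after the 12-byte header might be a raw LZ4 block (the LZ4 block format, without the frame that lz4framed and the Python lz4 frame API expect). The sketch below parses the first sequence of the dump under that assumption. The function name is hypothetical, and the token layout follows the public LZ4 block format description rather than anything confirmed in this thread.

```python
def read_first_sequence(block: bytes):
    """Parse the first sequence of a raw LZ4 block (block format, no frame).

    Returns (literals, offset, match_length, bytes_consumed).
    """
    pos = 0
    token = block[pos]
    pos += 1

    # High nibble: literal length, extended by extra bytes while they are 255.
    literal_len = token >> 4
    if literal_len == 15:
        while True:
            extra = block[pos]
            pos += 1
            literal_len += extra
            if extra != 255:
                break
    literals = block[pos:pos + literal_len]
    pos += literal_len

    # Two-byte little-endian offset pointing back into the output.
    offset = int.from_bytes(block[pos:pos + 2], "little")
    pos += 2

    # Low nibble: match length, extended the same way, plus the minimum of 4.
    match_len = token & 0x0F
    if match_len == 15:
        while True:
            extra = block[pos]
            pos += 1
            match_len += extra
            if extra != 255:
                break
    match_len += 4
    return literals, offset, match_len, pos


dump = bytes.fromhex("1f300100ffd3f05e")  # first 8 bytes of the dump above
print(*read_first_sequence(dump))  # prints: b'0' 1 485 6
```

If that reading is right, the first six bytes would stand for one literal "0" followed by a 485-byte copy from one byte back, so the zeros would be compressed after all, and a decoder for the raw block format (rather than the frame format) would be the thing to try.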