jet / kafunk

Kafunk: F# Kafka client
https://jet.github.io/kafunk/

LZ4 compression #129

Open eulerfx opened 7 years ago

vchekan commented 6 years ago

You might be interested in my LZ4 implementation in kafka4net for some hints: https://github.com/vchekan/kafka4net/blob/master/src/Compression/Lz4KafkaStream.cs Things like the bug in Kafka's checksum implementation can take a lot of time to debug: https://issues.apache.org/jira/browse/KAFKA-3160

Another piece of advice: you might want to invest in a Java cross-validation framework like this one: https://github.com/vchekan/kafka4net/blob/master/tools/binary-console/src/main/scala/com/ntent/kafka/main.scala There I generate Kafka messages using the Java driver and use them as the golden standard, with different compression types and buffer sizes. As an added bonus, I get confidence that my implementation works with the Java consumer.

eulerfx commented 6 years ago

Hey, thanks for the pointer. I wasn't aware of the LZ4 library and was considering implementing the compression from scratch. Has your experience with the library been good?

Regarding the cross-validation framework - good call, I think this would be useful.

vchekan commented 6 years ago

I remember we used compression in production for many years, but I do not recall which codec it was, snappy or lz4. I test compression together with my framing implementation here (every buffer size from 1 byte to 256 KB, random content): https://github.com/vchekan/kafka4net/blob/master/tests/CompressionTests.cs
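
For illustration only, a minimal Python sketch of that kind of round-trip test. It is not the kafka4net code: `zlib` stands in for the LZ4 codec and framing, the names `round_trip` and `check_sizes` are made up for this example, and the size range is cut down to 4 KiB so it runs quickly.

```python
import os
import zlib


def round_trip(payload: bytes) -> bytes:
    """Compress and then decompress a payload.

    Assumption: zlib is only a stand-in; a real test would call the
    LZ4 codec and frame implementation under test here.
    """
    return zlib.decompress(zlib.compress(payload))


def check_sizes(sizes):
    """Return every size whose random payload did not survive the round trip."""
    failures = []
    for size in sizes:
        payload = os.urandom(size)  # random content, as in the test described above
        if round_trip(payload) != payload:
            failures.append(size)
    return failures


# Every buffer size from 1 byte to 4 KiB (hypothetical, reduced upper bound).
failed_sizes = check_sizes(range(1, 4097))
```

Walking through every size rather than a handful of samples is what catches off-by-one errors at block and frame boundaries.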

Here I run a Java compatibility test for the gzip, lz4 and snappy codecs. The idea is to invoke Java and have it generate messages with random content. C# creates a text file with the desired message sizes; Java generates messages of those lengths, publishes them to Kafka and writes a text file with the hash codes of the generated messages. C# then reads Java's hashes, consumes the messages and compares each message's hash to the one generated by Java. https://github.com/vchekan/kafka4net/blob/master/tests/RecoveryTest.cs#L1967
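
A rough Python sketch of the hash-comparison step described above, under loud assumptions: there is no Kafka or Java here, the "producer" and "consumer" are plain functions exchanging in-memory lists, SHA-256 is an arbitrary choice of hash, and all function names are hypothetical.

```python
import hashlib
import os


def message_hash(payload: bytes) -> str:
    """Hash one message; SHA-256 is an assumption, not the hash used in kafka4net."""
    return hashlib.sha256(payload).hexdigest()


def produce(sizes):
    """Stand-in for the Java side: random messages plus one hash line per message."""
    messages = [os.urandom(size) for size in sizes]
    hash_lines = [message_hash(m) for m in messages]
    return messages, hash_lines


def verify(consumed, hash_lines):
    """Stand-in for the consuming side: return indexes of messages whose hash differs."""
    if len(consumed) != len(hash_lines):
        raise ValueError("message count does not match hash count")
    return [
        index
        for index, (message, expected) in enumerate(zip(consumed, hash_lines))
        if message_hash(message) != expected
    ]


sizes = [1, 10, 1000, 65536]  # the "desired message sizes" file, as a list
messages, hashes = produce(sizes)
mismatches = verify(messages, hashes)

# Flip the first byte of message 1 to show that corruption is detected.
corrupted = list(messages)
corrupted[1] = bytes([messages[1][0] ^ 0xFF]) + messages[1][1:]
corrupted_mismatches = verify(corrupted, hashes)
```

Exchanging only sizes and hashes keeps the two sides decoupled: neither needs to know how the other generates or decodes the payload.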

eulerfx commented 6 years ago

Looks like there is a lot of interest in getting LZ4 for Kafunk at Jet now. We should be prioritizing this work soon.