eulerfx opened 7 years ago
Hey, thanks for the pointer. I wasn't aware of the LZ4 library and was considering implementing the compression from scratch. Has your experience with the library been good?
Regarding the cross-validation framework - good call, I think this would be useful.
I remember we used compression in production for many years, but I don't recall which one it was, Snappy or LZ4. I test compression plus my frames implementation here (every buffer size from 1 byte to 256 KB, random content): https://github.com/vchekan/kafka4net/blob/master/tests/CompressionTests.cs
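The linked kafka4net tests are C#, but the idea translates directly: compress random content at many buffer sizes and verify the round trip. A minimal Java sketch of that test shape, using the JDK's gzip codec as a stand-in (the size list here is a small sample, not the full 1-byte-to-256-KB sweep):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress a buffer with the JDK's gzip codec.
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Decompress back to the original bytes.
    public static byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) > 0) bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Random rnd = new Random(0);
        // The kafka4net test sweeps every size from 1 byte to 256 KB;
        // a few representative sizes keep this sketch fast.
        for (int size : new int[] {1, 17, 1024, 65536, 262144}) {
            byte[] data = new byte[size];
            rnd.nextBytes(data);
            byte[] restored = decompress(compress(data));
            if (!Arrays.equals(data, restored))
                throw new AssertionError("round-trip failed at size " + size);
        }
        System.out.println("round-trip ok");
    }
}
```

Sweeping sizes down to 1 byte is what catches off-by-one bugs in frame handling that larger, "typical" payloads never exercise.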
Here I run a Java compatibility test for the gzip, LZ4, and Snappy codecs. The idea is to invoke Java to generate messages with random content: C# creates a text file with the desired message sizes; Java generates messages of those lengths, publishes them to Kafka, and writes a text file with the hash codes of the generated messages; C# then reads Java's hashes, consumes the messages, and compares each message's hash to the one Java generated. https://github.com/vchekan/kafka4net/blob/master/tests/RecoveryTest.cs#L1967
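The Java half of that handshake can be sketched as below. The sizes array stands in for the file the C# side writes, and SHA-256 is an assumption (the harness just needs any hash both sides compute identically); in the real test each message would also be published to Kafka:

```java
import java.security.MessageDigest;
import java.util.Random;

public class MessageHashes {
    // Hex digest of a message's bytes. SHA-256 is an assumption here --
    // any hash works as long as the producer and consumer sides agree.
    static String hashOf(byte[] msg) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(msg);
        StringBuilder sb = new StringBuilder();
        for (byte b : d) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Random rnd = new Random(1);
        // Stand-in for the message-sizes file written by the C# side.
        int[] sizes = {16, 100, 4096};
        for (int size : sizes) {
            byte[] msg = new byte[size];
            rnd.nextBytes(msg);
            // In the real harness the message is published to Kafka here,
            // and the hash line is written to a file for the consumer to check.
            System.out.println(size + " " + hashOf(msg));
        }
    }
}
```

The consumer side then recomputes the same digest over each consumed message and fails the test on the first mismatch.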
Looks like there is a lot of interest in getting LZ4 into Kafunk at Jet now. We should be prioritizing this work soon.
You might be interested in my implementation of LZ4 in kafka4net for some hints: https://github.com/vchekan/kafka4net/blob/master/src/Compression/Lz4KafkaStream.cs Things like the bug in Kafka's checksum implementation can cost a lot of debugging time: https://issues.apache.org/jira/browse/KAFKA-3160
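For context on that bug: the LZ4 frame spec computes the header checksum over the frame descriptor only, while the affected Kafka code also included the 4 magic bytes, so spec-compliant decoders rejected the frames. A sketch of the two byte ranges, using CRC32 as a stand-in (the real format uses XXHash32, which is not in the JDK):

```java
import java.util.zip.CRC32;

public class Lz4HeaderChecksumSketch {
    // Stand-in checksum; the real LZ4 frame format uses XXHash32
    // and keeps one byte of the result as the header checksum.
    static long check(byte[] buf, int off, int len) {
        CRC32 c = new CRC32();
        c.update(buf, off, len);
        return c.getValue();
    }

    public static void main(String[] args) {
        // LZ4 frame: 4-byte magic 0x184D2204 (little-endian),
        // then the frame descriptor (FLG and BD bytes here).
        byte[] header = {0x04, 0x22, 0x4D, 0x18, /* FLG */ 0x60, /* BD */ 0x70};
        long perSpec = check(header, 4, 2); // descriptor only, as the spec requires
        long asKafka = check(header, 0, 6); // magic included -- the KAFKA-3160 behavior
        // The two ranges yield different checksums, so each side
        // rejects frames produced by the other.
        System.out.println(perSpec + " vs " + asKafka);
    }
}
```

Interoperability bugs like this are exactly why the cross-validation against the Java client pays off: a pure round-trip test within one client passes even when both the writer and the reader share the same mistake.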
One more piece of advice: you might want to invest in a Java cross-validation framework, like this one: https://github.com/vchekan/kafka4net/blob/master/tools/binary-console/src/main/scala/com/ntent/kafka/main.scala where I generate Kafka messages using the Java driver and use them as a golden standard across different compression types and buffer sizes. As an additional bonus, I get confidence that my implementation works with the Java consumer.
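Sweeping the codecs on the golden-standard side comes down to one producer setting. A configuration sketch, assuming the `kafka-clients` Java driver is on the classpath (broker address is a placeholder):

```java
// Producer settings for a golden-standard message generator.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
// Sweep this value across "gzip", "snappy", and "lz4" so the
// client under test must decode every codec the Java driver produced.
props.put("compression.type", "lz4");
```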