tychedelia / kafka-protocol-rs

Rust Kafka protocol
Apache License 2.0
63 stars 23 forks source link

use CRC-32 ISO/HDLC instead of CRC-32 CKSUM #62

Closed Hackzzila closed 4 months ago

Hackzzila commented 4 months ago

For v0/v1 message sets, Kafka uses java.util.zip.CRC32 which is CRC_32_ISO_HDLC, not CRC_32_CKSUM.

This will successfully decode message sets from Kafka v3.7. However, I have not yet tried to encode message sets and send them to Kafka.

rukai commented 4 months ago

We dont use kafka versions that old, so this wont affect us.

But looking at the source you linked makes me think the original implementation is correct?

The definitions of the two algorithms from the crc crate are:

pub const CRC_32_ISO_HDLC: Algorithm<u32> = Algorithm { width: 32, poly: 0x04c11db7, init: 0xffffffff, refin: true, refout: true, xorout: 0xffffffff, check: 0xcbf43926, residue: 0xdebb20e3 };
pub const CRC_32_CKSUM:    Algorithm<u32> = Algorithm { width: 32, poly: 0x04c11db7, init: 0x00000000, refin: false, refout: false, xorout: 0xffffffff, check: 0x765e7680, residue: 0xc704dd7b };

The existing one has initial value of 0xffffffff while the new proposed one has 0x0. And in the code you linked it resets the crc value to 0, matching the old implementation: https://github.com/openjdk/jdk/blob/21a6cf848da00c795d833f926f831c7aea05dfa3/src/java.base/share/classes/java/util/zip/CRC32.java#L123

The proposed algorithm could still be right though, maybe the update method includes the setting of this init value, or maybe I dont understand what the init value is used for in the crc crate.

I think we need some kind of test case to verify this is correct. I'll look into making a PR to setup a way to run some integration level tests against kafka.

tychedelia commented 4 months ago

I'll look into making a PR to setup a way to run some integration level tests against kafka.

Okay, yeah, it would be great to be able to validate this. Thanks for adding the footprint to that. Testing our conformance with Kafka is an important function of these kinds of changes.

@Hackzzila would you mind taking a look at #63 and evaluating adding a test to validate your changes on top of that? If it doesn't break @rukai I'm okay with merging this as-is in a new breaking release but would be great to be able to confirm.

Hackzzila commented 4 months ago

I haven't gone through and looked at the code directly, but according to the docs the java CRC32 implementation is described in RFC 1952 which then specifies:

... CRC-32 algorithm used in the ISO 3309 standard and in section 8.1.1.6.2 of ITU-T recommendation V.42.

Which should be ISO/HDLC.

I think tests would be a great idea! I can add some to test encode/decode of v1/v2 message batches.

tychedelia commented 4 months ago

Going to merge with the assumption that #65 will validate.