Hackzzila closed this 4 months ago
We don't use Kafka versions that old, so this won't affect us.
But looking at the source you linked makes me think the original implementation is correct?
The definitions of the two algorithms from the crc crate are:
pub const CRC_32_ISO_HDLC: Algorithm<u32> = Algorithm { width: 32, poly: 0x04c11db7, init: 0xffffffff, refin: true, refout: true, xorout: 0xffffffff, check: 0xcbf43926, residue: 0xdebb20e3 };
pub const CRC_32_CKSUM: Algorithm<u32> = Algorithm { width: 32, poly: 0x04c11db7, init: 0x00000000, refin: false, refout: false, xorout: 0xffffffff, check: 0x765e7680, residue: 0xc704dd7b };
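To make the difference between the two parameter sets concrete, here is a hand-rolled sketch of both algorithms (bitwise, no tables, and not the crc crate itself — the function names are my own). Each should reproduce the `check` field from the corresponding `Algorithm` definition above when run over the standard test input `"123456789"`:

```rust
// CRC-32/ISO-HDLC: reflected, init = 0xFFFFFFFF, xorout = 0xFFFFFFFF.
// 0xEDB88320 is the bit-reflected form of poly 0x04C11DB7.
fn crc32_iso_hdlc(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF; // init
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    crc ^ 0xFFFF_FFFF // xorout
}

// CRC-32/CKSUM: unreflected, init = 0x00000000, same xorout.
fn crc32_cksum(data: &[u8]) -> u32 {
    let mut crc: u32 = 0x0000_0000; // init
    for &b in data {
        crc ^= (b as u32) << 24;
        for _ in 0..8 {
            crc = if crc & 0x8000_0000 != 0 { (crc << 1) ^ 0x04C1_1DB7 } else { crc << 1 };
        }
    }
    crc ^ 0xFFFF_FFFF // xorout
}

fn main() {
    // Match the `check` values from the Algorithm definitions.
    assert_eq!(crc32_iso_hdlc(b"123456789"), 0xCBF4_3926);
    assert_eq!(crc32_cksum(b"123456789"), 0x765E_7680);
    println!("both check values match");
}
```

Same polynomial, but the differing `init`/`refin`/`refout` parameters produce entirely different checksums, so picking the wrong one will fail against real Kafka data.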
The existing one has an initial value of 0xffffffff, while the newly proposed one has 0x0. And in the code you linked, it resets the CRC value to 0, matching the old implementation: https://github.com/openjdk/jdk/blob/21a6cf848da00c795d833f926f831c7aea05dfa3/src/java.base/share/classes/java/util/zip/CRC32.java#L123
The proposed algorithm could still be right, though; maybe the update method includes the setting of this init value, or maybe I don't understand what the init value is used for in the crc crate.
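For what it's worth, I believe the reset-to-zero in the linked Java source is consistent with init = 0xffffffff: zlib-style CRC-32 (which, as I understand it, `java.util.zip.CRC32` delegates to) stores the *complemented* state, inverting it on entry and exit of each update. A sketch of that trick (illustrative names, not Java's actual internals):

```rust
// State is stored pre-inverted, so "reset to 0" really means
// starting the computation from !0 == 0xFFFFFFFF, the ISO-HDLC init.
fn update(state: u32, data: &[u8]) -> u32 {
    let mut crc = !state; // invert on entry
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc // inverting on exit doubles as the 0xFFFFFFFF xorout
}

fn main() {
    // Starting from state 0 (what CRC32.reset() does) yields the
    // CRC-32/ISO-HDLC check value for "123456789".
    assert_eq!(update(0, b"123456789"), 0xCBF4_3926);
    // Streaming in chunks also works, because the returned value
    // round-trips back to the internal state through the inversion.
    let mid = update(0, b"12345");
    assert_eq!(update(mid, b"6789"), 0xCBF4_3926);
}
```

So a stored state of 0 and an effective init of 0xffffffff are the same thing here, which would make the ISO-HDLC reading of the Java code plausible.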
I think we need some kind of test case to verify this is correct. I'll look into making a PR to set up a way to run some integration-level tests against Kafka.
> I'll look into making a PR to set up a way to run some integration-level tests against Kafka.
Okay, yeah, it would be great to be able to validate this. Thanks for taking that on. Testing our conformance with Kafka is an important function of these kinds of changes.
@Hackzzila would you mind taking a look at #63 and evaluating adding a test to validate your changes on top of that? If it doesn't break, @rukai, I'm okay with merging this as-is in a new breaking release, but it would be great to be able to confirm.
I haven't gone through and looked at the code directly, but according to the docs, the Java CRC32 implementation is described in RFC 1952, which then specifies:
> ... CRC-32 algorithm used in the ISO 3309 standard and in section 8.1.1.6.2 of ITU-T recommendation V.42.
I think tests would be a great idea! I can add some to test encode/decode of v1/v2 message batches.
Going to merge with the assumption that #65 will validate.
For v0/v1 message sets, Kafka uses `java.util.zip.CRC32`, which is `CRC_32_ISO_HDLC`, not `CRC_32_CKSUM`. This will successfully decode message sets from Kafka v3.7. However, I have not yet tried to encode message sets and send them to Kafka.