kafkaex / kafka_ex

Kafka client library for Elixir
MIT License
596 stars 162 forks source link

Change default KafkaEx partitioner to correctly match the Java client #399

Closed mjparrott closed 4 years ago

mjparrott commented 4 years ago

One way of fixing #398

dantswain commented 4 years ago

This makes sense to me. @barthez did the original murmur implementation - do you have any sense if the original may have just been a typo?

I am a little concerned about breaking implementation for anyone who is already using the existing algorithm. Maybe we can provide a legacy partitioner?

If nothing else @mjparrott you'll need to update the test cases, which should be relatively straightforward.

kennethito commented 4 years ago

I think providing a legacy partitioner and some breaking change documentation is a great idea.

joshuawscott commented 4 years ago

I thought these tests had been copied from the Java tests, but it doesn't look like that's actually the case. Perhaps we should remove these test cases, and use the same ones that the Java implementation uses: https://github.com/apache/kafka/blob/8ab0994919752cd4870e771221ba934a6a539a67/clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java#L66-L78

    @Test
    public void testMurmur2() {
        Map<byte[], Integer> cases = new java.util.HashMap<>();
        cases.put("21".getBytes(), -973932308);
        cases.put("foobar".getBytes(), -790332482);
        cases.put("a-little-bit-long-string".getBytes(), -985981536);
        cases.put("a-little-bit-longer-string".getBytes(), -1486304829);
        cases.put("lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8".getBytes(), -58897971);
        cases.put(new byte[]{'a', 'b', 'c'}, 479470107);

        for (Map.Entry<byte[], Integer> c : cases.entrySet()) {
            assertEquals(c.getValue().intValue(), murmur2(c.getKey()));
        }
    }
jbruggem commented 4 years ago

I am a little concerned about breaking implementation for anyone who is already using the existing algorithm. Maybe we can provide a legacy partitioner?

Indeed, it's touchy. With kayrock we'll have a major release, so having a breaking change is OK, but we need:

joshuawscott commented 4 years ago

Breaking changes are allowed in semver before 1.0, so I have no issue with breaking this. This is a bug in any case, the intention and documentation is that the partitioning matches the java client, so if it isn't doing that now, it's broken, and changing it to match is fine, IMO

mjparrott commented 4 years ago

I added a LegacyPartitioner module which can be used in place of the DefaultPartitioner which implements the old behaviour. This modules are mostly just a copy-paste of each other. The only difference is the umurmur2 function which is called.

It looks like the one of the builds is failing due to the code copy-paste - any thoughts?

joshuawscott commented 4 years ago

I'm ok if you put in a comment to disable that check in credo for that line or function http://rrrene.org/2017/06/01/credo-config-comments/