kroxylicious / kroxylicious-junit5-extension

JUnit5 extension and helpers for writing tests parameterised over Kafka clusters
Apache License 2.0
9 stars 10 forks source link

TemplateTest#testTuples flakey #36

Closed k-wall closed 1 year ago

k-wall commented 1 year ago
[ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 17.38 s <<< FAILURE! - in io.kroxylicious.testing.kafka.junit5ext.TemplateTest
[ERROR] io.kroxylicious.testing.kafka.junit5ext.TemplateTest  Time elapsed: 17.38 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <[[1, 1], [3, -1], [3, 1]]> but was: <[[1, 1], [3, -1]]>
    at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
    at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
k-wall commented 1 year ago

I wonder if it has to do with the order listeners are coming up. Currently the code uses TreeSets, keyed by listener name so ANON is first in the listener list. The Kafka code kicks off the listeners in order, but I think the code is async, so which listener comes ready first is non-deterministic. Making the ANON last in the listener seems to help a bit, but I still sporadic fails.

I don't see an easier way to fix this. Maybe switching to probing the EXTERNAL listener (and dealing with the SASL auth) is the best approach.

k-wall commented 1 year ago

No, it isn't listener order causing the problem. Even if I probe EXTERNAL, I still see the issue.

2023-03-01 14:05:58 WARN  io.kroxylicious.testing.kafka.invm.InVMKafkaCluster:158 - Generated config {node.id=2, offsets.topic.num.partitions=1, offsets.topic.replication.factor=1, listener.security.protocol.map=ANON:PLAINTEXT,CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,INTERNAL:PLAINTEXT, listeners=ANON://0.0.0.0:61626,EXTERNAL://0.0.0.0:61623,INTERNAL://0.0.0.0:61629, group.initial.rebalance.delay.ms=0, inter.broker.listener.name=INTERNAL, log.dir=/var/folders/5s/v5576rp175bf4vrf15mgvffm0000gn/T/kafka7019848818931926552/broker-2, controller.quorum.voters=0@//localhost:61630, controller.listener.names=CONTROLLER, process.roles=broker, broker.id=2, advertised.listeners=ANON://localhost:61626,EXTERNAL://localhost:61623,INTERNAL://localhost:61629, early.start.listeners=ANON,EXTERNAL,INTERNAL}
Formatting /var/folders/5s/v5576rp175bf4vrf15mgvffm0000gn/T/kafka7019848818931926552/broker-2 with metadata.version 3.3-IV0.
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:53 - describing cluster using address: localhost:61623
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:57 - localhost:61623 sees peers: [localhost:61621 (id: 0 rack: null), localhost:61622 (id: 1 rack: null), localhost:61623 (id: 2 rack: null)]
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:53 - describing cluster using address: localhost:61622
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:57 - localhost:61622 sees peers: [localhost:61621 (id: 0 rack: null), localhost:61622 (id: 1 rack: null), localhost:61623 (id: 2 rack: null)]
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:53 - describing cluster using address: localhost:61621
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:57 - localhost:61621 sees peers: [localhost:61621 (id: 0 rack: null), localhost:61622 (id: 1 rack: null), localhost:61623 (id: 2 rack: null)]
2023-03-01 14:05:59 WARN  io.kroxylicious.testing.kafka.junit5ext.KafkaClusterExtension:442 - KWDEBUG test [engine:junit-jupiter]/[class:io.kroxylicious.testing.kafka.junit5ext.TemplateTest]/[test-template:testTuples(io.kroxylicious.testing.kafka.api.KafkaCluster, org.apache.kafka.clients.admin.Admin)]/[test-template-invocation:#1]: Resolving parameter (Admin admin)
2023-03-01 14:05:59 WARN  io.kroxylicious.testing.kafka.junit5ext.KafkaClusterExtension:820 - KWDEBUG test [engine:junit-jupiter]/[class:io.kroxylicious.testing.kafka.junit5ext.TemplateTest]/[test-template:testTuples(io.kroxylicious.testing.kafka.api.KafkaCluster, org.apache.kafka.clients.admin.Admin)]/[test-template-invocation:#1]: decl org.apache.kafka.clients.admin.Admin admin: Creating Admin
2023-03-01 14:05:59 WARN  io.kroxylicious.testing.kafka.junit5ext.TemplateTest:133 - KWDEBUG test says: [localhost:61621 (id: 0 rack: null), localhost:61623 (id: 2 rack: null)]
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.581 s <<< FAILURE! - in io.kroxylicious.testing.kafka.junit5ext.TemplateTest
[ERROR] io.kroxylicious.testing.kafka.junit5ext.TemplateTest  Time elapsed: 2.581 s  <<< FAILURE!
k-wall commented 1 year ago

If I run #testTuples but change to the parameter type to TestcontainersKafkaCluster , the test never fail. I've run the tests in a cycle more than 70 times and still don't see a failure. Could the failure be a kafka defect brought out by co-location of three kafkas in the same VM?

k-wall commented 1 year ago

By selectively turning up logging, I can see the tests's DescribeClusterResponse is indeed incomplete, even though the awaitExpectedBrokerCountInCluster has just seen a DescribeClusterResponse with all brokers.

k-wall commented 1 year ago

I haven't been able to make much progress isolating the cause of the sporadic test failure. It feels like a race condition, perhaps broker side. As soon as I turn up broker side logging, I am no longer able to reproduce the issue.

SamBarker commented 1 year ago

I'm hoping https://github.com/kroxylicious/kroxylicious-junit5-extension/pull/94 does something to address this as well.

k-wall commented 1 year ago

I'm still seeing this pesky one fail from time to time, even with the changes.

SamBarker commented 1 year ago

Yeah, me too. The port clash doesn't seem to be transitory as expected.

SamBarker commented 1 year ago

I don't remember the last time I saw this fail. So I'm closing please re-open if it happens again