I wonder if it has to do with the order listeners are coming up. Currently the code uses TreeSets, keyed by listener name, so ANON is first in the listener list. The Kafka code kicks off the listeners in order, but I think the code is async, so which listener becomes ready first is non-deterministic. Making ANON last in the listener list seems to help a bit, but I still see sporadic failures.
I don't see an easier way to fix this. Maybe switching to probing the EXTERNAL listener (and dealing with the SASL auth) is the best approach.
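To illustrate why ANON ends up first: a `TreeSet` of strings orders its elements lexicographically, so listener names keyed that way come out alphabetically regardless of insertion order. This is a minimal sketch using the listener names from the generated config below; the variable names are illustrative, not the extension's actual code.

```java
import java.util.TreeSet;

public class ListenerOrder {
    public static void main(String[] args) {
        // TreeSet sorts by natural (lexicographic) String order,
        // regardless of the order the listeners were added.
        TreeSet<String> listeners = new TreeSet<>();
        listeners.add("EXTERNAL");
        listeners.add("ANON");
        listeners.add("INTERNAL");
        listeners.add("CONTROLLER");
        System.out.println(listeners); // [ANON, CONTROLLER, EXTERNAL, INTERNAL]
    }
}
```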
No, it isn't listener order causing the problem. Even if I probe EXTERNAL, I still see the issue.
2023-03-01 14:05:58 WARN io.kroxylicious.testing.kafka.invm.InVMKafkaCluster:158 - Generated config {node.id=2, offsets.topic.num.partitions=1, offsets.topic.replication.factor=1, listener.security.protocol.map=ANON:PLAINTEXT,CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,INTERNAL:PLAINTEXT, listeners=ANON://0.0.0.0:61626,EXTERNAL://0.0.0.0:61623,INTERNAL://0.0.0.0:61629, group.initial.rebalance.delay.ms=0, inter.broker.listener.name=INTERNAL, log.dir=/var/folders/5s/v5576rp175bf4vrf15mgvffm0000gn/T/kafka7019848818931926552/broker-2, controller.quorum.voters=0@//localhost:61630, controller.listener.names=CONTROLLER, process.roles=broker, broker.id=2, advertised.listeners=ANON://localhost:61626,EXTERNAL://localhost:61623,INTERNAL://localhost:61629, early.start.listeners=ANON,EXTERNAL,INTERNAL}
Formatting /var/folders/5s/v5576rp175bf4vrf15mgvffm0000gn/T/kafka7019848818931926552/broker-2 with metadata.version 3.3-IV0.
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:53 - describing cluster using address: localhost:61623
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:57 - localhost:61623 sees peers: [localhost:61621 (id: 0 rack: null), localhost:61622 (id: 1 rack: null), localhost:61623 (id: 2 rack: null)]
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:53 - describing cluster using address: localhost:61622
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:57 - localhost:61622 sees peers: [localhost:61621 (id: 0 rack: null), localhost:61622 (id: 1 rack: null), localhost:61623 (id: 2 rack: null)]
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:53 - describing cluster using address: localhost:61621
2023-03-01 14:05:59 DEBUG io.kroxylicious.testing.kafka.common.Utils:57 - localhost:61621 sees peers: [localhost:61621 (id: 0 rack: null), localhost:61622 (id: 1 rack: null), localhost:61623 (id: 2 rack: null)]
2023-03-01 14:05:59 WARN io.kroxylicious.testing.kafka.junit5ext.KafkaClusterExtension:442 - KWDEBUG test [engine:junit-jupiter]/[class:io.kroxylicious.testing.kafka.junit5ext.TemplateTest]/[test-template:testTuples(io.kroxylicious.testing.kafka.api.KafkaCluster, org.apache.kafka.clients.admin.Admin)]/[test-template-invocation:#1]: Resolving parameter (Admin admin)
2023-03-01 14:05:59 WARN io.kroxylicious.testing.kafka.junit5ext.KafkaClusterExtension:820 - KWDEBUG test [engine:junit-jupiter]/[class:io.kroxylicious.testing.kafka.junit5ext.TemplateTest]/[test-template:testTuples(io.kroxylicious.testing.kafka.api.KafkaCluster, org.apache.kafka.clients.admin.Admin)]/[test-template-invocation:#1]: decl org.apache.kafka.clients.admin.Admin admin: Creating Admin
2023-03-01 14:05:59 WARN io.kroxylicious.testing.kafka.junit5ext.TemplateTest:133 - KWDEBUG test says: [localhost:61621 (id: 0 rack: null), localhost:61623 (id: 2 rack: null)]
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.581 s <<< FAILURE! - in io.kroxylicious.testing.kafka.junit5ext.TemplateTest
[ERROR] io.kroxylicious.testing.kafka.junit5ext.TemplateTest Time elapsed: 2.581 s <<< FAILURE!
If I run #testTuples but change the parameter type to TestcontainersKafkaCluster, the test never fails. I've run the tests in a cycle more than 70 times and still don't see a failure. Could the failure be a Kafka defect brought out by co-locating three Kafka nodes in the same VM?
By selectively turning up logging, I can see the test's DescribeClusterResponse is indeed incomplete, even though awaitExpectedBrokerCountInCluster has just seen a DescribeClusterResponse with all brokers.
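That observation points at the limits of a poll-based readiness probe: a condition that held for one probe is not guaranteed to hold for the very next observer. This is a hedged, JDK-only sketch of the general probe-until-deadline pattern (the `Await`/`awaitCondition` names are made up for illustration and are not the extension's real API).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public final class Await {
    // Poll a condition until it holds or the deadline passes.
    // Note: a true result only says the condition held at that probe;
    // a later request (e.g. another DescribeCluster) may still see stale state.
    public static void awaitCondition(Supplier<Boolean> condition, Duration timeout) {
        Instant deadline = Instant.now().plus(timeout);
        while (!condition.get()) {
            if (Instant.now().isAfter(deadline)) {
                throw new IllegalStateException("condition not met within " + timeout);
            }
            try {
                Thread.sleep(100); // back off between probes
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while awaiting condition", e);
            }
        }
    }
}
```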
I haven't been able to make much progress isolating the cause of the sporadic test failure. It feels like a race condition, perhaps broker side. As soon as I turn up broker side logging, I am no longer able to reproduce the issue.
I'm hoping https://github.com/kroxylicious/kroxylicious-junit5-extension/pull/94 does something to address this as well.
I'm still seeing this pesky one fail from time to time, even with the changes.
Yeah, me too. The port clash doesn't seem to be transitory as expected.
I don't remember the last time I saw this fail, so I'm closing. Please re-open if it happens again.