kroxylicious / kroxylicious-junit5-extension

JUnit5 extension and helpers for writing tests parameterised over Kafka clusters
Apache License 2.0

TemplateTest$Tuples fails sporadically #391

Open k-wall opened 2 months ago

k-wall commented 2 months ago

From a CI run; logs are attached below. It is not immediately obvious to me how the failure is occurring.

Error:  Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.952 s <<< FAILURE! -- in io.kroxylicious.testing.kafka.junit5ext.TemplateTest$Tuples
Error:  io.kroxylicious.testing.kafka.junit5ext.TemplateTest$Tuples -- Time elapsed: 7.952 s <<< FAILURE!
org.opentest4j.AssertionFailedError: 

expected: [[1, 1], [3, 1], [3, -1]]
 but was: [[0, -1], [1, 1], [3, 1]]
    at io.kroxylicious.testing.kafka.junit5ext.TemplateTest$Tuples.afterAll(TemplateTest.java:163)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1092)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

logs_28425902804.zip

k-wall commented 6 days ago

Dup of #294.

k-wall commented 6 days ago

This just failed again.

Error:  Failures: 
Error:    TemplateTest$Tuples.afterAll:163 
expected: [[3, 1], [1, 1], [3, -1]]
 but was: [[0, -1], [1, 1], [3, 1]]
[INFO] 

admin.describeCluster().nodes().get().size() is returning zero which seems weird.

Looking at org.apache.kafka.image.publisher.ControllerRegistrationsPublisher#describeClusterControllers, I notice that describeClusterControllers consults the controllers map, which will be null if the appropriate metadata update has not arrived yet. Could this be causing the race condition?

@showuon (low priority) does this look like a Kafka defect to you?

showuon commented 6 days ago

Questions:

  1. Is it possible to get logs inside controller/broker nodes?
  2. Does the admin client connect to the broker or controller?
  3. I'd like to know if we re-describe cluster, is the response still the same? I'd guess this is just a temporary state while the nodes are catching up with the metadata logs.

k-wall commented 6 days ago

Questions:

  1. Is it possible to get logs inside controller/broker nodes?

I'll see if I can get a reproduction with logs.

  2. Does the admin client connect to the broker or controller?

The broker.

  3. I'd like to know if we re-describe cluster, is the response still the same? I'd guess this is just a temporary state while the nodes are catching up with the metadata logs.

I expect so. I can add a retry loop to show whether that's the case.
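A minimal sketch of what such a retry loop could look like, not the change actually made to the test. The class name DescribeRetry and the method observeUntil are hypothetical. The loop records every observed value, so the output shows whether a zero-node answer is only a transient state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper (not part of the extension): repeats a call until the
// result is accepted or the attempts run out, keeping every observation.
final class DescribeRetry {

    static <T> List<T> observeUntil(Callable<T> call, Predicate<T> accepted,
                                    int maxAttempts, long pauseMillis) {
        List<T> seen = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value;
            try {
                value = call.call();
            } catch (Exception e) {
                // Surface checked exceptions from the call unchanged as the cause.
                throw new IllegalStateException("call failed on attempt " + attempt, e);
            }
            seen.add(value);
            if (accepted.test(value)) {
                break; // the accepted value is the last element of the list
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return seen;
    }
}
```

In the test this might be called as observeUntil(() -> admin.describeCluster().nodes().get().size(), n -> n == 3, 10, 500), where the attempt count and pause are illustrative values. A result such as [0, 0, 3] would suggest the brokers were still catching up on metadata, whereas a list of all zeros would point to something more persistent.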

This problem is longstanding, so it is not a regression in a newer release.

k-wall commented 4 days ago

I've been trying to get a reproduction with separate broker logs. The only time I can actually get it to fail is when the 3 brokers are co-located with the same. Even then, it is really sporadic.