Open k-wall opened 2 months ago
Dup of #294.
This just failed again.
Error: Failures:
Error: TemplateTest$Tuples.afterAll:163
expected: [[3, 1], [1, 1], [3, -1]]
but was: [[0, -1], [1, 1], [3, 1]]
[INFO]
admin.describeCluster().nodes().get().size()
is returning zero which seems weird.
I notice looking at org.apache.kafka.image.publisher.ControllerRegistrationsPublisher#describeClusterControllers
that describeClusterControllers
consults controllers map which will be null if an appropriate metadata updates has not arrived yet. Could this be giving the race condition?
@showuon (low priority) does this look like a Kafka defect to you?
Questions:
Questions:
- Is it possible to get logs inside controller/broker nodes?
I'll see if I can get a reproduction with logs.
- Does the admin client connect to the broker or controller?
broker.
- I'd like to know if we re-describe cluster, is the response still the same? I'd guess this is just a temporary state while the nodes are catching up with the metadata logs.
I expect so. I can add a retry loop to show whether that's the case.
This problem is longstanding - so it is not a regression in a newer release.
I've been trying to get a reproduction with separate broker logs. The only time I can actually get it to fail is when the 3 Brokers are co-located with the same. Even then it is really sporadic.
From a CI run, logs attached below. It is not immediately obvious to me how the failure is occuring.
logs_28425902804.zip