Closed: AllenZMC closed this issue 3 years ago
You should check the Cluster Operator log for details. I do not think I have seen this particular error before. Could it maybe be some RBAC issue?
See this: it is a KubernetesClientException. Have you encountered this situation?
Status:
Conditions:
Last Transition Time: 2020-11-29T16:00:11+0000
Message: Operation: [get] for kind: [Kafka] with name: [testkafka] in namespace: [openshift-operators] failed.
Reason: KubernetesClientException
Status: True
Type: NotReady
I saw that, but as I said, you have to check the Cluster Operator log for more details.
Cluster Operator log:
2020-12-01 06:04:59 WARN AbstractOperator:377 - Reconciliation #38462(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:06:59 WARN AbstractOperator:377 - Reconciliation #38465(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:08:59 WARN AbstractOperator:377 - Reconciliation #38468(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:10:59 WARN AbstractOperator:377 - Reconciliation #38471(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:12:59 WARN AbstractOperator:377 - Reconciliation #38474(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:14:59 WARN AbstractOperator:377 - Reconciliation #38477(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:16:59 WARN AbstractOperator:377 - Reconciliation #38480(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:18:59 WARN AbstractOperator:377 - Reconciliation #38483(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:20:59 WARN AbstractOperator:377 - Reconciliation #38486(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
2020-12-01 06:22:59 WARN AbstractOperator:377 - Reconciliation #38489(timer) Kafka(openshift-operators/testkafka): Failed to acquire lock lock::openshift-operators::Kafka::testkafka within 10000ms.
Please share the whole log from the beginning (when the container starts) till the end.
Sorry, the whole log is too long. Is there any other way?
Should I restart the operator pod?
protected final <T> Future<T> withLock(Reconciliation reconciliation, long lockTimeoutMs, Callable<Future<T>> callable) {
    Promise<T> handler = Promise.promise();
    String namespace = reconciliation.namespace();
    String name = reconciliation.name();
    final String lockName = getLockName(namespace, name);
    log.debug("{}: Try to acquire lock {}", reconciliation, lockName);
    vertx.sharedData().getLockWithTimeout(lockName, lockTimeoutMs, res -> {
        if (res.succeeded()) {
            log.debug("{}: Lock {} acquired", reconciliation, lockName);
            Lock lock = res.result();
            try {
                callable.call().onComplete(callableRes -> {
                    if (callableRes.succeeded()) {
                        handler.complete(callableRes.result());
                    } else {
                        handler.fail(callableRes.cause());
                    }
                    lock.release();
                    log.debug("{}: Lock {} released", reconciliation, lockName);
                });
            } catch (Throwable ex) {
                lock.release();
                log.debug("{}: Lock {} released", reconciliation, lockName);
                log.error("{}: Reconciliation failed", reconciliation, ex);
                handler.fail(ex);
            }
        } else {
            log.warn("{}: Failed to acquire lock {} within {}ms.", reconciliation, lockName, lockTimeoutMs);
            handler.fail(new UnableToAcquireLockException());
        }
    });
    return handler.future();
}
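As a rough analogy (this is not the Strimzi implementation, which uses Vert.x shared-data locks via vertx.sharedData().getLockWithTimeout as shown above), the same try-to-acquire-with-timeout pattern can be sketched with the standard library's ReentrantLock. The class and method names here are purely illustrative:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockTimeoutSketch {
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Try to acquire the lock within timeoutMs; run the action only on success.
    // Returns false on timeout, which corresponds to the
    // "Failed to acquire lock ... within 10000ms" warning in the operator log.
    static boolean withLock(long timeoutMs, Runnable action) throws InterruptedException {
        if (LOCK.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            try {
                action.run();
                return true;
            } finally {
                // Always release, mirroring lock.release() in the operator code.
                LOCK.unlock();
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean ok = withLock(10_000, () -> System.out.println("reconciling"));
        System.out.println(ok ? "acquired" : "timed out");
    }
}
```

The key point in both versions is that the lock is released on every path (success, failure, or exception); a reconciliation that never completes its future is what keeps the lock held and produces the repeated warnings.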
You can upload it, for example, to pastebin or another such service. But at least share as much as possible. 10 lines are not enough, especially when they all contain the same message ;-). (FYI: you can read in the docs what the exact message means: https://strimzi.io/docs/operators/latest/full/using.html#what_do_the_failed_to_acquire_lock_warnings_in_the_log_mean)
Well, the my-cluster is supposed to be in the openshift-marketplace namespace, not in the openshift-operators namespace.
Thanks! I restarted the operator pod to solve this problem.
Is there a plan to fix the problem of lock acquisition failure? @scholzj
There is no problem with the lock acquisition. It is expected as described in the FAQ. If you think your case is different from what is described in the FAQ and it never gets released without deleting the pod, we can investigate it. But we need the full log (at DEBUG level) for it.
I will provide a complete log next time I encounter this situation.
See the following output. Why is the status of the Kafka resource NotReady, even though all the Kafka cluster pods are running? This is on OCP 4.x.