Under heavy load we observed several nodes of our application not performing regular tasks. Further debugging revealed that the scheduler thread used application-wide and passed to tech.ydb.core.grpc.GrpcTransport::withSchedulerFactory got stuck with the following stack trace:
Under heavy load we observed several nodes of our application not performing regular tasks. Further debugging revealed that the scheduler thread used application-wide and passed to tech.ydb.core.grpc.GrpcTransport::withSchedulerFactory got stuck with the following stack trace:
The thread was infinitely waiting (actually 3 days before we found the issue) for the channels to shut down at https://github.com/ydb-platform/ydb-java-sdk/blob/4bfda6831fe0fd64c12169aadebbe8f5cd8c6873/core/src/main/java/tech/ydb/core/impl/pool/GrpcChannelPool.java#L73
IMO doing any non-trivial work in the scheduler thread should be avoided.