dpcollins-google opened this issue 2 years ago.
Is this a regression that only shows up with newer library or dependency versions? And although it sounds like this is not easily reproducible, could you share a small sample or snippet that demonstrates it?
It's unclear, but I had not run into this before, so it is likely a fairly recent regression (on the order of months, though).
Here is an example snippet from the Apache Beam repo that triggered this:
CursorServiceClient newCursorServiceClient() { ... }

newCursorServiceClient()
    .commitCursor(
        CommitCursorRequest.newBuilder()
            .setSubscription(options.subscriptionPath().toString())
            .setPartition(partition.value())
            .setCursor(Cursor.newBuilder().setOffset(offset.value()))
            .build());
I see the transport of pubsublite v1 is gRPC. @vam-google any thoughts?
@chanseokoh Right now, no clients other than java-compute depend on the REST transport, so it is safe to assume that any reported issue that is not Compute-related involves gRPC.
I just created my own pipeline and can reproduce this fairly frequently: the future takes over a minute to complete. Here is the (truncated) stack trace:
java.util.concurrent.TimeoutException: Waited 1 minutes (plus 834188 nanoseconds delay) for com.google.api.gax.retrying.CallbackChainRetryingFuture@32bd4ca3[status=PENDING]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:527)
at org.apache.beam.sdk.io.gcp.pubsublite.internal.SubscriberAssembler.lambda$getCommitter$0(SubscriberAssembler.java:106)
at org.apache.beam.sdk.io.gcp.pubsublite.internal.PerSubscriptionPartitionSdf.lambda$processElement$0(PerSubscriptionPartitionSdf.java:88)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at org.apache.beam.sdk.io.gcp.pubsublite.internal.PerSubscriptionPartitionSdf.processElement(PerSubscriptionPartitionSdf.java:84)
at org.apache.beam.sdk.io.gcp.pubsublite.internal.PerSubscriptionPartitionSdf$DoFnInvoker.invokeProcessElement(Unknown Source)
...
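The trace shows the caller bounding its own wait with a timed `get()` (Guava's `AbstractFuture.get` with a timeout), yet the message still reports the gax future as `status=PENDING`. A timed `get()` only limits how long the caller blocks; it neither completes nor cancels the future. The minimal sketch below illustrates that distinction using a plain `java.util.concurrent.CompletableFuture` as a stand-in; it is not gax's `CallbackChainRetryingFuture`, and `TimedGetDemo` / `waitOnStalledFuture` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedGetDemo {
  /**
   * Waits on a future that nothing ever completes (a stand-in for the stalled
   * RPC future) and reports {timedOut, futureDone}.
   */
  public static boolean[] waitOnStalledFuture(long timeoutMillis) throws Exception {
    // Stand-in for the RPC future: no thread will ever complete it.
    CompletableFuture<String> stalled = new CompletableFuture<>();
    boolean timedOut = false;
    try {
      stalled.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The caller stops waiting here...
      timedOut = true;
    }
    // ...but the future itself is untouched and remains pending.
    return new boolean[] {timedOut, stalled.isDone()};
  }

  public static void main(String[] args) throws Exception {
    boolean[] r = waitOnStalledFuture(100);
    // prints "timed out: true, future done: false"
    System.out.println("timed out: " + r[0] + ", future done: " + r[1]);
  }
}
```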
P1 out of SLO, please take a look & triage
For more context: in this case the problem appears to be executor exhaustion in the gRPC layer, which prevents the gRPC future from ever completing. Even so, it would be useful for gax to enforce deadlines on its own future (i.e., complete it with an error once the deadline passes) even if gRPC never completes the request.
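The requested behavior can be sketched with plain `java.util.concurrent`, assuming a `CompletableFuture` stands in for the gax/gRPC future. This is not the gax API: `DeadlineFutures`, `withDeadline`, and `demo` are hypothetical names. `demo` simulates executor exhaustion with a one-thread pool whose only thread is blocked, so the task that would complete the "RPC" future never runs; the deadline-wrapped future still fails with a `TimeoutException` on schedule.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeadlineFutures {
  /**
   * Returns a future that mirrors `inner` but fails with TimeoutException if
   * `inner` has not completed within `timeoutMillis`. On timeout, `inner` is
   * also cancelled so it does not linger in the PENDING state.
   */
  public static <T> CompletableFuture<T> withDeadline(
      CompletableFuture<T> inner, long timeoutMillis, ScheduledExecutorService timer) {
    CompletableFuture<T> outer = new CompletableFuture<>();
    ScheduledFuture<?> deadline = timer.schedule(() -> {
      // Only the first completion of `outer` wins; cancel `inner` if the deadline did.
      if (outer.completeExceptionally(
          new TimeoutException("deadline of " + timeoutMillis + " ms exceeded"))) {
        inner.cancel(false);
      }
    }, timeoutMillis, TimeUnit.MILLISECONDS);
    inner.whenComplete((value, error) -> {
      // The inner future finished (or was cancelled): the timer is no longer needed.
      deadline.cancel(false);
      if (error != null) {
        outer.completeExceptionally(error);
      } else {
        outer.complete(value);
      }
    });
    return outer;
  }

  /** Simulates the stall and returns the simple name of the failure seen by the caller. */
  public static String demo(long timeoutMillis) {
    ExecutorService exhausted = Executors.newFixedThreadPool(1);
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    CountDownLatch release = new CountDownLatch(1);
    try {
      // Occupy the pool's only thread.
      exhausted.execute(() -> {
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      CompletableFuture<String> rpc = new CompletableFuture<>();
      // Queued behind the blocker, so it never runs before the deadline.
      exhausted.execute(() -> rpc.complete("response"));
      try {
        return withDeadline(rpc, timeoutMillis, timer).join();
      } catch (CompletionException e) {
        return e.getCause().getClass().getSimpleName();
      }
    } finally {
      release.countDown();
      exhausted.shutdown();
      timer.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo(200)); // prints "TimeoutException"
  }
}
```

The sketch fails a separate outer future instead of relying on the inner one, so the caller gets an error on schedule even when the thread that should complete the RPC never runs. The timer deliberately uses its own executor: scheduling the deadline on the exhausted pool would stall it too. On Java 9+, `CompletableFuture.orTimeout` provides similar behavior for `CompletableFuture`s specifically.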
@dpcollins-google Is there a corresponding issue filed against gRPC? Also, can we change this to a feature request and downgrade the priority? Thanks!
I checked with @dpcollins-google offline, and we agreed to change this to a feature request and downgrade it to P2.