grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0

since grpc-java 1.68.1, pekko-grpc has issues with RetryingNameResolver (throwIfNotInThisSynchronizationContext) #11662

Status: Closed (pjfanning closed this issue 1 week ago)

pjfanning commented 2 weeks ago

Similar to #10407 but has started affecting us with grpc-java 1.68.1

Error:  [11/03/2024 08:12:31.956] [default-pekko.actor.default-dispatcher-6] [org.apache.pekko.dispatch.Dispatcher] Not called from the SynchronizationContext
java.lang.IllegalStateException: Not called from the SynchronizationContext
    at com.google.common.base.Preconditions.checkState(Preconditions.java:515)
    at io.grpc.SynchronizationContext.throwIfNotInThisSynchronizationContext(SynchronizationContext.java:134)
    at io.grpc.internal.ManagedChannelImpl$NameResolverListener.onResult2(ManagedChannelImpl.java:1686)
    at io.grpc.internal.RetryingNameResolver$RetryingListener.onResult2(RetryingNameResolver.java:107)
    at io.grpc.NameResolver$Listener2.onAddresses(NameResolver.java:228)
    at org.apache.pekko.grpc.internal.PekkoDiscoveryNameResolver.$anonfun$lookup$1(PekkoDiscoveryNameResolver.scala:56)
    at org.apache.pekko.grpc.internal.PekkoDiscoveryNameResolver.$anonfun$lookup$1$adapted(PekkoDiscoveryNameResolver.scala:53)

pekko-grpc PR: https://github.com/apache/pekko-grpc/pull/397

pekko-grpc is written in Scala and we are using Scala Futures when doing lookups asynchronously. grpc-java seems now to require that we use your SynchronizationContext instead.

I added an experimental change to pekko-grpc name resolution to add blocking code. This allowed me to avoid this issue but I discovered that we have some tests that still fail because io.grpc is unhappy that we are using Scala Futures. I wouldn't be delighted about the hack in our name resolver either.

ejona86 commented 2 weeks ago

@kannanjgithub, the old Listener didn't require using the synchronization context, but the new onResult2 path does. We probably messed up one of the methods that adapt to onResult2 and didn't take that into account.

@pjfanning, NameResolvers can access the SynchronizationContext from [NameResolver.Args.getSynchronizationContext()](https://grpc.github.io/grpc-java/javadoc/io/grpc/NameResolver.Args.html#getSynchronizationContext()). Long-term (getting nearer term), we will require NameResolvers to call the Listener from the SynchronizationContext, so that is a good change to make if you are willing.
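
For readers unfamiliar with the check behind this exception, here is a minimal, self-contained sketch of the idea. `SketchSyncContext` and `CheckingListener` are hypothetical stand-ins written for illustration; they only model the behavior of io.grpc.SynchronizationContext and of the channel's listener, and are not grpc-java code. The sketch shows a listener call being rejected when made directly and accepted when submitted through `execute`.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

final class SyncContextCheckDemo {
    // Calls the listener directly, outside the context; returns true if the call is rejected.
    static boolean directCallIsRejected() {
        SketchSyncContext syncContext = new SketchSyncContext();
        CheckingListener listener = new CheckingListener(syncContext);
        try {
            listener.onResult("10.0.0.1:8080");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Submits the same call through the context; returns what the listener recorded.
    static String callThroughContext() {
        SketchSyncContext syncContext = new SketchSyncContext();
        CheckingListener listener = new CheckingListener(syncContext);
        syncContext.execute(() -> listener.onResult("10.0.0.1:8080"));
        return listener.lastAddresses();
    }

    public static void main(String[] args) {
        System.out.println("direct call rejected: " + directCallIsRejected());
        System.out.println("delivered via execute: " + callThroughContext());
    }
}

// Hypothetical stand-in (assumption): runs queued tasks one at a time and
// remembers which thread is currently draining the queue.
final class SketchSyncContext {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final AtomicReference<Thread> drainingThread = new AtomicReference<>();

    // Queue the task, then drain on the calling thread unless another thread already is.
    void execute(Runnable task) {
        queue.add(task);
        drain();
    }

    private void drain() {
        do {
            if (!drainingThread.compareAndSet(null, Thread.currentThread())) {
                return; // another (or an outer) drain loop will run the queued task
            }
            try {
                Runnable next;
                while ((next = queue.poll()) != null) {
                    next.run();
                }
            } finally {
                drainingThread.set(null);
            }
        } while (!queue.isEmpty());
    }

    // Fails unless the caller is running inside a task submitted via execute().
    void throwIfNotInThisSynchronizationContext() {
        if (Thread.currentThread() != drainingThread.get()) {
            throw new IllegalStateException("Not called from the SynchronizationContext");
        }
    }
}

// Hypothetical listener that, like the channel's listener in the stack trace,
// refuses results delivered from outside the context.
final class CheckingListener {
    private final SketchSyncContext syncContext;
    private volatile String lastAddresses;

    CheckingListener(SketchSyncContext syncContext) {
        this.syncContext = syncContext;
    }

    void onResult(String addresses) {
        syncContext.throwIfNotInThisSynchronizationContext();
        lastAddresses = addresses;
    }

    String lastAddresses() {
        return lastAddresses;
    }
}
```

In a real resolver the context would come from NameResolver.Args.getSynchronizationContext() rather than being constructed by hand.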

pjfanning commented 2 weeks ago

@ejona86 Thanks for your quick response.

Apache Pekko is quite modular and pekko-grpc relies on other pekko modules including a pekko-discovery module. pekko-discovery has async APIs using Scala Futures. We won't be able to change this. The custom NameResolver in pekko-grpc uses the pekko-discovery APIs. In the short term, we will pin to grpc-java 1.67. In the long term, we will need to watch and see how grpc-java continues to evolve and whether we need to substantially rewrite pekko-grpc to uptake newer versions of grpc-java.

kannanjgithub commented 2 weeks ago

The problem was in gRPC's own code: the default implementation of onAddresses in the abstract base class NameResolver.Listener2 was calling onResult2 outside of the synchronization context. In the first part of the onResult2 changes it called onResult, as it should, but the later PR for ResolutionResult introduced this regression. This (and 2 other such callers) went uncaught because the synchronization-context check lives in ManagedChannelImpl's implementation of Listener2, while the unit tests of the respective name resolvers use their own listeners. I have raised PR #11666 with the fix.

ejona86 commented 2 weeks ago

> whether we need to substantially rewrite pekko-grpc to uptake newer versions of grpc-java.

@pjfanning, I don't see what substantial changes you are talking about. Somewhere between onComplete and calling the listener, call syncContext.execute and run the remaining code inside it. I don't know Scala, but it would be a one-line change in Java.
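
The suggestion above can be sketched in Java as follows. This is an illustration under assumptions, not pekko-grpc or grpc-java code: `LockSyncContext` is a simplified, lock-based stand-in for the real SynchronizationContext (which queues tasks rather than blocking), `lookup` is a hypothetical asynchronous discovery call, and CompletableFuture plays the role of the Scala Future.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

final class FutureHopDemo {
    // Hypothetical async lookup (assumption): completes on a pool thread,
    // much like a Future-based discovery API would.
    static CompletableFuture<List<String>> lookup(String serviceName) {
        return CompletableFuture.supplyAsync(() -> List.of(serviceName + ":8080"));
    }

    // Resolves the name and delivers the addresses to the listener.
    // The returned future completes once the listener has been called.
    static CompletableFuture<Void> resolve(
            String serviceName, LockSyncContext syncContext, Consumer<List<String>> listener) {
        return lookup(serviceName).thenAccept(addresses ->
                // The one-line change: hop onto the context before touching the listener.
                syncContext.execute(() -> listener.accept(addresses)));
    }

    public static void main(String[] args) {
        LockSyncContext syncContext = new LockSyncContext();
        AtomicReference<List<String>> seen = new AtomicReference<>();
        resolve("example-service", syncContext, addresses -> {
            // Would throw if the callback arrived outside the context.
            syncContext.throwIfNotInThisSynchronizationContext();
            seen.set(addresses);
        }).join();
        System.out.println("listener received: " + seen.get());
    }
}

// Simplified stand-in (assumption): serializes tasks with a lock and uses
// lock ownership to answer "am I inside the context?".
final class LockSyncContext {
    private final ReentrantLock lock = new ReentrantLock();

    void execute(Runnable task) {
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock();
        }
    }

    void throwIfNotInThisSynchronizationContext() {
        if (!lock.isHeldByCurrentThread()) {
            throw new IllegalStateException("Not called from the SynchronizationContext");
        }
    }
}
```

The asynchronous lookup API stays untouched here; only the final callback is wrapped, which is why the change stays small even when the discovery layer cannot be modified.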

hacetin commented 2 weeks ago

We also encountered this issue when upgrading grpc-netty to 1.68.1 from 1.68.0. The issue was in one of our gRPC stream services. It started throwing `i.g.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions will be exceeded in 59.999822705s.` and `Not called from the SynchronizationContext` exceptions.

Our system is in Scala 2.13. Reverting back to grpc-netty 1.68.0 fixed the problem.

pjfanning commented 2 weeks ago

@hacetin 1.68.0 has been marked as a mistaken release. I am switching to v1.67.1 in pekko-grpc.

ejona86 commented 1 week ago

Fixed by #11666