Open jnehlmeier opened 2 years ago
There might be an existing issue but I could not find it. @ejona86 probably has more details.
There may be another one, but #4893 is the same.
Thanks @ejona86 . I see similarities with #8309. The comment https://github.com/grpc/grpc-java/issues/8309#issuecomment-881654181 might apply here i.e. "I suspect that if you add a Thread.sleep(1100) after the await termination the error will go away"
Thanks for your responses. As additional information I can tell you that I have upgraded grpc-java from version 1.15.0 to 1.46.0 and only after that update this issue appeared. Also while using 1.15.0 the code wasn't fully correct because it did never call awaitTermination
. So during upgrade to 1.46.0 I also added the awaitTermination
call. I know that 1.15.0 is 4 years old and a lot has probably changed since then. Just wanted to add that information for completeness and to let you know that you did not had that problem in the past (even without calling awaitTermination
)
The comment https://github.com/grpc/grpc-java/issues/8309#issuecomment-881654181 might apply here i.e. "I suspect that if you add a Thread.sleep(1100) after the await termination the error will go away"
Honestly I would expect grcp-java to implement Thread.sleep()
inside awaitTermination
. I know it would feel hack'ish to you but it would feel even more hack'ish to me because then I have some magic number in my code that originates from some implementation detail deep inside grpc-java. So I really want to avoid that.
As a library user I expect awaitTermination
to do its job and not cause any asynchronous errors. Currently our automated error reporting ships this exception regularly from multiple app server instances. So it is quite annoying.
@ejona86 could you comment on the above?
Same situation but different exception(s):
06.01.2023 11:28:11.536 | grpc-default-worker-ELG-1-2 | ERROR i.g.n.s.i.n.u.c.D.rejectedExecution
Failed to submit a listener notification task. Event loop shut down?
java.lang.ClassNotFoundException: io.grpc.netty.shaded.io.netty.util.concurrent.GlobalEventExecutor$2
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:511)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 11 common frames omitted
Wrapped by: java.lang.NoClassDefFoundError: io/grpc/netty/shaded/io/netty/util/concurrent/GlobalEventExecutor$2
at io.grpc.netty.shaded.io.netty.util.concurrent.GlobalEventExecutor.startThread(GlobalEventExecutor.java:217)
at io.grpc.netty.shaded.io.netty.util.concurrent.GlobalEventExecutor.execute(GlobalEventExecutor.java:205)
at io.grpc.netty.shaded.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
at io.grpc.netty.shaded.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:499)
at io.grpc.netty.shaded.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
at io.grpc.netty.shaded.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
at io.grpc.netty.shaded.io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:1048)
at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:832)
Just before the above exception the exception below is logged several times as WARN
06.01.2023 11:28:11.535 | grpc-default-worker-ELG-1-8 | WARN i.g.n.s.i.n.u.c.SingleThreadEventExecutor
Unexpected exception from an event executor:
java.lang.NoClassDefFoundError: io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$2
at io.grpc.netty.shaded.io.netty.util.collection.IntObjectHashMap.values(IntObjectHashMap.java:221)
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.closeAll(EpollEventLoop.java:433)
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:401)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:832)
Has any work been done in newer GRPC versions to prevent that behavior and to make awaitTermination
work as expected?
Thanks @ejona86 . I see similarities with #8309. The comment #8309 (comment) might apply here i.e. "I suspect that if you add a Thread.sleep(1100) after the await termination the error will go away"
Would you accept a PR that adds a Thread.sleep(1100)
(or a bit higher) to awaitTermination
as a workaround for this issue until a proper solution has been developed?
Would you accept a PR that adds a Thread.sleep(1100) (or a bit higher) to awaitTermination
No, because it isn't a per-Channel problem. If you create 100 channels, the sleep would only do something for the last Channel being shut down. Unless you were creating another Channel right after that.
I would accept someone filing a bug with Jetty that their ClassLoader prevents shutdown. Breaking the ClassLoader is such a weak protection that it is basically useless, yet it is causing breakages.
I would accept someone filing a bug with Jetty that their ClassLoader prevents shutdown. Breaking the ClassLoader is such a weak protection that it is basically useless, yet it is causing breakages.
Well I don't think there is any Jetty issue. Jetty sees that the web application has successfully shutdown and thus destroys the ClassLoader for that web application.
I don't know how grpc-java works internally but if you say that it is not a per-Channel problem then I assume that you start Netty once for all channels created and even if all channels have been terminated using awaitTermination
Netty is still running for some time? Maybe grpc-java then needs some additional shutdown methods (or maybe they already exist?) that only returns if the whole Netty server with all GRPC channels is completely shutdown?
I've looked enough at Jetty's WebAppClassLoader that I don't think there's much to change there. I thought there was something done on purpose here, but instead I now think they are probably just deleting the wars/jars and that causes the problem.
On the grpc side, this is because we share the Netty threads across channels/servers. That by itself might be okay. But to avoid rapidly re-creating the resources if the reference count flaps rapidly between 1 and 0, we delay the shutdown by a second.
What version of gRPC-Java are you using?
io.grpc:grpc-netty-shaded:1.46.0
What is your environment?
macOS / Linux / JDK 15 / Jetty 10
What did you expect to see?
No exception being thrown after a graceful shutdown. Graceful shutdown happens during undeployment of web application.
What did you see instead?
The log indicates that graceful shutdown has succeeded and consequently the web application has been undeployed. However, roughly a second later Netty throws
NoClassDefFoundException
seen below. Seems likechannel.awaitTermination()
returns too early.Steps to reproduce the bug
Use a servlet container, e.g. Jetty, and register a
ServletContextListener
that starts and stops a GRPC client to call an external service. When undeploying the web application, the exception is sometimes thrown.