spring-projects / spring-boot

Spring Boot helps you to create Spring-powered, production-grade applications and services with absolute minimum fuss.
https://spring.io/projects/spring-boot
Apache License 2.0
75.22k stars 40.7k forks source link

NoClassDefFoundError can be thrown from LaunchedClassLoader when threads are interrupted #40096

Closed refeccd closed 7 months ago

refeccd commented 7 months ago

I am running SpringBoot 3.2.4 on jdk21. I wrote a startup script that stops the application gracefully and restarts it While stopping the application, the following error is reported

17:52:42.567  WARN nHook .c.s.DefaultLifecycleProcessor: Failed to stop bean 'webServerGracefulShutdown'

java.lang.NoClassDefFoundError: org/springframework/boot/web/server/GracefulShutdownCallback
    at org.springframework.boot.web.context.WebServerGracefulShutdownLifecycle.stop(WebServerGracefulShutdownLifecycle.java:62)
    at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:348)
    at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:487)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:317)
    at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:218)
    at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1139)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.doClose(ServletWebServerApplicationContext.java:174)
    at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1093)
    at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: org.springframework.boot.web.server.GracefulShutdownCallback
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.loadClass(JarUrlClassLoader.java:104)
    at org.springframework.boot.loader.launch.LaunchedClassLoader.loadClass(LaunchedClassLoader.java:91)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
    ... 13 common frames omitted

17:53:12.571  INFO nHook .c.s.DefaultLifecycleProcessor: Shutdown phase 2147482623 ends with 1 bean still running after timeout of 30000ms: [webServerGracefulShutdown]
17:53:12.577  WARN nHook .c.s.DefaultLifecycleProcessor: Failed to stop bean 'webServerStartStop'

java.lang.NoClassDefFoundError: org/apache/catalina/Lifecycle$SingleUse
    at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:254)
    at org.apache.catalina.core.StandardService.removeConnector(StandardService.java:298)
    at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.lambda$removeServiceConnectors$1(TomcatWebServer.java:171)
    at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.doWithConnectors(TomcatWebServer.java:190)
    at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.removeServiceConnectors(TomcatWebServer.java:168)
    at org.springframework.boot.web.embedded.tomcat.TomcatWebServer.stop(TomcatWebServer.java:354)
    at org.springframework.boot.web.servlet.context.WebServerStartStopLifecycle.stop(WebServerStartStopLifecycle.java:53)
    at org.springframework.context.SmartLifecycle.stop(SmartLifecycle.java:117)
    at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:348)
    at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:487)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:317)
    at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:218)
    at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1139)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.doClose(ServletWebServerApplicationContext.java:174)
    at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1093)
    at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: org.apache.catalina.Lifecycle$SingleUse
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.loadClass(JarUrlClassLoader.java:104)
    at org.springframework.boot.loader.launch.LaunchedClassLoader.loadClass(LaunchedClassLoader.java:91)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
    ... 20 common frames omitted

17:53:42.577  INFO nHook .c.s.DefaultLifecycleProcessor: Shutdown phase 2147481599 ends with 1 bean still running after timeout of 30000ms: [webServerStartStop]
17:53:42.578  WARN nHook .c.s.DefaultLifecycleProcessor: Failed to stop bean 'redisConnectionFactoryVirtualThreads'

java.lang.NoClassDefFoundError: org/springframework/data/util/Optionals
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.lambda$resetConnection$1(LettuceConnectionFactory.java:1135)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.lambda$doInLock$20(LettuceConnectionFactory.java:1459)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1469)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1458)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.resetConnection(LettuceConnectionFactory.java:1133)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.stop(LettuceConnectionFactory.java:901)
    at org.springframework.context.SmartLifecycle.stop(SmartLifecycle.java:117)
    at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:348)
    at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:487)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:317)
    at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:218)
    at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1139)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.doClose(ServletWebServerApplicationContext.java:174)
    at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1093)
    at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: org.springframework.data.util.Optionals
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.loadClass(JarUrlClassLoader.java:104)
    at org.springframework.boot.loader.launch.LaunchedClassLoader.loadClass(LaunchedClassLoader.java:91)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
    ... 19 common frames omitted

17:54:12.579  INFO nHook .c.s.DefaultLifecycleProcessor: Shutdown phase 0 ends with 1 bean still running after timeout of 30000ms: [redisConnectionFactoryVirtualThreads]
17:54:12.582  WARN nHook .s.b.f.s.DisposableBeanAdapter: Custom destroy method 'shutdown' on bean with name 'lettuceClientResources' propagated an exception: java.lang.NoClassDefFoundError: io/netty/util/concurrent/ImmediateEventExecutor
17:54:12.582  INFO nHook c.j.c.c.c.CanalEventFetcher   : Canal fetcher ad_order_info ready to stop
17:54:12.582  INFO nHook c.j.c.c.c.CanalEventFetcher   : Canal fetcher ad_order_info stop normally
17:54:12.585  INFO nHook c.z.hikari.HikariDataSource   : HikariPool-1 - Shutdown initiated...
17:54:12.589  INFO nHook c.z.hikari.HikariDataSource   : HikariPool-1 - Shutdown completed.

Here is the start script

/bin/bash /bin/bash

APP_NAME="crm"
JAR_NAME="${APP_NAME}.jar"
TARGET_DIR="/data/${APP_NAME}"
JAR_PATH="${TARGET_DIR}/package.tgz"
PID_FILE="${TARGET_DIR}/${APP_NAME}.pid"
curtime() {
    date +'%Y-%m-%d %H:%M:%S'
}
kill_process() {
    if [ -f "$PID_FILE" ]; then
        pid=$(cat "$PID_FILE")
    elif pgrep -f "$JAR_NAME" >/dev/null; then
        pid=$(pgrep -f "$JAR_NAME")
    pid=$(pgrep -f "$JAR_NAME")
    echo "$(curtime) found old $JAR_NAME process, PID: $pid"
    if [ -n "$pid" ]; then
        kill "$pid"
        while ps -p "$pid" >/dev/null; do sleep 1; done
        rm -rf "$PID_FILE"
        echo "$(curtime) old process stopped"
    fi
}
tar xzf "$JAR_PATH" -C "$TARGET_DIR" --overwrite > /dev/null
kill_process
nohup java -XX:-OmitStackTraceInFastThrow -XX:+HeapDumpOnOutOfMemoryError -jar "$TARGET_DIR/$JAR_NAME" --spring.profiles.active=test > / dev/null 2>&1 &
new_pid=$!
echo $new_pid > "$PID_FILE"
echo "$(curtime) New process started, PID: $new_pid"
wilkinsona commented 7 months ago

Thanks for the report. Unfortunately, as things stand, there are too many unknowns for us to efficiently investigate the problem. Can you please provide a complete example that reproduces it?

refeccd commented 7 months ago

I also wish I could provide an EXAMPLE, unfortunately this error it is not 100% reproducible. As it stands, the problem is more likely to occur when the application is running a little longer. If it's just restarted right after a short period of time, it's hardly ever reproducible!

refeccd commented 7 months ago

I'll try to see if I can provide a 100% reproduction of the demo Is there anything else I should provide other than a demo?

wilkinsona commented 7 months ago

I'm afraid we really need a demo as it's very hard to diagnose a problem that we cannot reproduce.

refeccd commented 7 months ago

Okay, I'll try to provide a minimal reproduction of the example

refeccd commented 7 months ago

I did my best to try to provide an example that could be reproduced consistently, but it is completely unreproducible on my own computer. But on CentOS 7.9, it will fail to terminate properly without any regularity

refeccd commented 7 months ago

I think my problem is very similar to #38930, the difference is that I'm using the latest SpringBoot 3.2.4

wilkinsona commented 7 months ago

Yes, it does look similar to #39930. Unfortunately, that doesn't help us here as no reproducer was provided there either. The class loading failures would suggest that either the class loader has been closed (that should not have happened at this point in the shutdown process) or the jar file has been deleted while the app is running. The latter's out of Spring Boot's control and, judging by your script above, should not be happening here. That leaves the class loader being closed too soon. To confirm or eliminate that possibility, I'm afraid we'll need something that reproduces the problem.

jocull commented 7 months ago

I am seeing a similar problem in an application that interrupts threads when workers need to rebalance. I am highly suspicious of https://github.com/spring-projects/spring-boot/issues/37668 but don't know where to start.

Is it possible that the interrupts are hitting the classloaders and breaking definition lookups? It would be a similar case to what is described here: https://stackoverflow.com/q/56866573/97964

How can I help trace this further?

jocull commented 7 months ago

I have reproduce this locally with our application, but I cannot share the internal business code that will do it. Here is the gist of the operations in the meantime:

  1. Create a pool of threads. Get them doing some work that will require class loading (maybe they make web requests or something?)
  2. Interrupt them all (like Thread.interrupt()). You probably need a lot of class loading to be happening to catch it at the right point in time.
  3. Give them new jobs - classes will fail to load with various exceptions leading to the app being permanently unhealthy
  4. Repeat this in some kind of tight loop, maybe once per second you restart all the jobs.

As another point of information, using the "classic" class loader seems to alleviate the issue on initial testing. I still have to verify this further to be certain that it's improved.

philwebb commented 7 months ago

Thanks very much for the extra details @jocull. The Thread.interrupt() details make me wonder if it's a similar underling cause as #38611

HamzaAkbar067 commented 7 months ago

Yes, it appears like of #39930. However, we will get some help if it reproduce again.

philwebb commented 7 months ago

I think I can see why this can happen, but I've not been able to replicate it. We're going to try changing the underlying way that we access file data.

philwebb commented 7 months ago

@jocull @liaozan I've just pushed something I hope will fix this. If you have the chance, please try the SNAPSHOT before the Thursday release.

mhalbritter commented 2 weeks ago

Please don't cross post. You've made the same comment in https://github.com/spring-projects/spring-boot/issues/42923#issuecomment-2446361233.