spring-projects / spring-framework

Spring Framework
https://spring.io/projects/spring-framework
Apache License 2.0
56.42k stars 38.07k forks source link

Deadlock in AbstractApplicationContext at JVM shutdown (SIGTERM) #23438

Closed mathieucarbou closed 5 years ago

mathieucarbou commented 5 years ago

Affects: 5.0.12 (sure) and probably 5.1.9 since we tried to upgrade and this is the same code.


There is a deadlock at JVM shutdown if a thread tries to close the spring context at the same time.

The issue is a lock in AbstractApplicationContext

    @Override
    public void registerShutdownHook() {
        if (this.shutdownHook == null) {
            // No shutdown hook registered yet.
            this.shutdownHook = new Thread() {
                @Override
                public void run() {
                    synchronized (startupShutdownMonitor) {
                        doClose();
                    }
                }
            };
            Runtime.getRuntime().addShutdownHook(this.shutdownHook);
        }
    }
    @Override
    public void close() {
        synchronized (this.startupShutdownMonitor) {
            doClose();
            // If we registered a JVM shutdown hook, we don't need it anymore now:
            // We've already explicitly closed the context.
            if (this.shutdownHook != null) {
                try {
                    Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
                }
                catch (IllegalStateException ex) {
                    // ignore - VM is already shutting down
                }
            }
        }
    }

Here is an extract of the thread dump:

"SIGTERM handler" #354 daemon prio=9 os_prio=0 tid=0x00007f5d38004800 nid=0x2bb4c waiting for monitor entry [0x00007f5ae66e5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Shutdown.exit(Shutdown.java:212)
    - waiting to lock <0x00000005cc3bde18> (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Terminator$1.handle(Terminator.java:52)
    at sun.misc.Signal$1.run(Signal.java:212)
    at java.lang.Thread.run(Thread.java:748)

"Thread-33" #215 prio=5 os_prio=0 tid=0x00007f5a20002800 nid=0x2bb20 in Object.wait() [0x00007f5ccbbf9000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:460)
    at com.tc.object.InFlightMessage.timedWait(InFlightMessage.java:294)
    - locked <0x00000007534e0638> (a com.tc.object.InFlightMessage)
    at com.tc.object.InFlightMessage.waitForAcks(InFlightMessage.java:239)
    at com.tc.object.InFlightMessage.waitForAcks(InFlightMessage.java:222)
    at com.tc.object.ClientEntityManagerImpl.queueInFlightMessage(ClientEntityManagerImpl.java:566)
    at com.tc.object.ClientEntityManagerImpl.sendMessageWhileBusy(ClientEntityManagerImpl.java:525)
    at com.tc.object.ClientEntityManagerImpl.internalRelease(ClientEntityManagerImpl.java:505)
    at com.tc.object.ClientEntityManagerImpl.access$600(ClientEntityManagerImpl.java:82)
    at com.tc.object.ClientEntityManagerImpl$1.run(ClientEntityManagerImpl.java:467)
    at com.tc.object.EntityClientEndpointImpl.close(EntityClientEndpointImpl.java:296)
    at org.terracotta.voltron.proxy.client.VoltronProxyInvocationHandler.invoke(VoltronProxyInvocationHandler.java:124)
    at com.sun.proxy.$Proxy128.close(Unknown Source)
    at org.terracotta.management.entity.nms.client.DefaultNmsService.close(DefaultNmsService.java:112)
    at com.terracottatech.management.voltron.AggregateNmsService.lambda$close$0(AggregateNmsService.java:126)
    at com.terracottatech.management.voltron.AggregateNmsService$$Lambda$1269/551340504.accept(Unknown Source)
    at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
    at com.terracottatech.management.voltron.AggregateNmsService.close(AggregateNmsService.java:123)
    at com.terracottatech.management.voltron.VoltronTmsConnection.doClose(VoltronTmsConnection.java:181)
    at com.terracottatech.management.cluster.api.AbstractTmsConnection.close(AbstractTmsConnection.java:88)
    at com.terracottatech.management.cluster.api.AbstractTmsConnectionFactoryService.close(AbstractTmsConnectionFactoryService.java:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.beans.factory.support.DisposableBeanAdapter.invokeCustomDestroyMethod(DisposableBeanAdapter.java:337)
    at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:271)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:571)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:543)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(DefaultListableBeanFactory.java:954)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:504)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingletons(DefaultListableBeanFactory.java:961)
    at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1039)
    at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1015)
    at org.springframework.context.support.AbstractApplicationContext$1.run(AbstractApplicationContext.java:935)
    - locked <0x00000005cc513e08> (a java.lang.Object)

"SIGTERM handler" #350 daemon prio=9 os_prio=0 tid=0x00007f5d38001000 nid=0x2bb08 in Object.wait() [0x00007f5ccbdfc000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000005cd116528> (a org.springframework.context.support.AbstractApplicationContext$1)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x00000005cd116528> (a org.springframework.context.support.AbstractApplicationContext$1)
    at java.lang.Thread.join(Thread.java:1326)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:107)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(Shutdown.java:123)
    at java.lang.Shutdown.sequence(Shutdown.java:167)
    at java.lang.Shutdown.exit(Shutdown.java:212)
    - locked <0x00000005cc3bde18> (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Terminator$1.handle(Terminator.java:52)
    at sun.misc.Signal$1.run(Signal.java:212)
    at java.lang.Thread.run(Thread.java:748)
mathieucarbou commented 5 years ago

@anthonydahanne @chrisdennis FYI

lgxbslgx commented 5 years ago

@mathieucarbou Is the class com.tc.object.InFlightMessage your own code? Maybe something in InFlightMessage cause this problem. Could you give more information about it?

mathieucarbou commented 5 years ago

InFlightMessage has nothing to do with this. This class is part of another library we use (this is a comm. stack). Basically what happens is that Thread-33 tries to close Spring application context. While on the other hand the JVM SIGTERM handler thread is running the shutdown hook to exist the JVM.

There is no issue in thread Thread-33. The issue is on lock 0x00000005cc3bde18 caused by the synchronized blocks.

The close() method in AbstractApplicationContext could be called at the same time the JVM shutdown hook is triggered.

chenghaoharvey commented 5 years ago

hi, i guess there is something wrong on lock "0x00000005cd116528". A thread join itself will generate a deadlock. I have a demo deadlock program showed below.

public class Starter {
        public static void main(String[] args) {
             new DeadThread().start();
    }
}

class DeadThread {
    public DeadThread() {
        thread = new Thread(new RealThread());
    }
    public void start() {
        thread.start();
    }
    private Thread thread;
    private class RealThread implements Runnable{
        public void run() {
            System.out.println("Thread is starting...");
            try {
                    thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("Thread is stopping...");
        }
    }
}

And stack dump is showed below:

"Thread-0" #9 prio=5 os_prio=31 tid=0x00007f9ae089a000 nid=0x4403 in Object.wait() [0x000070000fcf6000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000079561e6a0> (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x000000079561e6a0> (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1326)
    at concurrencys.DeadThread$RealThread.run(Starter.java:21)
    at java.lang.Thread.run(Thread.java:748).
lgxbslgx commented 5 years ago

@chenghaoharvey I try the code below.

public class Starter {
    public static void main(String[] args) {
        DeadThread thread = new DeadThread();
        thread.start();
        try {
            thread.thread.join();
        }
        catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

class DeadThread {
    public DeadThread() {
        thread = new Thread(new RealThread());
    }
    public void start() {
        thread.start();
    }
    Thread thread;
    private class RealThread implements Runnable{
        public void run() {
            System.out.println("Thread is starting...");
            try {
                Thread.sleep(200000);
            }
            catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("Thread is stopping...");
        }
    }
}

When the child thread is sleeping, the stack of the main thread is shown below.

"main" #1 prio=5 os_prio=0 tid=0x0000000002814800 nid=0x86c in Object.wait() [0x000000000226f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000780edfb90> (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x0000000780edfb90> (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1326)
    at com.test.issue23438.Starter.main(Starter.java:8)

So I think it has nothing to do with your guess. This stack only means the thread is waiting.

lgxbslgx commented 5 years ago

@mathieucarbou What I know now is that the thread tid=0x00007f5d38004800 is waiting the thread tid=0x00007f5d38001000 to release lock 0x00000005cc3bde18 and the thread tid=0x00007f5d38001000 is waiting the thread tid=0x00007f5a20002800 to release lock 0x00000005cd116528. But I don't know what the thread tid=0x00007f5a20002800 is waiting for.

Could you provide an example which uses as few external classes as possible?

mathieucarbou commented 5 years ago

I'll see if I still have the stack. It does not happen frequently. But you are right, the doClose() lock is held by Thread-33 that is waiting. This is a com stack, it seems that the .close(EntityClientEndpointImpl.java:296) never ends.

mathieucarbou commented 5 years ago

after a deeper look, turns out our comm. close() code can block under certain cases. So this is what can cause the closing of the app context stop. I am closing :-) Sorry guys for the time lost!

chenghaoharvey commented 5 years ago

@lgxbslgx
Many thanks for your replying. The different between our programs is that your program will not cause deadlock. However, my program will generate a deadlock because a thread join itself. But you are right, the stack only means the thread is waiting.

"Thread tid=0x00007f5a20002800 to release lock 0x00000005cd116528". May i know if thread tid=0x00007f5a20002800 held lock 0x00000005cd116528? Because the stack did not show it.

lgxbslgx commented 5 years ago

@chenghaoharvey The AbstractApplicationContext$1 in the stack means the thread is waiting for AbstractApplicationContext$1. The stack snippet is shown below.

    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000005cd116528> (a org.springframework.context.support.AbstractApplicationContext$1)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x00000005cd116528> (a org.springframework.context.support.AbstractApplicationContext$1)