spring-attic / spring-security-saml

SAML extension for the Spring Security project
Other
419 stars 484 forks source link

MetadataManager blocking threads #498

Closed SanthoshEnricher closed 3 years ago

SanthoshEnricher commented 3 years ago

We are using spring-security-saml2-core with version 1.0.3.RELEASE I could see a deadlock problem is solved with http://stackoverflow.com/questions/30324200/metadata-refresh-deadlock-spring-security-saml

logs: Metadata-reload priority:5 - threadId:0x00007f3c2c159000 - nativeId:0x126b6 - nativeId (decimal):75446 - state:BLOCKED stackTrace: java.lang.Thread.State: BLOCKED (on object monitor) at org.opensaml.saml2.metadata.provider.AbstractMetadataProvider.initialize(AbstractMetadataProvider.java:402)

jzheaux commented 3 years ago

Hi, @SanthoshEnricher, I'm sorry to hear you are experiencing problems.

The linked SO is related to an issue that was fixed in 1.0.2, so it's not clear this is the same problem.

With a deadlock there is typically something else holding onto the lock that this thread is waiting on. Can you please provide more detail along these lines?

SanthoshEnricher commented 3 years ago

Hi @jzheaux , Thank you for response. I have added thread dump logs below. Please try to check with it. I am trying to upgrade saml core version to 1.0.10.RELEASE

"http-nio-9502-exec-290" #43194 daemon prio=5 os_prio=0 tid=0x00007f3bbc002800 nid=0x1570c waiting on condition [0x00007f3c816ec000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-289" #35943 daemon prio=5 os_prio=0 tid=0x00007f3bc000f800 nid=0x1da14 waiting on condition [0x00007f3b9abf1000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-288" #35942 daemon prio=5 os_prio=0 tid=0x00007f3bc0002800 nid=0x1da13 waiting on condition [0x00007f3c810e6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-287" #35941 daemon prio=5 os_prio=0 tid=0x00007f3bc0004800 nid=0x1da0e waiting on condition [0x00007f3c803dd000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-285" #35939 daemon prio=5 os_prio=0 tid=0x00007f3bbc001000 nid=0x1da0b waiting on condition [0x00007f3c814ea000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-284" #35936 daemon prio=5 os_prio=0 tid=0x00007f3bc0006800 nid=0x1d9c2 waiting on condition [0x00007f3b9aff3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-282" #35935 daemon prio=5 os_prio=0 tid=0x00007f3bc0008800 nid=0x1d9c1 waiting on condition [0x00007f3b9aef2000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-281" #35933 daemon prio=5 os_prio=0 tid=0x00007f3bbc00f000 nid=0x1d9bf waiting on condition [0x00007f3c802dc000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-276" #33240 daemon prio=5 os_prio=0 tid=0x00007f3bbc006000 nid=0xff75 waiting on condition [0x00007f3c811e7000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-275" #33236 daemon prio=5 os_prio=0 tid=0x00007f3bbc00b000 nid=0xfe9a waiting on condition [0x00007f3c801db000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"http-nio-9502-exec-238" #20677 daemon prio=5 os_prio=0 tid=0x00007f3bbc014000 nid=0xd234 waiting on condition [0x00007f3b9aaf0000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"AsyncResolver-bootstrap-executor-0" #77 daemon prio=5 os_prio=0 tid=0x00007f3bf8001000 nid=0x13bb8 waiting on condition [0x00007f3b9bffe000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"kafka-producer-network-thread | producer-4" #76 daemon prio=5 os_prio=0 tid=0x00007f3bcc6f7000 nid=0x13601 runnable [0x00007f3b9b7f8000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

"elasticsearch[Wildboys][listener][T#4]" #75 daemon prio=5 os_prio=0 tid=0x00007f3bd43fc800 nid=0x135f2 waiting on condition [0x00007f3b9baf9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][listener][T#3]" #74 daemon prio=5 os_prio=0 tid=0x00007f3bd0003800 nid=0x135c2 waiting on condition [0x00007f3b9bbfa000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][listener][T#2]" #73 daemon prio=5 os_prio=0 tid=0x00007f3b94025800 nid=0x1359a waiting on condition [0x00007f3b9bcfb000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][listener][T#1]" #66 daemon prio=5 os_prio=0 tid=0x00007f3bcc18b000 nid=0x13515 waiting on condition [0x00007f3c806e1000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"DiscoveryClient-2" #65 daemon prio=5 os_prio=0 tid=0x00007f3bec002000 nid=0x134d0 waiting on condition [0x00007f3c807e2000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"DiscoveryClient-HeartbeatExecutor-0" #64 daemon prio=5 os_prio=0 tid=0x00007f3bf0001000 nid=0x134cf waiting on condition [0x00007f3c808e3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"DiscoveryClient-CacheRefreshExecutor-0" #63 daemon prio=5 os_prio=0 tid=0x00007f3bec001000 nid=0x134ce waiting on condition [0x00007f3c825f8000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][management][T#2]" #62 daemon prio=5 os_prio=0 tid=0x00007f3c1000d800 nid=0x134c7 waiting on condition [0x00007f3c809e4000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"DestroyJavaVM" #61 prio=5 os_prio=0 tid=0x00007f3cb0008000 nid=0x12141 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:

"http-nio-9502-AsyncTimeout" #59 daemon prio=5 os_prio=0 tid=0x00007f3cb1d90000 nid=0x12f6d waiting on condition [0x00007f3c80ce5000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.coyote.AbstractProtocol$AsyncTimeout.run(AbstractProtocol.java:1200) at java.lang.Thread.run(Thread.java:748)

Locked ownable synchronizers:

"http-nio-9502-Acceptor-0" #58 daemon prio=5 os_prio=0 tid=0x00007f3cb3616000 nid=0x12f6c runnable [0x00007f3c80de6000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:419) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:247)

"http-nio-9502-ClientPoller-1" #57 daemon prio=5 os_prio=0 tid=0x00007f3cb3614000 nid=0x12f0c runnable [0x00007f3c80ee7000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

"http-nio-9502-ClientPoller-0" #56 daemon prio=5 os_prio=0 tid=0x00007f3cb15c0800 nid=0x12f0b runnable [0x00007f3c80fe8000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

"NioBlockingSelector.BlockPoller-1" #45 daemon prio=5 os_prio=0 tid=0x00007f3cb17b5000 nid=0x12f00 runnable [0x00007f3c89331000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

"DiscoveryClient-InstanceInfoReplicator-0" #44 daemon prio=5 os_prio=0 tid=0x00007f3cb2218000 nid=0x12efe waiting on condition [0x00007f3c819f2000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"DiscoveryClient-1" #43 daemon prio=5 os_prio=0 tid=0x00007f3cb1830800 nid=0x12efd waiting on condition [0x00007f3c81af3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"DiscoveryClient-0" #42 daemon prio=5 os_prio=0 tid=0x00007f3cb28e2800 nid=0x12efc waiting on condition [0x00007f3c81bf4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"AsyncResolver-bootstrap-0" #41 daemon prio=5 os_prio=0 tid=0x00007f3cb2dfa000 nid=0x12efb waiting on condition [0x00007f3c81ef5000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"Eureka-JerseyClient-Conn-Cleaner2" #40 daemon prio=5 os_prio=0 tid=0x00007f3cb3a46800 nid=0x12efa waiting on condition [0x00007f3c81ff6000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"Eureka-JerseyClient-Conn-Cleaner2" #39 daemon prio=5 os_prio=0 tid=0x00007f3cb3a46000 nid=0x12ef9 waiting on condition [0x00007f3c820f7000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][generic][T#1]" #37 daemon prio=5 os_prio=0 tid=0x00007f3c0c0da000 nid=0x12d42 waiting on condition [0x00007f3c826f9000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][management][T#1]" #36 daemon prio=5 os_prio=0 tid=0x00007f3c0c0b6000 nid=0x12d3e waiting on condition [0x00007f3c827fa000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][clusterService#updateTask][T#1]" #35 daemon prio=5 os_prio=0 tid=0x00007f3cb1191000 nid=0x12d31 waiting on condition [0x00007f3c828fb000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][[ttl_expire]]" #34 daemon prio=5 os_prio=0 tid=0x00007f3cb2c5a800 nid=0x12d24 waiting on condition [0x00007f3c829fc000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][scheduler][T#1]" #33 daemon prio=5 os_prio=0 tid=0x00007f3cb2c7a800 nid=0x12cf4 waiting on condition [0x00007f3c82cfd000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"elasticsearch[Wildboys][[timer]]" #32 daemon prio=5 os_prio=0 tid=0x00007f3cb3629800 nid=0x12bf0 waiting on condition [0x00007f3c902f5000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:747)

Locked ownable synchronizers:

"spring.cloud.inetutils" #31 daemon prio=5 os_prio=0 tid=0x00007f3cb05b2000 nid=0x12ada waiting on condition [0x00007f3c83dfe000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"commons-pool-EvictionTimer" #29 daemon prio=5 os_prio=0 tid=0x00007f3cb3b91800 nid=0x129da in Object.wait() [0x00007f3c8812d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:552)

"container-0" #27 prio=5 os_prio=0 tid=0x00007f3cb11b3800 nid=0x128d8 waiting on condition [0x00007f3c8882e000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.catalina.core.StandardServer.await(StandardServer.java:427) at org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainer$1.run(TomcatEmbeddedServletContainer.java:177)

Locked ownable synchronizers:

"ContainerBackgroundProcessor[StandardEngine[Tomcat]]" #26 daemon prio=5 os_prio=0 tid=0x00007f3cb08a2000 nid=0x128d7 waiting on condition [0x00007f3c8892f000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1355) at java.lang.Thread.run(Thread.java:748)

Locked ownable synchronizers:

"Metadata-reload" #25 daemon prio=5 os_prio=0 tid=0x00007f3c2c159000 nid=0x126b6 waiting for monitor entry [0x00007f3c89030000] java.lang.Thread.State: BLOCKED (on object monitor) at org.opensaml.saml2.metadata.provider.AbstractMetadataProvider.initialize(AbstractMetadataProvider.java:402)

"Timer-1" #22 daemon prio=5 os_prio=0 tid=0x00007f3cb0ce2000 nid=0x1251e runnable [0x00007f3c8a3f1000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593) at sun.security.ssl.InputRecord.read(InputRecord.java:532) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:990)

"Timer-0" #21 daemon prio=5 os_prio=0 tid=0x00007f3cb1588000 nid=0x124cf in Object.wait() [0x00007f3c8a4f3000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:552)

"Abandoned connection cleanup thread" #20 daemon prio=5 os_prio=0 tid=0x00007f3cb0cc5800 nid=0x124a8 in Object.wait() [0x00007f3c901f4000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)

"kafka-producer-network-thread | producer-3" #18 daemon prio=5 os_prio=0 tid=0x00007f3cb12a7000 nid=0x1239f runnable [0x00007f3c8b3f4000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

"Service Thread" #9 daemon prio=9 os_prio=0 tid=0x00007f3cb0229800 nid=0x1217f runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:

"C1 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007f3cb020c000 nid=0x1217e waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:

"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f3cb020a000 nid=0x1217d waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:

"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f3cb0208800 nid=0x1217c waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f3cb0205800 nid=0x1217b waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f3cb0204000 nid=0x1217a runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f3cb01d3800 nid=0x12165 in Object.wait() [0x00007f3c910c1000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f3cb01cf000 nid=0x12163 in Object.wait() [0x00007f3c911c2000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference.tryHandlePending(Reference.java:191)

"VM Thread" os_prio=0 tid=0x00007f3cb01c5800 nid=0x12162 runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f3cb001d800 nid=0x1214a runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f3cb001f800 nid=0x1214b runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f3cb0021000 nid=0x1214f runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f3cb0023000 nid=0x12150 runnable

"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f3cb0024800 nid=0x12151 runnable

"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f3cb0026800 nid=0x12152 runnable

"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f3cb0028000 nid=0x12153 runnable

"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f3cb002a000 nid=0x12154 runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f3cb022c000 nid=0x12180 waiting on condition

JNI global references: 288

jzheaux commented 3 years ago

Thanks, @SanthoshEnricher.

These lines seem relevant from "metadata-reload":

at org.opensaml.saml2.metadata.provider.AbstractMetadataProvider.initialize(AbstractMetadataProvider.java:402)
- waiting to lock <0x00000000e0e1aaf0> (a org.opensaml.saml2.metadata.provider.HTTPMetadataProvider)

and "Timer-1":

org.opensaml.saml2.metadata.provider.AbstractReloadingMetadataProvider.refresh(AbstractReloadingMetadataProvider.java:255)
- locked <0x00000000e0e1aaf0> (a org.opensaml.saml2.metadata.provider.HTTPMetadataProvider)

It may be worth looking at why you've got two refresh tasks running to refresh the metadata.

I'm going to close this as answered, but if you need further help, please post a question on Stack Overflow and link that back to this ticket. In the future, Stack Overflow is usually a better venue for questions like these.

SanthoshEnricher commented 3 years ago

Thanks, @SanthoshEnricher.

These lines seem relevant from "metadata-reload":

at org.opensaml.saml2.metadata.provider.AbstractMetadataProvider.initialize(AbstractMetadataProvider.java:402)
- waiting to lock <0x00000000e0e1aaf0> (a org.opensaml.saml2.metadata.provider.HTTPMetadataProvider)

and "Timer-1":

org.opensaml.saml2.metadata.provider.AbstractReloadingMetadataProvider.refresh(AbstractReloadingMetadataProvider.java:255)
- locked <0x00000000e0e1aaf0> (a org.opensaml.saml2.metadata.provider.HTTPMetadataProvider)

It may be worth looking at why you've got two refresh tasks running to refresh the metadata.

I'm going to close this as answered, but if you need further help, please post a question on Stack Overflow and link that back to this ticket. In the future, Stack Overflow is usually a better venue for questions like these.

Thank you