juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0
10.62k stars 928 forks source link

juicefs[919684] <WARNING>: Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] #5193

Open ANHDY opened 1 day ago

ANHDY commented 1 day ago

What happened: 当我运行分布式测试命令”hadoop jar ./juicefs-hadoop-1.2.0.jar dfsio -read -files 50 -size 1MB -bufferSize 1048576 -baseDir jfs://jfs/tmp/benchmarks/DFSIO“时,出现该警告循环,不会自动结束进程,而且每次都是在map 67%时出现该警告 执行-create也会出现该情况

INFO mapreduce.Job: map 0% reduce 0% [24/09/25 21:32:14 INFO mapreduce.Job: map 67% reduce 0% 2024/09/25 21:33:19.693335 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv .go:557] 24/09/25 21:33:19 WARN juicefs.JuiceFileSystemImpl: 2024/09/25 21:33:19.693335 juicefs[919684] : Session 613 was st ale and cleaned up, but now it comes back again [tkv.go:557] 2024/09/25 21:33:55.563520 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv .go:557] 24/09/25 21:33:55 WARN juicefs.JuiceFileSystemImpl: 2024/09/25 21:33:55.563520 juicefs[919684] : Session 613 was st ale and cleaned up, but now it comes back again [tkv.go:557] 2024/09/25 21:34:08.143597 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv .go:557] 24/09/25 21:34:08 WARN juicefs.JuiceFileSystemImpl: 2024/09/25 21:34:08.143597 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 2024/09/25 21:34:32.600125 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 24/09/25 21:34:32 WARN juicefs.JuiceFileSystemImpl: 2024/09/25 21:34:32.600125 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 2024/09/25 21:35:56.818263 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 24/09/25 21:35:56 WARN juicefs.JuiceFileSystemImpl: 2024/09/25 21:35:56.818263 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 2024/09/25 21:36:08.917166 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 24/09/25 21:36:08 WARN juicefs.JuiceFileSystemImpl: 2024/09/25 21:36:08.917166 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 2024/09/25 21:36:20.683481 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557] 24/09/25 21:36:20 WARN juicefs.JuiceFileSystemImpl: 2024/09/25 21:36:20.683481 juicefs[919684] : Session 613 was stale and cleaned up, but now it comes back again [tkv.go:557]

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

每次运行 ”hadoop jar ./juicefs-hadoop-1.2.0.jar dfsio -read -files 50 -size 1MB -bufferSize 1048576 -baseDir jfs://jfs/tmp/benchmarks/DFSIO“ 这个命令都会复现

Anything else we need to know?

Environment:

polyrabbit commented 1 day ago

有没有可能你们集群中有个节点时钟比别的节点快,他一直在清理 他认为 过期了的 session?可以搜搜其他节点的日志看看有没有清 session 的消息。 JuiceFS 默认的参数只能容忍节点间一分钟的时钟误差。

ANHDY commented 21 hours ago

有没有可能你们集群中有个节点时钟比别的节点快,他一直在清理 他认为 过期了的 session?可以搜搜其他节点的日志看看有没有清 session 的消息。 JuiceFS 默认的参数只能容忍节点间一分钟的时钟误差。

你好,我查看了其它节点的juicefs.access.log日志信息,没有看到清理session的信息,不知道应该查看这个日志。

然后我查看了yarn的日志信息,发现了这个警告:ERROR impl.TimelineWriter (TimelineWriter.java:doPosting(133)) - Failed to get the response from the timeline server. ,不知道是不是因为这个原因。

hadoop-yarn-resourcemanager-nodename.log信息:

2024-09-26 12:30:20,663 WARN  security.LdapGroupsMapping (LdapGroupsMapping.java:getGroups(267)) - Failed to get groups for user hive (retry=0) by javax.naming.CommunicationException: connection closed [Root exception is java.io.IOException: connection closed]; remaining name 'dc=hadoop,dc=apache,dc=org'
2024-09-26 12:31:47,474 INFO  resourcemanager.ClientRMService (ClientRMService.java:getNewApplicationId(337)) - Allocated new applicationId: 26
2024-09-26 12:31:48,415 INFO  resourcemanager.ClientRMService (ClientRMService.java:submitApplication(644)) - Application with id 26 submitted by user hive
2024-09-26 12:31:48,415 INFO  security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleAppSubmitEvent(450)) - application_1727174811278_0026 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cmss, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/emr-master-001.novalocal@BCHKDC, renewer=yarn, realUser=, issueDate=1727325107498, maxDate=1727929907498, sequenceNumber=384, masterKeyId=94)
2024-09-26 12:31:48,417 ERROR impl.TimelineWriter (TimelineWriter.java:doPosting(133)) - Failed to get the response from the timeline server.
2024-09-26 12:31:48,417 ERROR metrics.TimelineServiceV1Publisher (TimelineServiceV1Publisher.java:putEntity(370)) - Error when publishing entity [YARN_APPLICATION,application_1727174811278_0026]
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server. HTTP error code: 403
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:142)
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:176)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.access$100(TimelineServiceV1Publisher.java:52)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:393)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:389)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:745)
2024-09-26 12:31:48,430 INFO  security.DelegationTokenRenewer (DelegationTokenRenewer.java:renewToken(628)) - Renewed delegation-token= [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cmss, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/emr-master-001.novalocal@BCHKDC, renewer=yarn, realUser=, issueDate=1727325107498, maxDate=1727929907498, sequenceNumber=384, masterKeyId=94);exp=1727411508432; apps=[application_1727174811278_0026]]
2024-09-26 12:31:48,430 INFO  impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(107)) - Timeline service address: null
2024-09-26 12:31:48,471 INFO  security.DelegationTokenRenewer (DelegationTokenRenewer.java:setTimerForTokenRenewal(606)) - Renew Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cmss, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/emr-master-001.novalocal@BCHKDC, renewer=yarn, realUser=, issueDate=1727325107498, maxDate=1727929907498, sequenceNumber=384, masterKeyId=94);exp=1727411508432; apps=[application_1727174811278_0026] in 86399961 ms, appId = [application_1727174811278_0026]
2024-09-26 12:31:48,471 INFO  rmapp.RMAppImpl (RMAppImpl.java:transition(1261)) - Storing application with id application_1727174811278_0026
2024-09-26 12:31:48,471 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(914)) - application_1727174811278_0026 State change from NEW to NEW_SAVING on event = START
2024-09-26 12:31:48,471 INFO  recovery.RMStateStore (RMStateStore.java:transition(223)) - Storing info for app: application_1727174811278_0026
2024-09-26 12:31:48,479 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(914)) - application_1727174811278_0026 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2024-09-26 12:31:48,485 INFO  fair.FairScheduler (FairScheduler.java:addApplication(511)) - Accepted application application_1727174811278_0026 from user: hive, in queue: root.ambari-qa, currently num of applications: 2
2024-09-26 12:31:48,485 ERROR impl.TimelineWriter (TimelineWriter.java:doPosting(133)) - Failed to get the response from the timeline server.
2024-09-26 12:31:48,485 ERROR metrics.TimelineServiceV1Publisher (TimelineServiceV1Publisher.java:putEntity(370)) - Error when publishing entity [YARN_APPLICATION,application_1727174811278_0026]
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server. HTTP error code: 403
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:142)
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:176)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.access$100(TimelineServiceV1Publisher.java:52)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:393)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:389)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:745)
2024-09-26 12:31:48,485 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(914)) - application_1727174811278_0026 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2024-09-26 12:31:48,485 INFO  resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(480)) - Registering app attempt : appattempt_1727174811278_0026_000001
2024-09-26 12:31:48,485 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(921)) - appattempt_1727174811278_0026_000001 State change from NEW to SUBMITTED on event = START
2024-09-26 12:31:48,485 INFO  fair.FairScheduler (FairScheduler.java:addApplicationAttempt(572)) - Added Application Attempt appattempt_1727174811278_0026_000001 to scheduler from user: hive
2024-09-26 12:31:48,486 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(921)) - appattempt_1727174811278_0026_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2024-09-26 12:31:48,992 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e92_1727174811278_0026_01_000001 Container Transitioned from NEW to ALLOCATED
2024-09-26 12:31:48,992 INFO  security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : emr-task-3df4-014.novalocal:45454 for container : container_e92_1727174811278_0026_01_000001
2024-09-26 12:31:48,993 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e92_1727174811278_0026_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2024-09-26 12:31:48,993 INFO  security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node set for appattempt_1727174811278_0026_000001

hadoop-yarn-timelineserver-nodename.log信息:

2024-09-26 12:31:39,448 INFO  timeline.RollingLevelDBTimelineStore (RollingLevelDBTimelineStore.java:discardOldEntities(1568)) - Discarded 0 entities for timestamp 1726720299447 and earlier in 0.001 seconds
2024-09-26 12:31:48,391 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(427)) - Creating password for identifier: (TIMELINE_DELEGATION_TOKEN owner=hive, renewer=yarn, realUser=, issueDate=1727325108391, maxDate=1727929908391, sequenceNumber=228, masterKeyId=86), currentKey: 86
2024-09-26 12:31:48,416 WARN  server.AuthenticationFilter (AuthenticationFilter.java:doFilter(549)) - AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2024-09-26 12:31:48,416 ERROR webapp.TimelineWebServices (TimelineWebServices.java:postEntities(233)) - The owner of the posted timeline entities is not set
2024-09-26 12:31:48,469 INFO  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:renewToken(515)) - Token renewal for identifier: (TIMELINE_DELEGATION_TOKEN owner=hive, renewer=yarn, realUser=, issueDate=1727325108391, maxDate=1727929908391, sequenceNumber=228, masterKeyId=86); total currentTokens 1
2024-09-26 12:31:48,480 WARN  server.AuthenticationFilter (AuthenticationFilter.java:doFilter(549)) - AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2024-09-26 12:31:48,484 ERROR webapp.TimelineWebServices (TimelineWebServices.java:postEntities(233)) - The owner of the posted timeline entities is not set
2024-09-26 12:31:48,993 WARN  server.AuthenticationFilter (AuthenticationFilter.java:doFilter(549)) - AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2024-09-26 12:31:48,993 ERROR webapp.TimelineWebServices (TimelineWebServices.java:postEntities(233)) - The owner of the posted timeline entities is not set
2024-09-26 12:31:54,146 WARN  server.AuthenticationFilter (AuthenticationFilter.java:doFilter(549)) - AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2024-09-26 12:31:54,146 ERROR webapp.TimelineWebServices (TimelineWebServices.java:postEntities(233)) - The owner of the posted timeline entities is not set
2024-09-26 12:31:54,147 WARN  server.AuthenticationFilter (AuthenticationFilter.java:doFilter(549)) - AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2024-09-26 12:31:54,147 ERROR webapp.TimelineWebServices (TimelineWebServices.java:postEntities(233)) - The owner of the posted timeline entities is not set
2024-09-26 12:31:54,358 INFO  timeline.RollingLevelDB (RollingLevelDB.java:roll(312)) - Rolling new DB instance for entity-ldb
polyrabbit commented 20 hours ago

不是在juicefs.access.log里面,这是io相关的记录。应该在 stdout/stderr 输出的日志里面,可以搜clean up stale session这个关键字。

ANHDY commented 18 hours ago

我在stdout/stderr 输出的日志里面没有找到clean up stale session相关的内容。但是我还是把timeline服务给关闭了。

然后程序就一直停在了INFO mapreduce.Job: map 67% reduce 0%这里,没有任何的输出内容,然后我查看yarn logs --applicationId application_日志内容显示:

2024-09-26 17:48:58,103 INFO [Socket Reader #1 for port 46389] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1727340021356_0003 (auth:SIMPLE)
2024-09-26 17:48:58,117 INFO [IPC Server handler 0 on 46389] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1727340021356_0003_m_8796093022210 asked for a task
2024-09-26 17:48:58,118 INFO [IPC Server handler 0 on 46389] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1727340021356_0003_m_8796093022210 given task: attempt_1727340021356_0003_m_000000_0
2024-09-26 17:49:22,835 INFO [IPC Server handler 3 on 46389] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1727340021356_0003_m_000000_0 is : 0.667
2024-09-26 17:49:34,855 INFO [IPC Server handler 2 on 46389] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1727340021356_0003_m_000000_0 is : 0.667
2024-09-26 17:49:46,867 INFO [IPC Server handler 1 on 46389] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1727340021356_0003_m_000000_0 is : 0.667
2024-09-26 17:49:58,881 INFO [IPC Server handler 6 on 46389] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1727340021356_0003_m_000000_0 is : 0.667
polyrabbit commented 15 hours ago

从最开始的日志来看,不像是客户端由于资源的不足而没有及时刷新session,其他节点有没有出现 stale session 的日志呢?如果有,那大概率还是有个节点不停的在踢人。

map 67% - jstack 看看线程阻塞在哪里了呢?

ANHDY commented 1 hour ago

从最开始的日志来看,不像是客户端由于资源的不足而没有及时刷新session,其他节点有没有出现 stale session 的日志呢?如果有,那大概率还是有个节点不停的在踢人。

map 67% - jstack 看看线程阻塞在哪里了呢?

在另外两个节点中分别出现了stale session的信息:

2024/09/27 09:38:23.672150 juicefs[4122655] <WARNING>: Session 894 was stale and cleaned up, but now it comes back again [tkv.go:557]
2024/09/27 09:49:50.028313 juicefs[4122655] <WARNING>: Session 894 was stale and cleaned up, but now it comes back again [tkv.go:557]
2024/09/27 10:36:18.103305 juicefs[4122655] <WARNING>: Session 894 was stale and cleaned up, but now it comes back again [tkv.go:557]
Sep 27, 2024 9:09:41 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
2024/09/27 10:05:51.352382 juicefs[68457] <WARNING>: Session 893 was stale and cleaned up, but now it comes back again [tkv.go:557]
2024/09/27 10:12:39.999600 juicefs[68457] <WARNING>: Session 893 was stale and cleaned up, but now it comes back again [tkv.go:557]
2024/09/27 10:21:56.748580 juicefs[68457] <WARNING>: Secret key is removed for the sake of safety [tkv.go:3184]

然后采用jstack查看了下map task所在节点的线程信息:

Attaching to process ID 4122655, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.112-b15
Deadlock Detection:

No deadlocks found.

Thread 4152502: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode, boolean, long) @bci=160, line=460 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object, boolean, long) @bci=102, line=362 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.poll(long, java.util.concurrent.TimeUnit) @bci=11, line=941 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=134, line=1066 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1127 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4141807: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=1027 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=57, line=1071 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4127583: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2039 (Interpreted frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=100, line=1088 (Interpreted frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=1, line=809 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=149, line=1067 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1127 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122713: (state = IN_JAVA)
Error occurred during stack walking:
java.lang.NullPointerException
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:88)
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
        at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
        at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
        at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
        at sun.tools.jstack.JStack.main(JStack.java:106)

Locked ownable synchronizers:
    - <0x00000007481899f0>, (a java/util/concurrent/ThreadPoolExecutor$Worker)

Thread 4122712: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2039 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run() @bci=47, line=1551 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122711: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.hdfs.PeerCache.run() @bci=41, line=253 (Interpreted frame)
 - org.apache.hadoop.hdfs.PeerCache.access$000(org.apache.hadoop.hdfs.PeerCache) @bci=1, line=46 (Interpreted frame)
 - org.apache.hadoop.hdfs.PeerCache$1.run() @bci=4, line=124 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122703: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.fs.TrashPolicyDefault$Emptier.run() @bci=27, line=246 (Interpreted frame)
 - io.juicefs.utils.BgTaskUtil$1.run() @bci=4, line=81 (Interpreted frame)
 - java.util.TimerThread.mainLoop() @bci=221, line=555 (Interpreted frame)
 - java.util.TimerThread.run() @bci=1, line=505 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122702: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=78, line=2078 (Compiled frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=124, line=1093 (Compiled frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=1, line=809 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=149, line=1067 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1127 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122701: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=59, line=143 (Compiled frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=164 (Compiled frame)
 - jnr.ffi.util.ref.internal.Finalizer.run() @bci=5, line=181 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122689: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.mapred.Task$TaskReporter.run() @bci=100, line=871 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122688: (state = IN_NATIVE)
 - org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, org.apache.hadoop.net.unix.DomainSocketWatcher$FdSet) @bci=0 (Interpreted frame)
 - org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, org.apache.hadoop.net.unix.DomainSocketWatcher$FdSet) @bci=2, line=52 (Interpreted frame)
 - org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() @bci=763, line=503 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122687: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=78, line=2078 (Compiled frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=124, line=1093 (Compiled frame)
 - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=1, line=809 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=149, line=1067 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1127 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122686: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=59, line=143 (Compiled frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=164 (Compiled frame)
 - org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run() @bci=9, line=3855 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122683: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.TimerThread.mainLoop() @bci=201, line=552 (Interpreted frame)
 - java.util.TimerThread.run() @bci=1, line=505 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122682: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
 - org.apache.hadoop.metrics2.impl.SinkQueue.waitForData() @bci=13, line=114 (Compiled frame)
 - org.apache.hadoop.metrics2.impl.SinkQueue.consumeAll(org.apache.hadoop.metrics2.impl.SinkQueue$Consumer) @bci=1, line=83 (Compiled frame)
 - org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.publishMetricsFromQueue() @bci=46, line=135 (Interpreted frame)
 - org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1.run() @bci=4, line=89 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 4122668: (state = BLOCKED)

Locked ownable synchronizers:
    - None

Thread 4122667: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=59, line=143 (Compiled frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=164 (Compiled frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=36, line=209 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 4122666: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
 - java.lang.ref.Reference.tryHandlePending(boolean) @bci=54, line=191 (Compiled frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=1, line=153 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 4122656: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Interpreted frame)
 - java.util.concurrent.FutureTask.awaitDone(boolean, long) @bci=165, line=429 (Interpreted frame)
 - java.util.concurrent.FutureTask.get() @bci=13, line=191 (Interpreted frame)
 - io.juicefs.bench.IOMapperBase.map(org.apache.hadoop.io.Text, org.apache.hadoop.io.LongWritable, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) @bci=177, line=125 (Interpreted frame)
 - io.juicefs.bench.IOMapperBase.map(java.lang.Object, java.lang.Object, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) @bci=12, line=41 (Interpreted frame)
 - org.apache.hadoop.mapred.MapRunner.run(org.apache.hadoop.mapred.RecordReader, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) @bci=39, line=54 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask.runOldMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=204, line=465 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=162, line=349 (Interpreted frame)
 - org.apache.hadoop.mapred.YarnChild$2.run() @bci=36, line=174 (Interpreted frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
 - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1699 (Interpreted frame)
 - org.apache.hadoop.mapred.YarnChild.main(java.lang.String[]) @bci=533, line=168 (Interpreted frame)

Locked ownable synchronizers:
    - None