juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0

Container exited with a non-zero exit code 134. Error file: prelaunch.err. #5183

Closed · ANHDY closed this issue 1 month ago

ANHDY commented 1 month ago

What happened:

This error occurred during distributed benchmarking. Command executed: hadoop jar ./juicefs-hadoop-1.2.0.jar dfsio -write -files 50 -size 1MB -bufferSize 1048576 -baseDir jfs://jfs/tmp/benchmarks/DFSIO

Log:

24/09/23 15:29:50 INFO mapreduce.Job: Running job: job_1727071698603_0010
24/09/23 15:30:00 WARN ipc.Client: Exception encountered while connecting to the server : java.io.IOException: Connection reset by peer
24/09/23 15:30:08 INFO mapreduce.Job: Job job_1727071698603_0010 running in uber mode : false
24/09/23 15:30:08 INFO mapreduce.Job:  map 0% reduce 0%
24/09/23 15:30:08 INFO mapreduce.Job: Job job_1727071698603_0010 failed with state FAILED due to: Application application_1727071698603_0010 failed 2 times due to AM Container for appattempt_1727071698603_0010_000002 exited with  exitCode: 134
Failing this attempt.Diagnostics: [2024-09-23 15:30:22.284]Exception from container-launch.
Container id: container_e64_1727071698603_0010_02_000001
Exit code: 134
Shell output: main : command provided 1
main : run as user is hive
main : requested yarn user is hive
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /data1/hadoop/yarn/local/nmPrivate/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001/container_e64_1727071698603_0010_02_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...

[2024-09-23 15:30:22.306]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 4071550 Aborted                 (core dumped) /usr/jdk64/jdk1.8.0_112/bin/java -Djava.io.tmpdir=/data1/hadoop/yarn/local/usercache/hive/appcache/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/hadoop/yarn/log/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx2048m -Xmx9011m org.apache.hadoop.mapreduce.v2.app.MRAppMaster > /hadoop/yarn/log/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001/stdout 2> /hadoop/yarn/log/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001/stderr
Last 4096 bytes of stderr :
2024/09/23 15:30:19.463841 juicefs[4071550] <WARNING>: statfs /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/: no such file or directory [utils_unix.go:39]
2024/09/23 15:30:19.463919 juicefs[4071550] <PANIC>: create lock file /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/.lock: open lock file /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/.lock: open /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/.lock: no such file or directory [disk_cache.go:187]
panic: (*logrus.Entry) 0xc0010a33b0

goroutine 17 [running, locked to thread]:
github.com/sirupsen/logrus.(*Entry).log(0xc0010a3260, 0x0, {0xc000e0c5a0, 0xe3})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:260 +0x491
github.com/sirupsen/logrus.(*Entry).Log(0xc0010a3260, 0x0, {0xc000677250?, 0x2?, 0x2?})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Logf(0xc0010a3260, 0x0, {0x7f76f0524a82?, 0xc0006772b0?}, {0xc000677320?, 0x10?, 0x7f76f160a420?})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:349 +0x7c
github.com/sirupsen/logrus.(*Logger).Logf(0xc000b21ad0, 0x0, {0x7f76f0524a82, 0x17}, {0xc000677320, 0x2, 0x2})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/logger.go:154 +0x7c
github.com/sirupsen/logrus.(*Logger).Panicf(...)
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/logger.go:195
github.com/juicedata/juicefs/pkg/chunk.(*cacheStore).createLockFile(0xc000de4540)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/disk_cache.go:187 +0x185
github.com/juicedata/juicefs/pkg/chunk.newCacheStore(0xc001846000, {0xc000c48c30, 0x30}, 0x40000000, 0x7f76f1609da0?, 0xc000e494a0, 0xc0017fd600)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/disk_cache.go:146 +0x76f
github.com/juicedata/juicefs/pkg/chunk.newCacheManager(0xc000e494a0, {0x0, 0x0}, 0x7f76ee2767e5?)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/disk_cache.go:1051 +0x56b
github.com/juicedata/juicefs/pkg/chunk.NewCachedStore({_, _}, {{0xc000c48bd0, 0x2f}, 0x1a4, 0x40000000, {0xc0010a7018, 0x4}, {0xc0010a7030, 0x8}, ...}, ...)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/cached_store.go:789 +0x585
main.jfs_init.func1()
        /go/src/github.com/juicedata/juicefs/sdk/java/libjfs/main.go:544 +0xc9c
main.getOrCreate({0xc0010a6010, 0x9}, {0xc0002321c0, 0x4}, {0xc0002321c4, 0x6}, {0xc0002321cc, 0x4}, {0xc000232260, 0x4}, ...)
        /go/src/github.com/juicedata/juicefs/sdk/java/libjfs/main.go:332 +0x16f
main.jfs_init(0x7f76f0350519?, 0x7f772afee690, 0x7f76ee1b99b2?, 0xc000202000?, 0x0?, 0xc000006601?)
        /go/src/github.com/juicedata/juicefs/sdk/java/libjfs/main.go:414 +0x198

[2024-09-23 15:30:22.307]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 4071550 Aborted                 (core dumped) /usr/jdk64/jdk1.8.0_112/bin/java -Djava.io.tmpdir=/data1/hadoop/yarn/local/usercache/hive/appcache/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/hadoop/yarn/log/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx2048m -Xmx9011m org.apache.hadoop.mapreduce.v2.app.MRAppMaster > /hadoop/yarn/log/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001/stdout 2> /hadoop/yarn/log/application_1727071698603_0010/container_e64_1727071698603_0010_02_000001/stderr
Last 4096 bytes of stderr :
2024/09/23 15:30:19.463841 juicefs[4071550] <WARNING>: statfs /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/: no such file or directory [utils_unix.go:39]
2024/09/23 15:30:19.463919 juicefs[4071550] <PANIC>: create lock file /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/.lock: open lock file /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/.lock: open /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/.lock: no such file or directory [disk_cache.go:187]
panic: (*logrus.Entry) 0xc0010a33b0

goroutine 17 [running, locked to thread]:
github.com/sirupsen/logrus.(*Entry).log(0xc0010a3260, 0x0, {0xc000e0c5a0, 0xe3})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:260 +0x491
github.com/sirupsen/logrus.(*Entry).Log(0xc0010a3260, 0x0, {0xc000677250?, 0x2?, 0x2?})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Logf(0xc0010a3260, 0x0, {0x7f76f0524a82?, 0xc0006772b0?}, {0xc000677320?, 0x10?, 0x7f76f160a420?})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:349 +0x7c
github.com/sirupsen/logrus.(*Logger).Logf(0xc000b21ad0, 0x0, {0x7f76f0524a82, 0x17}, {0xc000677320, 0x2, 0x2})
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/logger.go:154 +0x7c
github.com/sirupsen/logrus.(*Logger).Panicf(...)
        /root/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/logger.go:195
github.com/juicedata/juicefs/pkg/chunk.(*cacheStore).createLockFile(0xc000de4540)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/disk_cache.go:187 +0x185
github.com/juicedata/juicefs/pkg/chunk.newCacheStore(0xc001846000, {0xc000c48c30, 0x30}, 0x40000000, 0x7f76f1609da0?, 0xc000e494a0, 0xc0017fd600)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/disk_cache.go:146 +0x76f
github.com/juicedata/juicefs/pkg/chunk.newCacheManager(0xc000e494a0, {0x0, 0x0}, 0x7f76ee2767e5?)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/disk_cache.go:1051 +0x56b
github.com/juicedata/juicefs/pkg/chunk.NewCachedStore({_, _}, {{0xc000c48bd0, 0x2f}, 0x1a4, 0x40000000, {0xc0010a7018, 0x4}, {0xc0010a7030, 0x8}, ...}, ...)
        /go/src/github.com/juicedata/juicefs/pkg/chunk/cached_store.go:789 +0x585
main.jfs_init.func1()
        /go/src/github.com/juicedata/juicefs/sdk/java/libjfs/main.go:544 +0xc9c
main.getOrCreate({0xc0010a6010, 0x9}, {0xc0002321c0, 0x4}, {0xc0002321c4, 0x6}, {0xc0002321cc, 0x4}, {0xc000232260, 0x4}, ...)
        /go/src/github.com/juicedata/juicefs/sdk/java/libjfs/main.go:332 +0x16f
main.jfs_init(0x7f76f0350519?, 0x7f772afee690, 0x7f76ee1b99b2?, 0xc000202000?, 0x0?, 0xc000006601?)
        /go/src/github.com/juicedata/juicefs/sdk/java/libjfs/main.go:414 +0x198

For more detailed output, check the application tracking page: http://emr-master-001.novalocal:8088/cluster/app/application_1727071698603_0010 Then click on links to logs of each attempt.
. Failing the application.
24/09/23 15:30:08 INFO mapreduce.Job: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:876)
        at io.juicefs.bench.TestDFSIO.runIOTest(TestDFSIO.java:599)
        at io.juicefs.bench.TestDFSIO.writeTest(TestDFSIO.java:578)
        at io.juicefs.bench.TestDFSIO.run(TestDFSIO.java:337)
        at io.juicefs.Main.main(Main.java:327)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:222)

View YARN logs: yarn logs --applicationId=application_1727071698603_0010

************************************************************/
2024-09-23 15:29:53,869 INFO [main] org.apache.hadoop.security.SecurityUtil: Updating Configuration
2024-09-23 15:29:53,930 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens: [Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 10 cluster_timestamp: 1727071698603 } attemptId: 1 } keyId: -1814199453), Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cmss, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/emr-master-001.novalocal@BCHKDC, renewer=yarn, realUser=, issueDate=1727076589228, maxDate=1727681389228, sequenceNumber=333, masterKeyId=81), Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.5.245:8188, Ident: (TIMELINE_DELEGATION_TOKEN owner=hive, renewer=yarn, realUser=, issueDate=1727076590138, maxDate=1727681390138, sequenceNumber=179, masterKeyId=77)]
2024-09-23 15:29:53,952 INFO [main] org.apache.hadoop.conf.Configuration: found resource resource-types.xml at file:/usr/bch/3.3.0/hadoop/etc/hadoop/resource-types.xml
2024-09-23 15:29:54,061 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2024-09-23 15:29:54,062 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
2024-09-23 15:29:54,465 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2024-09-23 15:29:54,521 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2024-09-23 15:29:54,522 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2024-09-23 15:29:54,522 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2024-09-23 15:29:54,522 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2024-09-23 15:29:54,523 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2024-09-23 15:29:54,523 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2024-09-23 15:29:54,523 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2024-09-23 15:29:54,524 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2024-09-23 15:29:54,545 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://cmss:8020]
2024-09-23 15:29:54,556 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://cmss:8020]
2024-09-23 15:29:54,566 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://cmss:8020]
2024-09-23 15:29:54,571 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2024-09-23 15:29:54,577 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline service is enabled
2024-09-23 15:29:54,577 INFO [main] org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl: Timeline service address: null
2024-09-23 15:29:54,720 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Timeline service is enabled; version: 1.0
2024-09-23 15:29:54,746 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2024-09-23 15:29:54,881 INFO [main] org.apache.commons.beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
2024-09-23 15:29:54,905 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2024-09-23 15:29:54,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2024-09-23 15:29:54,942 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2024-09-23 15:29:54,948 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1727071698603_0010 to jobTokenSecretManager
2024-09-23 15:29:55,060 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1727071698603_0010 because: not enabled; too much RAM;
2024-09-23 15:29:55,077 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1727071698603_0010 = 112. Number of splits = 1
2024-09-23 15:29:55,077 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1727071698603_0010 = 1
2024-09-23 15:29:55,077 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1727071698603_0010Job Transitioned from NEW to INITED
2024-09-23 15:29:55,078 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1727071698603_0010.
2024-09-23 15:29:55,095 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2024-09-23 15:29:55,099 INFO [Socket Reader #1 for port 41991] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 41991
2024-09-23 15:29:55,116 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2024-09-23 15:29:55,129 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2024-09-23 15:29:55,129 INFO [IPC Server listener on 41991] org.apache.hadoop.ipc.Server: IPC Server listener on 41991: starting
2024-09-23 15:29:55,130 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at emr-task-3df4-013.novalocal/192.168.5.100:41991
2024-09-23 15:29:55,151 INFO [main] org.eclipse.jetty.util.log: Logging initialized @1925ms to org.eclipse.jetty.util.log.Slf4jLog
2024-09-23 15:29:55,212 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined
2024-09-23 15:29:55,214 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2024-09-23 15:29:55,240 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce
2024-09-23 15:29:55,240 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2024-09-23 15:29:55,241 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/*
2024-09-23 15:29:55,241 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2024-09-23 15:29:55,436 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2024-09-23 15:29:55,437 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 32823
2024-09-23 15:29:55,438 INFO [main] org.eclipse.jetty.server.Server: jetty-9.4.48.v20220622; built: 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm 1.8.0_112-b15
2024-09-23 15:29:55,457 INFO [main] org.eclipse.jetty.server.session: DefaultSessionIdManager workerName=node0
2024-09-23 15:29:55,457 INFO [main] org.eclipse.jetty.server.session: No SessionScavenger set, using defaults
2024-09-23 15:29:55,458 INFO [main] org.eclipse.jetty.server.session: node0 Scavenging every 600000ms
2024-09-23 15:29:55,466 INFO [main] org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@7a485a36{static,/static,jar:file:/usr/bch/3.3.0/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.1.0-bc3.3.0.jar!/webapps/static,AVAILABLE}
2024-09-23 15:29:55,881 INFO [main] org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@66d25ba9{mapreduce,/,file:///data1/hadoop/yarn/local/usercache/hive/appcache/application_1727071698603_0010/container_e64_1727071698603_0010_01_000001/tmp/jetty-0_0_0_0-32823-hadoop-yarn-common-3_1_0-bc3_3_0_jar-_-any-7661548558203930614/webapp/,AVAILABLE}{jar:file:/usr/bch/3.3.0/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.1.0-bc3.3.0.jar!/webapps/mapreduce}
2024-09-23 15:29:55,887 INFO [main] org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@4c2af006{HTTP/1.1, (http/1.1)}{0.0.0.0:32823}
2024-09-23 15:29:55,887 INFO [main] org.eclipse.jetty.server.Server: Started @2661ms
2024-09-23 15:29:55,887 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app mapreduce started at 32823
2024-09-23 15:29:55,890 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 3000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2024-09-23 15:29:55,890 INFO [Socket Reader #1 for port 37529] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 37529
2024-09-23 15:29:55,904 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2024-09-23 15:29:55,904 INFO [IPC Server listener on 37529] org.apache.hadoop.ipc.Server: IPC Server listener on 37529: starting
2024-09-23 15:29:55,922 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2024-09-23 15:29:55,922 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2024-09-23 15:29:55,923 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2024-09-23 15:29:55,924 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 0% of the mappers will be scheduled using OPPORTUNISTIC containers
2024-09-23 15:29:55,987 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: maxContainerCapability: <memory:101376, vCores:28>
2024-09-23 15:29:55,987 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: queue: root.ambari-qa
2024-09-23 15:29:55,990 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2024-09-23 15:29:55,990 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: The thread pool initial size is 10
2024-09-23 15:29:55,995 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1727071698603_0010Job Transitioned from INITED to SETUP
2024-09-23 15:29:55,997 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2024-09-23 15:29:56,000 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-09-23 15:29:56,000 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-09-23 15:29:56,045 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1727071698603_0010, File: hdfs://cmss:8020/user/hive/.staging/job_1727071698603_0010/job_1727071698603_0010_1.jhist
2024-09-23 15:29:56,308 WARN [Thread-78] io.juicefs.JuiceFileSystemImpl: 2024/09/23 15:29:56.308153 juicefs[4059305] <WARNING>: statfs /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/: no such file or directory [utils_unix.go:39]

End of LogType:syslog

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?

Environment:

zhijian-pro commented 1 month ago

Is this stably reproducible? The cause of this panic is that the cache directory was accidentally deleted, so it no longer existed when the lock file was created. Why doesn't /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/ exist? Did you delete it manually?
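The failure mode described above can be reproduced in isolation: the SDK effectively opens `<cacheDir>/.lock` with O_CREATE, and if the cache directory itself is gone the open fails with ENOENT and the library panics, which aborts the JVM (SIGABRT, exit code 134). A minimal sketch, using an illustrative /tmp path that reuses the volume UUID from the log:

```shell
# Simulate the cache directory disappearing, then attempt the same
# operation the SDK performs on init: create a lock file inside it.
CACHE_DIR=/tmp/jfs-cache-demo/83f08d30-0069-4d18-b0d9-4650f31a8704
rm -rf /tmp/jfs-cache-demo            # simulate the cache directory being removed
if ! touch "$CACHE_DIR/.lock" 2>/dev/null; then
    # This mirrors the error in the stack trace: open .lock -> ENOENT
    echo "open $CACHE_DIR/.lock: no such file or directory"
fi
```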

ANHDY commented 1 month ago

Is this stably reproducible? The cause of this panic is that the cache directory was accidentally deleted, so it no longer existed when the lock file was created. Why doesn't /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/ exist? Did you delete it manually?

Yes, this problem occurs every time. I checked /data1/jfs/83f08d30-0069-4d18-b0d9-4650f31a8704/ from the command line and the folder exists, so I don't know why this error is still reported every time.

tangyoupeng commented 1 month ago

Which user did you use to run the hadoop command? And does that user have permission to mkdir under /data1/jfs?

ANHDY commented 1 month ago

Which user did you use to run the hadoop command? And does that user have permission to mkdir under /data1/jfs?

I used the hive user to run the Hadoop commands, and /data1/jfs has mode 777.
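Since the panic came from the AM container on emr-task-3df4-013, the check above needs to pass on the host where the container actually ran, not just on the submitting node. A hedged diagnostic sketch of that check (run on each NodeManager host as the job user, e.g. via `sudo -u hive bash`; CACHE_ROOT defaults to a temp directory here so the sketch is runnable anywhere, but on the cluster it would be /data1/jfs, and the subdirectory name is hypothetical):

```shell
# Verify the job user can create and write the per-volume cache
# subdirectory, which is what JuiceFS does before creating .lock.
CACHE_ROOT="${CACHE_ROOT:-$(mktemp -d)}"
SUBDIR="$CACHE_ROOT/demo-volume-uuid"   # placeholder for the real volume UUID dir
mkdir -p "$SUBDIR" \
  && touch "$SUBDIR/.lock" \
  && echo "cache root is usable: $CACHE_ROOT" \
  || echo "cannot create cache subdir under $CACHE_ROOT" >&2
```

If this fails on any NodeManager host, containers scheduled there would hit the same ENOENT panic even though the directory looks fine on the node where it was checked by hand.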