Closed ardaasik closed 3 years ago
Hi @ardaasik,
can you also share the YARN logs for the H2O-3 application?
yarn logs -applicationId application_1605118272173_0002
Hi @mn-mikke
Unfortunately, since the application can't start, YARN simply can't find any logs for that applicationId.
20/11/12 07:50:28 INFO client.RMProxy: Connecting to ResourceManager at node-master/172.24.0.4:8032
Unable to get ApplicationState. Attempting to fetch logs directly from the filesystem.
Can not find the appOwner. Please specify the correct appOwner
Could not locate application logs for application_1605118272173_0002
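Side note on the CLI error above: `Can not find the appOwner` usually means the aggregated-log directory is keyed by the user who submitted the job, and the `yarn logs` command accepts an `-appOwner` flag for exactly that. A hedged sketch, assuming the job was submitted as `root` (which the `USER=root` audit lines in the NodeManager log below suggest):

```shell
# Build the command; -appOwner tells the CLI which user's log-aggregation
# directory to search when the RM no longer knows the application.
APP_ID="application_1605118272173_0002"
CMD="yarn logs -applicationId $APP_ID -appOwner root"
echo "$CMD"  # run this on a cluster node with the yarn CLI on PATH
```

This only helps once log aggregation has actually written something for the attempt; if the AM never produced logs, the command will still come back empty.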
The Hadoop UI shows Task failed task_1605163609543_0002_m_000000 Job failed as tasks failed. failedMaps:1 failedReduces:0
as the diagnostics for the application.
Using the Hadoop UI, I tracked the application's history down to one of the slave nodes. Below are the YARN NodeManager logs from that slave node:
2020-11-12 06:48:04,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1605163609543_0002 transitioned from NEW to INITING
2020-11-12 06:48:04,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1605163609543_0002_01_000001 to application application_1605163609543_0002
2020-11-12 06:48:04,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1605163609543_0002 transitioned from INITING to RUNNING
2020-11-12 06:48:04,795 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000001 transitioned from NEW to LOCALIZING
2020-11-12 06:48:04,795 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1605163609543_0002
2020-11-12 06:48:04,795 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1605163609543_0002_01_000001
2020-11-12 06:48:04,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-root/nm-local-dir/nmPrivate/container_1605163609543_0002_01_000001.tokens
2020-11-12 06:48:04,797 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user root
2020-11-12 06:48:04,798 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-root/nm-local-dir/nmPrivate/container_1605163609543_0002_01_000001.tokens to /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002/container_1605163609543_0002_01_000001.tokens
2020-11-12 06:48:04,798 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002 = file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002
2020-11-12 06:48:04,809 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2020-11-12 06:48:07,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000001 transitioned from LOCALIZING to SCHEDULED
2020-11-12 06:48:07,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1605163609543_0002_01_000001]
2020-11-12 06:48:07,046 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000001 transitioned from SCHEDULED to RUNNING
2020-11-12 06:48:07,046 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1605163609543_0002_01_000001
2020-11-12 06:48:07,048 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002/container_1605163609543_0002_01_000001/default_container_executor.sh]
2020-11-12 06:48:07,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: container_1605163609543_0002_01_000001's ip = 172.24.0.2, and hostname = node-slave1
2020-11-12 06:48:07,654 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_1605163609543_0002_01_000001 since CPU usage is not yet available.
2020-11-12 06:48:13,206 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1605163609543_0002_000001 (auth:SIMPLE)
2020-11-12 06:48:13,210 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1605163609543_0002_01_000002 by user root
2020-11-12 06:48:13,211 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=172.24.0.2 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1605163609543_0002 CONTAINERID=container_1605163609543_0002_01_000002
2020-11-12 06:48:13,211 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1605163609543_0002_01_000002 to application application_1605163609543_0002
2020-11-12 06:48:13,212 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000002 transitioned from NEW to LOCALIZING
2020-11-12 06:48:13,212 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1605163609543_0002
2020-11-12 06:48:13,212 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_INIT for appId application_1605163609543_0002
2020-11-12 06:48:13,212 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got APPLICATION_INIT for service mapreduce_shuffle
2020-11-12 06:48:13,215 INFO org.apache.hadoop.mapred.ShuffleHandler: Added token for job_1605163609543_0002
2020-11-12 06:48:13,215 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000002 transitioned from LOCALIZING to SCHEDULED
2020-11-12 06:48:13,215 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1605163609543_0002_01_000002]
2020-11-12 06:48:13,235 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000002 transitioned from SCHEDULED to RUNNING
2020-11-12 06:48:13,235 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1605163609543_0002_01_000002
2020-11-12 06:48:13,237 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002/container_1605163609543_0002_01_000002/default_container_executor.sh]
2020-11-12 06:48:13,682 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: container_1605163609543_0002_01_000002's ip = 172.24.0.2, and hostname = node-slave1
2020-11-12 06:48:13,685 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_1605163609543_0002_01_000002 since CPU usage is not yet available.
2020-11-12 06:48:14,315 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1605163609543_0002_01_000002 succeeded
2020-11-12 06:48:14,315 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000002 transitioned from RUNNING to EXITED_WITH_SUCCESS
2020-11-12 06:48:14,315 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1605163609543_0002_01_000002
2020-11-12 06:48:14,328 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1605163609543_0002 CONTAINERID=container_1605163609543_0002_01_000002
2020-11-12 06:48:14,329 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002/container_1605163609543_0002_01_000002
2020-11-12 06:48:14,330 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000002 transitioned from EXITED_WITH_SUCCESS to DONE
2020-11-12 06:48:14,330 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1605163609543_0002_01_000002 from application application_1605163609543_0002
2020-11-12 06:48:14,330 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1605163609543_0002_01_000002
2020-11-12 06:48:14,330 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1605163609543_0002
2020-11-12 06:48:15,080 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1605163609543_0002_000001 (auth:SIMPLE)
2020-11-12 06:48:15,085 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1605163609543_0002_01_000003 by user root
2020-11-12 06:48:15,088 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=172.24.0.2 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1605163609543_0002 CONTAINERID=container_1605163609543_0002_01_000003
2020-11-12 06:48:15,088 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1605163609543_0002_01_000003 to application application_1605163609543_0002
2020-11-12 06:48:15,090 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from NEW to LOCALIZING
2020-11-12 06:48:15,090 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1605163609543_0002
2020-11-12 06:48:15,090 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_INIT for appId application_1605163609543_0002
2020-11-12 06:48:15,090 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got APPLICATION_INIT for service mapreduce_shuffle
2020-11-12 06:48:15,090 INFO org.apache.hadoop.mapred.ShuffleHandler: Added token for job_1605163609543_0002
2020-11-12 06:48:15,091 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from LOCALIZING to SCHEDULED
2020-11-12 06:48:15,091 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1605163609543_0002_01_000003]
2020-11-12 06:48:15,100 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1605163609543_0002_000001 (auth:SIMPLE)
2020-11-12 06:48:15,106 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1605163609543_0002_01_000003
2020-11-12 06:48:15,106 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=172.24.0.2 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1605163609543_0002 CONTAINERID=container_1605163609543_0002_01_000003
2020-11-12 06:48:15,107 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from SCHEDULED to KILLING
2020-11-12 06:48:15,107 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1605163609543_0002_01_000003
2020-11-12 06:48:15,107 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1605163609543_0002_01_000003 not launched. No cleanup needed to be done
2020-11-12 06:48:15,109 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1605163609543_0002_01_000003 not launched as it has already been marked for Killing
2020-11-12 06:48:15,110 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2020-11-12 06:48:15,110 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1605163609543_0002 CONTAINERID=container_1605163609543_0002_01_000003
2020-11-12 06:48:15,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2020-11-12 06:48:15,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1605163609543_0002_01_000003 from application application_1605163609543_0002
2020-11-12 06:48:15,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1605163609543_0002_01_000003
2020-11-12 06:48:15,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1605163609543_0002
2020-11-12 06:48:21,716 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1605163609543_0002_01_000001 succeeded
2020-11-12 06:48:21,716 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS
2020-11-12 06:48:21,716 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1605163609543_0002_01_000001
2020-11-12 06:48:21,727 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002/container_1605163609543_0002_01_000001
2020-11-12 06:48:21,727 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1605163609543_0002 CONTAINERID=container_1605163609543_0002_01_000001
2020-11-12 06:48:21,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1605163609543_0002_01_000001 transitioned from EXITED_WITH_SUCCESS to DONE
2020-11-12 06:48:21,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1605163609543_0002_01_000001 from application application_1605163609543_0002
2020-11-12 06:48:21,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1605163609543_0002_01_000001
2020-11-12 06:48:21,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1605163609543_0002
2020-11-12 06:48:21,744 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1605163609543_0002_000001 (auth:SIMPLE)
2020-11-12 06:48:21,747 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1605163609543_0002_01_000001
2020-11-12 06:48:21,748 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=172.24.0.4 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1605163609543_0002 CONTAINERID=container_1605163609543_0002_01_000001
2020-11-12 06:48:22,734 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1605163609543_0002_01_000002, container_1605163609543_0002_01_000001]
2020-11-12 06:48:22,739 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1605163609543_0002 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2020-11-12 06:48:22,740 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002
2020-11-12 06:48:22,740 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1605163609543_0002
2020-11-12 06:48:22,743 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1605163609543_0002 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2020-11-12 06:48:22,743 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1605163609543_0002, with delay of 10800 seconds
2020-11-12 06:56:52,204 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 406075283, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
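The notable event in the NodeManager log above is that container_1605163609543_0002_01_000003 went from SCHEDULED straight to KILLING, i.e. it was killed before it ever launched. A throwaway Python sketch (variable names are mine) that condenses the `transitioned from X to Y` lines into one lifecycle per container, which makes that easy to spot:

```python
import re
from collections import defaultdict

# A few transition lines copied verbatim (minus the logger name) from the
# NodeManager log above, for container _000003.
log = """\
2020-11-12 06:48:15,090 INFO ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from NEW to LOCALIZING
2020-11-12 06:48:15,091 INFO ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from LOCALIZING to SCHEDULED
2020-11-12 06:48:15,107 INFO ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from SCHEDULED to KILLING
2020-11-12 06:48:15,110 INFO ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2020-11-12 06:48:15,111 INFO ContainerImpl: Container container_1605163609543_0002_01_000003 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
"""

# Collect (from_state, to_state) pairs per container id.
transitions = defaultdict(list)
pattern = re.compile(r"Container (container_\w+) transitioned from (\w+) to (\w+)")
for line in log.splitlines():
    m = pattern.search(line)
    if m:
        cid, src, dst = m.groups()
        transitions[cid].append((src, dst))

# Print each container's lifecycle as a single chain of states.
for cid, path in transitions.items():
    states = [path[0][0]] + [dst for _, dst in path]
    print(cid, "->", " -> ".join(states))
```

Running it against the full NodeManager log (instead of the pasted excerpt) shows _000001 and _000002 ending in EXITED_WITH_SUCCESS, while _000003 is killed out of SCHEDULED.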
And here are the attempt logs for the ApplicationMaster container on that slave node:
2020-11-12 06:48:07,924 INFO [main] org.apache.hadoop.security.SecurityUtil: Updating Configuration
2020-11-12 06:48:07,934 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2020-11-12 06:48:07,934 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 2 cluster_timestamp: 1605163609543 } attemptId: 1 } keyId: 2034004417)
2020-11-12 06:48:08,033 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2020-11-12 06:48:08,034 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2020-11-12 06:48:08,079 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.NullOutputFormat$2
2020-11-12 06:48:08,755 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2020-11-12 06:48:08,756 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2020-11-12 06:48:08,757 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2020-11-12 06:48:08,758 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2020-11-12 06:48:08,758 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2020-11-12 06:48:08,762 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2020-11-12 06:48:08,763 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2020-11-12 06:48:08,763 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2020-11-12 06:48:08,790 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://node-master:9000]
2020-11-12 06:48:08,804 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://node-master:9000]
2020-11-12 06:48:08,817 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://node-master:9000]
2020-11-12 06:48:08,823 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Creating intermediate history logDir: [hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate] + based on conf. Should ideally be created by the JobHistoryServer: yarn.app.mapreduce.am.create-intermediate-jh-base-dir
2020-11-12 06:48:08,839 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Perms after creating 493, Expected: 1023
2020-11-12 06:48:08,839 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Explicitly setting permissions to : 1023, rwxrwxrwt
2020-11-12 06:48:08,849 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Perms after creating 488, Expected: 504
2020-11-12 06:48:08,849 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Explicitly setting permissions to : 504, rwxrwx---
2020-11-12 06:48:08,851 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled
2020-11-12 06:48:08,888 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2020-11-12 06:48:08,971 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2020-11-12 06:48:09,044 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-11-12 06:48:09,044 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2020-11-12 06:48:09,051 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1605163609543_0002 to jobTokenSecretManager
2020-11-12 06:48:09,144 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1605163609543_0002 because: not enabled;
2020-11-12 06:48:09,159 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1605163609543_0002 = 0. Number of splits = 2
2020-11-12 06:48:09,159 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1605163609543_0002 = 0
2020-11-12 06:48:09,159 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1605163609543_0002Job Transitioned from NEW to INITED
2020-11-12 06:48:09,160 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1605163609543_0002.
2020-11-12 06:48:09,182 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2020-11-12 06:48:09,188 INFO [Socket Reader #1 for port 34399] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 34399
2020-11-12 06:48:09,220 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2020-11-12 06:48:09,225 INFO [IPC Server listener on 34399] org.apache.hadoop.ipc.Server: IPC Server listener on 34399: starting
2020-11-12 06:48:09,225 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2020-11-12 06:48:09,225 INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at node-slave1/172.24.0.2:34399
2020-11-12 06:48:09,285 INFO [main] org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2020-11-12 06:48:09,293 INFO [main] org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2020-11-12 06:48:09,296 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.mapreduce is not defined
2020-11-12 06:48:09,301 INFO [main] org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2020-11-12 06:48:09,323 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context mapreduce
2020-11-12 06:48:09,323 INFO [main] org.apache.hadoop.http.HttpServer2: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2020-11-12 06:48:09,326 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /mapreduce/*
2020-11-12 06:48:09,326 INFO [main] org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2020-11-12 06:48:09,605 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2020-11-12 06:48:09,607 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 46397
2020-11-12 06:48:09,607 INFO [main] org.mortbay.log: jetty-6.1.26
2020-11-12 06:48:09,702 INFO [main] org.mortbay.log: Extract jar:file:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.9.2.jar!/webapps/mapreduce to /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1605163609543_0002/container_1605163609543_0002_01_000001/tmp/Jetty_0_0_0_0_46397_mapreduce____.wvohiu/webapp
2020-11-12 06:48:10,746 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:46397
2020-11-12 06:48:10,746 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app mapreduce started at 46397
2020-11-12 06:48:10,748 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: JOB_CREATE job_1605163609543_0002
2020-11-12 06:48:10,752 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 3000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2020-11-12 06:48:10,753 INFO [Socket Reader #1 for port 35185] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 35185
2020-11-12 06:48:10,758 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2020-11-12 06:48:10,759 INFO [IPC Server listener on 35185] org.apache.hadoop.ipc.Server: IPC Server listener on 35185: starting
2020-11-12 06:48:10,889 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2020-11-12 06:48:10,889 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2020-11-12 06:48:10,889 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2020-11-12 06:48:10,891 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 0% of the mappers will be scheduled using OPPORTUNISTIC containers
2020-11-12 06:48:10,915 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at node-master/172.24.0.4:8030
2020-11-12 06:48:10,973 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: maxContainerCapability: <memory:7000, vCores:1>
2020-11-12 06:48:10,973 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: queue: h2o_queue
2020-11-12 06:48:10,977 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2020-11-12 06:48:10,977 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: The thread pool initial size is 10
2020-11-12 06:48:10,986 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1605163609543_0002Job Transitioned from INITED to SETUP
2020-11-12 06:48:10,988 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2020-11-12 06:48:10,989 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1605163609543_0002Job Transitioned from SETUP to RUNNING
2020-11-12 06:48:11,014 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1605163609543_0002_m_000000 Task Transitioned from NEW to SCHEDULED
2020-11-12 06:48:11,014 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1605163609543_0002_m_000001 Task Transitioned from NEW to SCHEDULED
2020-11-12 06:48:11,016 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2020-11-12 06:48:11,016 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2020-11-12 06:48:11,017 INFO [Thread-54] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:1126, vCores:1>
2020-11-12 06:48:11,057 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1605163609543_0002, File: hdfs://node-master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002/job_1605163609543_0002_1.jhist
2020-11-12 06:48:11,976 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:2 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2020-11-12 06:48:12,003 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1605163609543_0002: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:1024, vCores:1> knownNMs=2
2020-11-12 06:48:13,032 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2020-11-12 06:48:13,038 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1605163609543_0002_01_000002 to attempt_1605163609543_0002_m_000000_0
2020-11-12 06:48:13,041 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2020-11-12 06:48:13,105 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-jar file on the remote FS is hdfs://node-master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002/job.jar
2020-11-12 06:48:13,108 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002/job.xml
2020-11-12 06:48:13,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #0 tokens and #1 secret keys for NM use for launching container
2020-11-12 06:48:13,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 1
2020-11-12 06:48:13,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData
2020-11-12 06:48:13,153 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2020-11-12 06:48:13,156 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1605163609543_0002_01_000002 taskAttempt attempt_1605163609543_0002_m_000000_0
2020-11-12 06:48:13,158 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1605163609543_0002_m_000000_0
2020-11-12 06:48:13,216 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1605163609543_0002_m_000000_0 : 13562
2020-11-12 06:48:13,217 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1605163609543_0002_m_000000_0] using containerId: [container_1605163609543_0002_01_000002 on NM: [node-slave1:43305]
2020-11-12 06:48:13,220 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2020-11-12 06:48:13,220 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1605163609543_0002_m_000000 Task Transitioned from SCHEDULED to RUNNING
2020-11-12 06:48:14,044 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1605163609543_0002: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=2
2020-11-12 06:48:14,188 INFO [Socket Reader #1 for port 35185] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1605163609543_0002 (auth:SIMPLE)
2020-11-12 06:48:14,201 INFO [IPC Server handler 0 on 35185] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 35185, call Call#0 Retry#0 getTask(org.apache.hadoop.mapred.JvmContext@57e283d0), rpc version=2, client version=21, methodsFingerPrint=-410993661 from 172.24.0.2:46564
org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol version mismatch. (client = 21, server = 19)
at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:502)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2020-11-12 06:48:15,052 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1605163609543_0002_01_000002
2020-11-12 06:48:15,052 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2020-11-12 06:48:15,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1605163609543_0002_m_000000_0:
2020-11-12 06:48:15,052 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1605163609543_0002_01_000003 to attempt_1605163609543_0002_m_000001_0
2020-11-12 06:48:15,052 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:2 ContRel:0 HostLocal:0 RackLocal:0
2020-11-12 06:48:15,063 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000000_0 transitioned from state RUNNING to FAILED, event type is TA_CONTAINER_COMPLETED and nodeId=node-slave1:43305
2020-11-12 06:48:15,064 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2020-11-12 06:48:15,064 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_COMPLETED for container container_1605163609543_0002_01_000002 taskAttempt attempt_1605163609543_0002_m_000000_0
2020-11-12 06:48:15,073 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1605163609543_0002_m_000000 Task Transitioned from RUNNING to FAILED
2020-11-12 06:48:15,073 INFO [Thread-54] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node node-slave1
2020-11-12 06:48:15,073 INFO [ContainerLauncher #2] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1605163609543_0002_01_000003 taskAttempt attempt_1605163609543_0002_m_000001_0
2020-11-12 06:48:15,074 INFO [ContainerLauncher #2] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1605163609543_0002_m_000001_0
2020-11-12 06:48:15,075 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2020-11-12 06:48:15,075 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job failed as tasks failed. failedMaps:1 failedReduces:0
2020-11-12 06:48:15,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1605163609543_0002Job Transitioned from RUNNING to FAIL_WAIT
2020-11-12 06:48:15,078 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1605163609543_0002_m_000001 Task Transitioned from SCHEDULED to KILL_WAIT
2020-11-12 06:48:15,078 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
2020-11-12 06:48:15,080 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1605163609543_0002_01_000003 taskAttempt attempt_1605163609543_0002_m_000001_0
2020-11-12 06:48:15,090 INFO [ContainerLauncher #2] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1605163609543_0002_m_000001_0 : 13562
2020-11-12 06:48:15,090 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1605163609543_0002_m_000001_0
2020-11-12 06:48:15,119 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
2020-11-12 06:48:15,122 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT
2020-11-12 06:48:15,122 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1605163609543_0002_m_000001_0] using containerId: [container_1605163609543_0002_01_000003 on NM: [node-slave1:43305]
2020-11-12 06:48:15,123 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1605163609543_0002_m_000001_0 TaskAttempt Transitioned from KILL_TASK_CLEANUP to KILLED
2020-11-12 06:48:15,123 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1605163609543_0002_m_000001 Task Transitioned from KILL_WAIT to KILLED
2020-11-12 06:48:15,124 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1605163609543_0002Job Transitioned from FAIL_WAIT to FAIL_ABORT
2020-11-12 06:48:15,124 INFO [CommitterEvent Processor #2] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_ABORT
2020-11-12 06:48:15,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1605163609543_0002Job Transitioned from FAIL_ABORT to FAILED
2020-11-12 06:48:15,129 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job finished cleanly, recording last MRAppMaster retry
2020-11-12 06:48:15,129 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: true
2020-11-12 06:48:15,129 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified that shouldUnregistered is: true
2020-11-12 06:48:15,129 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true
2020-11-12 06:48:15,129 INFO [Thread-67] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is true
2020-11-12 06:48:15,129 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
2020-11-12 06:48:15,129 INFO [Thread-67] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
2020-11-12 06:48:15,156 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://node-master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002/job_1605163609543_0002_1.jhist to hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002-1605163684511-root-H2O_via_SparklingWater_application_1605163609543_0-1605163695075-0-0-FAILED-h2o_queue-1605163690982.jhist_tmp
2020-11-12 06:48:15,189 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied from: hdfs://node-master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002/job_1605163609543_0002_1.jhist to done location: hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002-1605163684511-root-H2O_via_SparklingWater_application_1605163609543_0-1605163695075-0-0-FAILED-h2o_queue-1605163690982.jhist_tmp
2020-11-12 06:48:15,190 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Set historyUrl to http://node-slave1:19888/jobhistory/job/job_1605163609543_0002
2020-11-12 06:48:15,191 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://node-master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002/job_1605163609543_0002_1_conf.xml to hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002_conf.xml_tmp
2020-11-12 06:48:15,220 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied from: hdfs://node-master:9000/tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002/job_1605163609543_0002_1_conf.xml to done location: hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002_conf.xml_tmp
2020-11-12 06:48:15,232 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002.summary_tmp to hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002.summary
2020-11-12 06:48:15,234 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002_conf.xml_tmp to hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002_conf.xml
2020-11-12 06:48:15,236 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002-1605163684511-root-H2O_via_SparklingWater_application_1605163609543_0-1605163695075-0-0-FAILED-h2o_queue-1605163690982.jhist_tmp to hdfs://node-master:9000/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1605163609543_0002-1605163684511-root-H2O_via_SparklingWater_application_1605163609543_0-1605163695075-0-0-FAILED-h2o_queue-1605163690982.jhist
2020-11-12 06:48:15,237 INFO [Thread-67] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2020-11-12 06:48:15,238 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Task failed task_1605163609543_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
2020-11-12 06:48:15,238 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: History url is http://node-slave1:19888/jobhistory/job/job_1605163609543_0002
2020-11-12 06:48:15,248 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Waiting for application to be successfully unregistered.
2020-11-12 06:48:16,251 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:2 ContRel:0 HostLocal:0 RackLocal:0
2020-11-12 06:48:16,253 INFO [Thread-67] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://node-master:9000 /tmp/hadoop-yarn/staging/root/.staging/job_1605163609543_0002
2020-11-12 06:48:16,270 INFO [Thread-67] org.apache.hadoop.ipc.Server: Stopping server on 35185
2020-11-12 06:48:16,270 INFO [IPC Server listener on 35185] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 35185
2020-11-12 06:48:16,270 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-11-12 06:48:16,270 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2020-11-12 06:48:16,271 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: TaskAttemptFinishingMonitor thread interrupted
2020-11-12 06:48:21,271 INFO [Thread-67] org.apache.hadoop.ipc.Server: Stopping server on 34399
2020-11-12 06:48:21,271 INFO [IPC Server listener on 34399] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 34399
2020-11-12 06:48:21,271 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-11-12 06:48:21,273 INFO [Thread-67] org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:0
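For anyone scanning a similar dump: the decisive entry above is the `org.apache.hadoop.ipc.RPC$VersionMismatch` for `TaskUmbilicalProtocol` (client = 21, server = 19), which generally means the launched task JVM picked up Hadoop jars from a different Hadoop version than the one the cluster daemons run. A small hypothetical helper (not part of Hadoop or H2O) to pull such lines out of a saved log file:

```python
import re

def find_rpc_mismatches(log_text):
    """Return (protocol, client_version, server_version) tuples for every
    RPC VersionMismatch reported in a YARN/MapReduce log dump."""
    pattern = re.compile(
        r"Protocol (\S+) version mismatch\. "
        r"\(client = (\d+), server = (\d+)\)"
    )
    return [(proto, int(c), int(s)) for proto, c, s in pattern.findall(log_text)]

sample = (
    "org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol "
    "org.apache.hadoop.mapred.TaskUmbilicalProtocol version mismatch. "
    "(client = 21, server = 19)"
)
print(find_rpc_mismatches(sample))
# → [('org.apache.hadoop.mapred.TaskUmbilicalProtocol', 21, 19)]
```

If this returns any tuples where client and server differ, check that the H2O driver zip (here built for hdp3.1) matches the Hadoop distribution actually installed on the nodes.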
Hi @ardaasik, can you try running plain H2O-3 on your cluster? http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html#walkthrough
Hi @mn-mikke, unfortunately that didn't do the trick. However, I managed to solve the problem: the lack of error output and logs made me suspicious of how I had set up the Hadoop cluster, so I followed a tutorial and made a fresh Docker setup.
Apart from the tutorial,
Now, I don't know exactly which step(s) solved the problem, but it was certainly a misconfigured Hadoop setup.
Thanks for the assistance.
I have 3 Docker containers, 1 node-master and 2 slave nodes, which simulate a YARN cluster on a single machine.
YARN is deployed in “cluster” mode and we are using the H2O external backend via https://h2o-release.s3.amazonaws.com/h2o/rel-zermelo/1/h2o-3.32.0.1-hdp3.1.zip. Below are the configurations and the commands we use to try to start the H2OContext.
But I got
And here are the log files
Versions of the services I use:
I tried multiple YARN configurations and set up my own queue, but had no luck. Here are some configurations that I think are important.
yarn-site.xml
mapred-site.xml
It might be worth mentioning that I am running the code from a Jupyter notebook running in the same container as the node-master.
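Since the yarn-site.xml and mapred-site.xml contents did not come through above, here is an illustrative fragment (placeholder values, not the poster's actual configuration) of the memory properties that usually decide whether a map container can be allocated — the `resourcelimit=<memory:0, vCores:0>` entry in the AM log suggests the queue had no headroom left for the second mapper:

```xml
<!-- yarn-site.xml: illustrative values only -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value> <!-- memory each NodeManager offers to containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value> <!-- largest single container the scheduler will grant -->
</property>

<!-- mapred-site.xml: illustrative values only -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value> <!-- must cover the H2O node heap plus JVM overhead -->
</property>
```

These are standard Hadoop property names, but the values shown are placeholders; the right numbers depend on the host machine and how much memory each H2O node is given.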