linkedin / dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
BSD 2-Clause "Simplified" License

Dynamometer is incompatible with Hadoop 2.7.2 but this is not documented anywhere #99

Closed pingsutw closed 5 years ago

pingsutw commented 5 years ago

I got a failure when I tried to launch the dyno-cluster.

[root@ftp0 hadoop]# start-dynamometer-cluster.sh -hadoop_binary_path hadoop-2.7.2.tar.gz -conf_path /root/hadoop/hadoop-2.7.2/etc/hadoop/conf -fs_image_dir hdfs:///dyno/fsimage -block_list_path hdfs:///dyno/blocks1

Console log:

19/07/25 11:47:43 INFO dynamometer.Client: Running Client
19/07/25 11:47:43 INFO client.RMProxy: Connecting to ResourceManager at ftp0/192.168.103.159:8032
19/07/25 11:47:43 INFO dynamometer.Client: Got Cluster metric info from ASM, numNodeManagers=3
19/07/25 11:47:43 INFO dynamometer.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
19/07/25 11:47:43 INFO dynamometer.Client: Max mem capabililty of resources in this cluster 9000
19/07/25 11:47:43 INFO dynamometer.Client: Max virtual cores capabililty of resources in this cluster 50
19/07/25 11:47:43 INFO dynamometer.Client: Set the environment for the application master
19/07/25 11:47:43 INFO dynamometer.Client: Using resource FS_IMAGE directly from current location: hdfs://ftp0:9000/dyno/fsimage/fsimage_0000000000000108883
19/07/25 11:47:43 INFO dynamometer.Client: Using resource FS_IMAGE_MD5 directly from current location: hdfs://ftp0:9000/dyno/fsimage/fsimage_0000000000000108883.md5
19/07/25 11:47:43 INFO dynamometer.Client: Using resource VERSION directly from current location: hdfs:/dyno/fsimage/VERSION
19/07/25 11:47:43 INFO dynamometer.Client: Uploading resource CONF_ZIP from [/root/hadoop/hadoop-2.7.2/etc/hadoop/conf] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/conf.zip
19/07/25 11:47:44 INFO dynamometer.Client: Uploading resource START_SCRIPT from [file:/tmp/hadoop-unjar5145675343523534600/start-component.sh] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/start-component.sh
19/07/25 11:47:44 INFO dynamometer.Client: Uploading resource HADOOP_BINARY from [hadoop-2.7.2.tar.gz] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/hadoop-2.7.2.tar.gz
19/07/25 11:47:44 INFO dynamometer.Client: Uploading resource DYNO_DEPS from [/root/dynamometer/build/distributions/dynamometer-0.1.7/bin/../lib/dynamometer-infra-0.1.7.jar] to hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001/dependencies.zip
19/07/25 11:47:44 INFO dynamometer.Client: Completed setting up app master command: [$JAVA_HOME/bin/java, -Xmx1741m, com.linkedin.dynamometer.ApplicationMaster, --datanode_memory_mb 2048, --datanode_vcores 1, --datanodes_per_cluster 1, --datanode_launch_delay 0s, --namenode_memory_mb 2048, --namenode_vcores 1, --namenode_metrics_period 60, 1>/stdout, 2>/stderr]
19/07/25 11:47:44 INFO dynamometer.Client: Submitting application to RM
19/07/25 11:47:44 INFO impl.YarnClientImpl: Submitted application application_1564026451259_0001
19/07/25 11:47:45 INFO dynamometer.Client: Track the application at: http://ftp0:8088/proxy/application_1564026451259_0001/
19/07/25 11:47:45 INFO dynamometer.Client: Kill the application using: yarn application -kill application_1564026451259_0001
19/07/25 11:48:00 INFO dynamometer.Client: NameNode can be reached via HDFS at: hdfs://ftp1:9002/
19/07/25 11:48:00 INFO dynamometer.Client: NameNode web UI available at: http://ftp1:50077/
19/07/25 11:48:00 INFO dynamometer.Client: NameNode can be tracked at: http://ftp1:8042/node/containerlogs/container_1564026451259_0001_01_000002/root/
19/07/25 11:48:00 INFO dynamometer.Client: Waiting for NameNode to finish starting up...
19/07/25 11:48:07 INFO dynamometer.Client: Infra app exited unexpectedly. YarnState=FINISHED. Exiting from client.
19/07/25 11:48:07 INFO dynamometer.Client: Attempting to clean up remaining running applications.
19/07/25 11:48:07 ERROR dynamometer.Client: Application failed to complete successfully

After that, I went to look at the container log under Hadoop:

[root@ftp0 container_1564026451259_0001_01_000001]# pwd
/root/hadoop/hadoop-2.7.2/logs/userlogs/application_1564026451259_0001/container_1564026451259_0001_01_000001
[root@ftp0 container_1564026451259_0001_01_000001]# ls
stderr stdout

stdout is empty!

stderr:

19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: Setting up container launch context for containerid=container_1564026451259_0001_01_000002, isNameNode=true
19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: Completed setting up command for namenode: [./start-component.sh, namenode, hdfs://ftp0:9000/user/root/.dynamometer/application_1564026451259_0001, 1>/stdout, 2>/stderr]
19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: Starting NAMENODE; track at: http://ftp1:8042/node/containerlogs/container_1564026451259_0001_01_000002/root/
19/07/25 11:47:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1564026451259_0001_01_000002
19/07/25 11:47:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ftp1:34334
19/07/25 11:47:51 INFO dynamometer.ApplicationMaster: NameNode container started at ID container_1564026451259_0001_01_000002
19/07/25 11:48:00 INFO dynamometer.ApplicationMaster: NameNode information: {NM_HTTP_PORT=8042, NN_HOSTNAME=ftp1, NN_HTTP_PORT=50077, NN_SERVICERPC_PORT=9022, NN_RPC_PORT=9002, CONTAINER_ID=container_1564026451259_0001_01_000002}
19/07/25 11:48:00 INFO dynamometer.ApplicationMaster: NameNode can be reached at: hdfs://ftp1:9002/
19/07/25 11:48:00 INFO dynamometer.ApplicationMaster: Waiting for NameNode to finish starting up...
19/07/25 11:48:05 INFO dynamometer.ApplicationMaster: Got response from RM for container ask, completedCnt=1
19/07/25 11:48:05 INFO dynamometer.ApplicationMaster: Got container status for NAMENODE: containerID=container_1564026451259_0001_01_000002, state=COMPLETE, exitStatus=1, diagnostics=Exception from container-launch.
Container id: container_1564026451259_0001_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zer...
19/07/25 11:48:05 INFO dynamometer.ApplicationMaster: NameNode container completed; marking application as done
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: NameNode has started!
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Looking for block listing files in hdfs:/dyno/blocks1
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Requesting 2 DataNode containers with 2048MB memory, 1 vcores,
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Finished requesting datanode containers
19/07/25 11:48:06 INFO dynamometer.ApplicationMaster: Application completed. Stopping running containers
19/07/25 11:48:06 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ftp1:34334
19/07/25 11:48:07 INFO dynamometer.ApplicationMaster: Application completed. Signalling finish to RM
19/07/25 11:48:07 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
19/07/25 11:48:07 INFO dynamometer.ApplicationMaster: Application Master failed. exiting
19/07/25 11:48:07 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
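(Note for readers: the ApplicationMaster log shows it was the NameNode container, container_1564026451259_0001_01_000002 on ftp1, that exited with code 1, not the AM itself. If YARN log aggregation is enabled on the host cluster, that container's logs can be pulled after the fact with the `yarn logs` CLI. A sketch using the IDs and the NodeManager address from this run; whether `-nodeAddress` is required alongside `-containerId` depends on the Hadoop version:

```shell
# Fetch all aggregated logs for the failed Dynamometer application.
yarn logs -applicationId application_1564026451259_0001

# Narrow to the NameNode container that exited with code 1; the
# NodeManager address here is the ftp1 proxy address seen in the AM log.
yarn logs -applicationId application_1564026451259_0001 \
    -containerId container_1564026451259_0001_01_000002 \
    -nodeAddress ftp1:34334
```

Without aggregation, the logs stay under the NodeManager's local log directory on ftp1, which is what the userlogs path inspected above corresponds to.)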

Thanks in advance!

pingsutw commented 5 years ago

It was a compatibility issue: after I changed the Hadoop version to 2.7.4, it works. I think we should add a list so that users know which Hadoop versions work.
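(For anyone hitting the same failure, the workaround above can be sketched as follows. The download URL is an assumption; any Apache mirror carrying the 2.7.4 release will do, and the other arguments simply mirror the original invocation in this issue:

```shell
# Grab a Hadoop release that Dynamometer works against (2.7.4 here).
curl -LO https://archive.apache.org/dist/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz

# Relaunch the dyno-cluster with -hadoop_binary_path pointing at the
# new tarball; all other arguments are unchanged from the failing run.
start-dynamometer-cluster.sh \
    -hadoop_binary_path hadoop-2.7.4.tar.gz \
    -conf_path /root/hadoop/hadoop-2.7.2/etc/hadoop/conf \
    -fs_image_dir hdfs:///dyno/fsimage \
    -block_list_path hdfs:///dyno/blocks1
```

Only the Hadoop binary shipped to the dyno NameNode/DataNodes needs to change; the fsimage and block listings generated earlier can be reused.)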

xkrogen commented 5 years ago

Ah, yes, it doesn't work well on 2.7.0 ~ 2.7.3. You're absolutely right that there should be a list. I will put together a PR soon.

pingsutw commented 5 years ago

@xkrogen Thanks for your help!

xkrogen commented 5 years ago

@pingsutw if you don't mind, I'm going to keep this open until I can get out a PR to fix the documentation.

xkrogen commented 5 years ago

Hi @pingsutw , I just submitted a PR at #103, let me know what you think.

pingsutw commented 5 years ago

@xkrogen awesome work!! I just cloned the branch, and it works well for me. I'm working on HA support for Dynamometer on Hadoop 3; once I finish, I will submit a PR based on this branch as well.