linkedin / dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
BSD 2-Clause "Simplified" License

Miss ALL blocks #71

Closed seanshaogh closed 5 years ago

seanshaogh commented 5 years ago

@xkrogen, good afternoon! During a manual workload launch, the NameNode reports all blocks as missing and no DataNode is registered. These commands were used:

1. Execute the block generation job:

./generate-block-lists.sh -fsimage_input_path hdfs://cluster/user/qa/dyno/fsimage/fsimage_0000000000282000135.xml -block_image_output_dir hdfs://cluster/user/qa/dyno/blocks -num_reducers 1 -num_datanodes 1

2. Manual workload launch:

./start-dynamometer-cluster.sh --hadoop_binary_path hadoop-2.7.3-1.2.7.tar.gz --conf_path /home/hdfs/Dynamometer/dynamometer-0.1.0-SNAPSHOT/bin/hadoop --fs_image_dir hdfs://cluster/user/qa/dyno/fsimage --block_list_path hdfs://cluster/user/qa/dyno/blocks
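For reference, the XML fsimage passed to -fsimage_input_path can be produced with the HDFS offline image viewer, and both input locations can be sanity-checked before launching. A minimal sketch (the binary checkpoint name is illustrative, taken from the XML name above):

hdfs oiv -p XML -i fsimage_0000000000282000135 -o fsimage_0000000000282000135.xml
hdfs dfs -ls hdfs://cluster/user/qa/dyno/fsimage
hdfs dfs -ls hdfs://cluster/user/qa/dyno/blocks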

seanshaogh commented 5 years ago

The NameNode UI showed: "There are 100 missing blocks. The following files may be corrupted."

xkrogen commented 5 years ago

Hi @seanshaogh, thanks for reporting this issue! The steps you've used to launch it seem correct. Can you provide some more information about what happened?

seanshaogh commented 5 years ago

Hi @xkrogen, thanks for your reply! I tried to launch one DataNode in a container and found that no DataNode registered with the NameNode. The AM logs show that the DataNode process itself launched successfully within the container, but it looks like the DataNode could not connect to the NameNode. The logs are shown below:

The DataNode logs:

Starting datanode with ID 000003 PWD is: /mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003 Saving original HADOOP_HOME as: /usr/ndp/current/yarn_nodemanager Saving original HADOOP_CONF_DIR as: /usr/ndp/current/yarn_nodemanager/conf Environment variables are set as: (note that this doesn't include changes made by hadoop-env.sh) XDG_SESSION_ID=c797411 YARN_RESOURCEMANAGER_OPTS= -Drm.audit.logger=INFO,RMAUDIT -Drm.audit.logger=INFO,RMAUDIT HADOOP_LOG_DIR=/mnt/dfs/0/hadoop/yarn/log/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003 HADOOP_IDENT_STRING=yarn SHELL=/bin/bash HADOOP_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7 NM_HOST=hadoop YARN_PID_DIR=/var/run/ndp/hadoop-yarn/yarn HADOOP_PID_DIR=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node/pid NN_EDITS_DIR= HADOOP_PREFIX=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7 YARN_NICENESS=0 NM_AUX_SERVICE_mapreduce_shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=

QTDIR=/usr/lib64/qt-3.3 NN_ADDITIONAL_ARGS= NM_HTTP_PORT=8042 QTINC=/usr/lib64/qt-3.3/include QT_GRAPHICSSYSTEM_CHECKED=1 LOCAL_DIRS=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446 USER=qa JAVA_LIBRARY_PATH=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir HADOOP_HEAPSIZE= HADOOP_TOKEN_FILE_LOCATION=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/container_tokens HADOOP_LIBEXEC_DIR=/usr/ndp/current/yarn_nodemanager/libexec LOG_DIRS=/mnt/dfs/0/hadoop/yarn/log/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003 MALLOC_ARENA_MAX=4 YARN_NODEMANAGER_OPTS= -Dnm.audit.logger=INFO,NMAUDIT -Dnm.audit.logger=INFO,NMAUDIT YARN_ROOT_LOGGER=INFO,EWMA,RFA PATH=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7/bin:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent HADOOP_HDFS_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7 YARN_IDENT_STRING=yarn HADOOP_COMMON_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7 PWD=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003 JAVA_HOME=/usr/jdk64/jdk1.8.0_152 NN_NAME_DIR= HADOOP_YARN_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7 HADOOP_CLASSPATH=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/additionalClasspath/ LANG=en_US.UTF-8 HADOOP_CONF_DIR=/etc/hdfs/hdfs_namenode/2.7.3/0 HADOOP_OPTS=-Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/ndp/hadoop-hdfs/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/ndp/hadoop-hdfs/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true YARN_TIMELINESERVER_HEAPSIZE=1024 YARN_LOG_DIR=/var/log/ndp/hadoop-yarn/yarn_nodemanager LIBHDFS_OPTS=-Djava.library.path=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7/lib/native HOME=/home/ SHLVL=4 DN_ADDITIONAL_ARGS= YARN_LOGFILE=yarn-yarn-nodemanager-hadoop.log YARN_CONF_DIR=/etc/mapreduce2/mapreduce_client/2.7.3/0 JVM_PID=8092 YARN_NODEMANAGER_HEAPSIZE=4096 
HADOOP_MAPRED_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7 HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR NM_PORT=45454 LOGNAME=qa QTLIB=/usr/lib64/qt-3.3/lib NM_AUX_SERVICE_spark_shuffle= HADOOP_HOME_WARN_SUPPRESS=1 CONTAINER_ID=container_e105_1545030638014_43446_01_000003 LESSOPEN=||/usr/bin/lesspipe.sh %s NN_FILE_METRIC_PERIOD=60 HADOOP_ROOT_LOGGER=INFO,RFA XDG_RUNTIME_DIR=/run/user/5012 YARN_RESOURCEMANAGER_HEAPSIZE=6144 HADOOP_YARNUSER=yarn =/usr/bin/printenv

Going to sleep for 0 sec... Executing the following: /mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7/bin/hadoop jar dynamometer.jar com.linkedin.dynamometer.SimulatedDataNodes -D fs.defaultFS=hdfs://hadoop1:9022/ -D dfs.datanode.hostname=hadoop -D dfs.datanode.data.dir=file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//hadoop/hdfs/data,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/0,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/1,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/2,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/3,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/4,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/5,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/6,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/7,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/8,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/9,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/10,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/11 -D dfs.datanode.ipc.address=0.0.0.0:0 -D dfs.datanode.http.address=0.0.0.0:0 -D dfs.datanode.address=0.0.0.0:0 -D dfs.datanode.directoryscan.interval=-1 -D fs.du.interval=43200000 -D fs.getspaceused.jitterMillis=21600000 -D hadoop.tmp.dir=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node -D hadoop.security.authentication=simple -D hadoop.security.authorization=false -D dfs.http.policy=HTTP_ONLY -D dfs.nameservices= -D dfs.web.authentication.kerberos.principal= -D dfs.web.authentication.kerberos.keytab= -D hadoop.http.filter.initializers= -D dfs.datanode.kerberos.principal= -D dfs.datanode.keytab.file= -D dfs.domain.socket.path= -D dfs.client.read.shortcircuit=false BP-555526057-yarn-1534758010800 file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/blocks/block0 Started datanode at pid 8219 Waiting for parent process (PID: 8092) OR datanode process to exit DataNodes will connect to NameNode at hadoop1:9022 Found 1 block listing files; launching 
DataNodes accordingly. Waiting for DataNodes to connect to NameNode and init storage directories.

The Namenode logs:

2018-12-18 14:38:09,654 [0] - INFO [main:ApplicationMaster@164] - Initializing ApplicationMaster
2018-12-18 14:38:09,981 [327] - INFO [main:ApplicationMaster@229] - Application master for app, appId=43446, clustertimestamp=1545030638014, attemptId=1
2018-12-18 14:38:09,981 [327] - INFO [main:ApplicationMaster@258] - Starting ApplicationMaster
2018-12-18 14:38:10,103 [449] - WARN [main:NativeCodeLoader@62] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-12-18 14:38:10,304 [650] - INFO [main:NMClientAsyncImpl@107] - Upper bound of the thread pool size is 500
2018-12-18 14:38:10,306 [652] - INFO [main:ContainerManagementProtocolProxy@81] - yarn.client.max-cached-nodemanagers-proxies : 0
2018-12-18 14:38:10,510 [856] - INFO [main:ApplicationMaster@300] - Requested NameNode ask: Capability[<memory:2048, vCores:1>]Priority[0]
2018-12-18 14:38:10,518 [864] - INFO [main:ApplicationMaster@306] - Waiting on availability of NameNode information at hdfs://cluster/user/mammut_qa/.dynamometer/application_1545030638014_43446/nn_info.prop
2018-12-18 14:38:11,167 [1513] - WARN [main:DomainSocketFactory@117] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2018-12-18 14:38:12,548 [2894] - INFO [AMRM Heartbeater thread:AMRMClientImpl@360] - Received new token for : hadoop1:45454
2018-12-18 14:38:12,551 [2897] - INFO [AMRM Callback Handler Thread:ApplicationMaster$RMCallbackHandler@483] - Got response from RM for container ask, allocatedCnt=1
2018-12-18 14:38:12,553 [2899] - INFO [AMRM Callback Handler Thread:ApplicationMaster$RMCallbackHandler@511] - Launching NAMENODE on a new container., containerId=container_e105_1545030638014_43446_01_000002, containerNode=hadoop1:45454, containerNodeURI=hadoop1:8042, containerResourceMemory=10240, containerResourceVirtualCores=1
2018-12-18 14:38:12,554 [2900] - INFO [Thread-7:ApplicationMaster$LaunchContainerRunnable@655] - Setting up container launch context for containerid=container_e105_1545030638014_43446_01_000002, isNameNode=true
2018-12-18 14:38:12,620 [2966] - INFO [Thread-7:ApplicationMaster$LaunchContainerRunnable@732] - Completed setting up command for namenode: [./start-component.sh, namenode, hdfs://cluster/user/mammut_qa/.dynamometer/application_1545030638014_43446, 1>/stdout, 2>/stderr]
2018-12-18 14:38:12,633 [2979] - INFO [Thread-7:ApplicationMaster$LaunchContainerRunnable@676] - Starting NAMENODE; track at: http://hadoop1:8042/node/containerlogs/container_e105_1545030638014_43446_01_000002/mammut_qa/
2018-12-18 14:38:12,635 [2981] - INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0:NMClientAsyncImpl$ContainerEventProcessor@531] - Processing Event EventType: START_CONTAINER for Container container_e105_1545030638014_43446_01_000002
2018-12-18 14:38:12,638 [2984] - INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0:ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData@260] - Opening proxy : hadoop1:45454
2018-12-18 14:38:12,709 [3055] - INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0:ApplicationMaster$NMCallbackHandler@578] - NameNode container started at ID container_e105_1545030638014_43446_01_000002
2018-12-18 14:38:21,357 [11703] - INFO [main:ApplicationMaster@314] - NameNode information: {NM_HTTP_PORT=8042, NN_HOSTNAME=hadoop1, NN_HTTP_PORT=50077, NN_SERVICERPC_PORT=9022, NN_RPC_PORT=9002, CONTAINER_ID=container_e105_1545030638014_43446_01_000002}
2018-12-18 14:38:21,358 [11704] - INFO [main:ApplicationMaster@315] - NameNode can be reached at: hdfs://hadoop1:9002/
2018-12-18 14:38:21,358 [11704] - INFO [main:DynoInfraUtils@196] - Waiting for NameNode to finish starting up...
2018-12-18 14:40:09,981 [120327] - INFO [main:DynoInfraUtils@355] - Startup progress = 1.00; above threshold of 1.00; done waiting after 108621 ms.
2018-12-18 14:40:09,982 [120328] - INFO [main:DynoInfraUtils@199] - NameNode has started!
2018-12-18 14:40:09,982 [120328] - INFO [main:ApplicationMaster@760] - Looking for block listing files in hdfs://cluster/user/mammut_qa/dyno/blocksone
2018-12-18 14:40:10,002 [120348] - INFO [main:ApplicationMaster@331] - Requesting 1 DataNode containers with 2048MB memory, 1 vcores,
2018-12-18 14:40:10,002 [120348] - INFO [main:ApplicationMaster@340] - Finished requesting datanode containers
2018-12-18 14:40:10,002 [120348] - INFO [main:DynoInfraUtils@219] - Waiting for 0 DataNodes to register with the NameNode...
2018-12-18 14:40:10,012 [120358] - INFO [main:DynoInfraUtils@355] - Number of live DataNodes = 0.00; above threshold of 0.00; done waiting after 9 ms.
2018-12-18 14:40:10,028 [120374] - INFO [main:DynoInfraUtils@237] - Launching thread to trigger block reports for Datanodes with <38774742 blocks reported
2018-12-18 14:40:10,029 [120375] - INFO [main:DynoInfraUtils@299] - Waiting for MissingBlocks to fall below 1938.7372...
2018-12-18 14:40:10,031 [120377] - INFO [main:DynoInfraUtils@359] - Number of missing blocks: 6527.00
2018-12-18 14:40:11,702 [122048] - INFO [AMRM Heartbeater thread:AMRMClientImpl@360] - Received new token for : hadoop:45454
2018-12-18 14:40:11,702 [122048] - INFO [AMRM Callback Handler Thread:ApplicationMaster$RMCallbackHandler@483] - Got response from RM for container ask, allocatedCnt=1
2018-12-18 14:40:11,703 [122049] - INFO [AMRM Callback Handler Thread:ApplicationMaster$RMCallbackHandler@511] - Launching DATANODE on a new container., containerId=container_e105_1545030638014_43446_01_000003, containerNode=hadoop:45454, containerNodeURI=hadoop:8042, containerResourceMemory=10240, containerResourceVirtualCores=1
2018-12-18 14:40:11,703 [122049] - INFO [Thread-12:ApplicationMaster$LaunchContainerRunnable@655] - Setting up container launch context for containerid=container_e105_1545030638014_43446_01_000003, isNameNode=false
2018-12-18 14:40:11,744 [122090] - INFO [Thread-12:ApplicationMaster$LaunchContainerRunnable@732] - Completed setting up command for datanode: [./start-component.sh, datanode, hdfs://hadoop1:9022/, 0, 1>/stdout, 2>/stderr]
2018-12-18 14:40:11,744 [122090] - INFO [Thread-12:ApplicationMaster$LaunchContainerRunnable@676] - Starting DATANODE; track at: http://hadoop:8042/node/containerlogs/container_e105_1545030638014_43446_01_000003/mammut_qa/
2018-12-18 14:40:11,745 [122091] - INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1:NMClientAsyncImpl$ContainerEventProcessor@531] - Processing Event EventType: START_CONTAINER for Container container_e105_1545030638014_43446_01_000003
2018-12-18 14:40:11,753 [122099] - INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1:ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData@260] - Opening proxy : hadoop:45454
2018-12-18 14:40:11,764 [122110] - INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #2:NMClientAsyncImpl$ContainerEventProcessor@531] - Processing Event EventType: QUERY_CONTAINER for Container container_e105_1545030638014_43446_01_000003
2018-12-18 14:40:11,765 [122111] - INFO [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #2:ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData@260] - Opening proxy : hadoop:45454
2018-12-18 14:40:16,036 [126382] - INFO [main:DynoInfraUtils@359] - Number of missing blocks: 3730000.00
2018-12-18 14:40:28,045 [138391] - INFO [main:DynoInfraUtils@359] - Number of missing blocks: 12705849.00
2018-12-18 14:40:40,055 [150401] - INFO [main:DynoInfraUtils@359] - Number of missing blocks: 19387365.00
2018-12-18 14:42:10,081 [240427] - INFO [Thread-11:DynoInfraUtils$1@264] - Queueing 0 Datanodes for block report:
2018-12-18 14:43:10,179 [300525] - INFO [Thread-11:DynoInfraUtils$1@264] - Queueing 0 Datanodes for block report:
2018-12-18 14:44:10,209 [360555] - INFO [Thread-11:DynoInfraUtils$1@264] - Queueing 0 Datanodes for block report:
2018-12-18 14:45:10,246 [420592] - INFO [Thread-11:DynoInfraUtils$1@264] - Queueing 0 Datanodes for block report:
2018-12-18 14:46:10,286 [480632] - INFO [Thread-11:DynoInfraUtils$1@264] - Queueing 0 Datanodes for block report:

xkrogen commented 5 years ago

Thank you for sharing that! Though, the section you have labeled "NameNode logs" is actually the log of the ApplicationMaster, not the NameNode -- you can find the NameNode logs by following the log line that starts with "Starting NAMENODE; track at ...".
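Alternatively, once the application has finished (and assuming log aggregation is enabled on the cluster), the container logs, including the NameNode container's, can be pulled with the YARN CLI using the application ID from the logs above:

yarn logs -applicationId application_1545030638014_43446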

One thing I noticed is that you have a lot of blocks:

2018-12-18 14:40:10,002 [120348] - INFO [main:ApplicationMaster@340] - Finished requesting datanode containers
2018-12-18 14:40:10,002 [120348] - INFO [main:DynoInfraUtils@219] - Waiting for 0 DataNodes to register with the NameNode...
2018-12-18 14:40:10,012 [120358] - INFO [main:DynoInfraUtils@355] - Number of live DataNodes = 0.00; above threshold of 0.00; done waiting after 9 ms.
2018-12-18 14:40:10,028 [120374] - INFO [main:DynoInfraUtils@237] - Launching thread to trigger block reports for Datanodes with <38774742 blocks reported
2018-12-18 14:40:10,029 [120375] - INFO [main:DynoInfraUtils@299] - Waiting for MissingBlocks to fall below 1938.7372...
2018-12-18 14:40:10,031 [120377] - INFO [main:DynoInfraUtils@359] - Number of missing blocks: 6527.00

This seems to indicate that you have nearly 40M blocks in the system, which sounds like far too many for a single DataNode. Can I suggest that you increase the number of DataNodes you launch, and increase their total memory allocation?
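Since the number of simulated DataNodes launched is driven by the number of block listing files (the logs above show "Found 1 block listing files; launching DataNodes accordingly"), the first step is to re-run the block generation job with a larger -num_datanodes. A sketch, where 10 is an illustrative count and the output directory is a fresh one (remember to point --block_list_path at it when relaunching):

./generate-block-lists.sh -fsimage_input_path hdfs://cluster/user/qa/dyno/fsimage/fsimage_0000000000282000135.xml -block_image_output_dir hdfs://cluster/user/qa/dyno/blocks-10dn -num_reducers 1 -num_datanodes 10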

You may also want to adjust the values of the configs dyno.infra.ready.datanode-min-fraction (default 0.99) and dyno.infra.ready.missing-blocks-max-fraction (default 0.001). To make sure all DataNodes report and that there are no missing blocks, you can set these to 1.0 and 0.0, respectively -- by default they allow for a little bit of leeway.
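For example, assuming these properties are read from the Hadoop configuration used by the Dynamometer client/AM (e.g. the directory passed via --conf_path), the strict settings would look like the following fragment added to the relevant *-site.xml:

<property>
  <name>dyno.infra.ready.datanode-min-fraction</name>
  <value>1.0</value>
</property>
<property>
  <name>dyno.infra.ready.missing-blocks-max-fraction</name>
  <value>0.0</value>
</property>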

seanshaogh commented 5 years ago

@xkrogen I'm sorry it took so long to reply. This issue is now solved: the fsimage file did not match the Hadoop version being used. Thanks for your help!
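For anyone hitting the same symptom, a quick sanity check (a sketch; it assumes the NameNode's VERSION file was copied into the fs_image_dir alongside the fsimage) is to confirm the fsimage came from the same Hadoop release as the tarball passed via --hadoop_binary_path:

hdfs dfs -cat hdfs://cluster/user/qa/dyno/fsimage/VERSION
tar -tzf hadoop-2.7.3-1.2.7.tar.gz | head -1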

xkrogen commented 5 years ago

Great to hear @seanshaogh !