Closed: seanshaogh closed this issue 5 years ago
The NameNode UI showed: "There are 100 missing blocks. The following files may be corrupted."
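(For anyone reproducing this: the same report can be pulled from the command line with the stock HDFS tools, run against the NameNode in question.)

    # List the files whose blocks the NameNode considers missing/corrupt.
    hdfs fsck / -list-corruptfileblocks

    # Cluster-wide summary, including the missing-block count.
    hdfs dfsadmin -report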
Hi @seanshaogh, thanks for reporting this issue! The steps you used to launch it seem correct. Can you provide some more information about what happened?
Hi @xkrogen, thanks for your reply! I tried to launch one DataNode in a container and found that no DataNode registered with the NameNode. The AM logs show that the DataNode process itself launched successfully within the container, but it looks like the DataNode could not connect to the NameNode. The AM logs are below:
Starting datanode with ID 000003
PWD is: /mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003
Saving original HADOOP_HOME as: /usr/ndp/current/yarn_nodemanager
Saving original HADOOP_CONF_DIR as: /usr/ndp/current/yarn_nodemanager/conf
Environment variables are set as: (note that this doesn't include changes made by hadoop-env.sh)
XDG_SESSION_ID=c797411
YARN_RESOURCEMANAGER_OPTS= -Drm.audit.logger=INFO,RMAUDIT -Drm.audit.logger=INFO,RMAUDIT
HADOOP_LOG_DIR=/mnt/dfs/0/hadoop/yarn/log/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003
HADOOP_IDENT_STRING=yarn
SHELL=/bin/bash
HADOOP_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7
NM_HOST=hadoop
YARN_PID_DIR=/var/run/ndp/hadoop-yarn/yarn
HADOOP_PID_DIR=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node/pid
NN_EDITS_DIR=
HADOOP_PREFIX=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7
YARN_NICENESS=0
NM_AUX_SERVICE_mapreduce_shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
QTDIR=/usr/lib64/qt-3.3
NN_ADDITIONAL_ARGS=
NM_HTTP_PORT=8042
QTINC=/usr/lib64/qt-3.3/include
QT_GRAPHICSSYSTEM_CHECKED=1
LOCAL_DIRS=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446
USER=qa
JAVA_LIBRARY_PATH=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
HADOOP_HEAPSIZE=
HADOOP_TOKEN_FILE_LOCATION=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/container_tokens
HADOOP_LIBEXEC_DIR=/usr/ndp/current/yarn_nodemanager/libexec
LOG_DIRS=/mnt/dfs/0/hadoop/yarn/log/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003
MALLOC_ARENA_MAX=4
YARN_NODEMANAGER_OPTS= -Dnm.audit.logger=INFO,NMAUDIT -Dnm.audit.logger=INFO,NMAUDIT
YARN_ROOT_LOGGER=INFO,EWMA,RFA
PATH=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7/bin:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent
HADOOP_HDFS_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7
YARN_IDENT_STRING=yarn
HADOOP_COMMON_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7
PWD=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003
JAVA_HOME=/usr/jdk64/jdk1.8.0_152
NN_NAME_DIR=
HADOOP_YARN_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7
HADOOP_CLASSPATH=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/additionalClasspath/
LANG=en_US.UTF-8
HADOOP_CONF_DIR=/etc/hdfs/hdfs_namenode/2.7.3/0
HADOOP_OPTS=-Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/ndp/hadoop-hdfs/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/ndp/hadoop-hdfs/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/mnt/dfs/0/ndp/3.3.0/yarn_nodemanager/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
YARN_TIMELINESERVER_HEAPSIZE=1024
YARN_LOG_DIR=/var/log/ndp/hadoop-yarn/yarn_nodemanager
LIBHDFS_OPTS=-Djava.library.path=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7/lib/native
HOME=/home/
SHLVL=4
DN_ADDITIONAL_ARGS=
YARN_LOGFILE=yarn-yarn-nodemanager-hadoop.log
YARN_CONF_DIR=/etc/mapreduce2/mapreduce_client/2.7.3/0
JVM_PID=8092
YARN_NODEMANAGER_HEAPSIZE=4096
HADOOP_MAPRED_HOME=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7
HADOOP_SSH_OPTS=-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR
NM_PORT=45454
LOGNAME=qa
QTLIB=/usr/lib64/qt-3.3/lib
NM_AUX_SERVICE_spark_shuffle=
HADOOP_HOME_WARN_SUPPRESS=1
CONTAINER_ID=container_e105_1545030638014_43446_01_000003
LESSOPEN=||/usr/bin/lesspipe.sh %s
NN_FILE_METRIC_PERIOD=60
HADOOP_ROOT_LOGGER=INFO,RFA
XDG_RUNTIME_DIR=/run/user/5012
YARN_RESOURCEMANAGER_HEAPSIZE=6144
HADOOP_YARNUSER=yarn
_=/usr/bin/printenv
Going to sleep for 0 sec...
Executing the following: /mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/hadoopBinary/hadoop-2.7.3-1.2.7/bin/hadoop jar dynamometer.jar com.linkedin.dynamometer.SimulatedDataNodes -D fs.defaultFS=hdfs://hadoop1:9022/ -D dfs.datanode.hostname=hadoop -D dfs.datanode.data.dir=file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//hadoop/hdfs/data,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/0,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/1,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/2,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/3,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/4,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/5,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/6,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/7,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/8,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/9,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/10,file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node//mnt/dfs/11 -D dfs.datanode.ipc.address=0.0.0.0:0 -D dfs.datanode.http.address=0.0.0.0:0 -D dfs.datanode.address=0.0.0.0:0 -D dfs.datanode.directoryscan.interval=-1 -D fs.du.interval=43200000 -D fs.getspaceused.jitterMillis=21600000 -D hadoop.tmp.dir=/mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/dyno-node -D hadoop.security.authentication=simple -D hadoop.security.authorization=false -D dfs.http.policy=HTTP_ONLY -D dfs.nameservices= -D dfs.web.authentication.kerberos.principal= -D dfs.web.authentication.kerberos.keytab= -D hadoop.http.filter.initializers= -D dfs.datanode.kerberos.principal= -D dfs.datanode.keytab.file= -D dfs.domain.socket.path= -D dfs.client.read.shortcircuit=false BP-555526057-yarn-1534758010800 file:///mnt/dfs/0/hadoop/yarn/local/usercache/qa/appcache/application_1545030638014_43446/container_e105_1545030638014_43446_01_000003/blocks/block0
Started datanode at pid 8219
Waiting for parent process (PID: 8092) OR datanode process to exit
DataNodes will connect to NameNode at hadoop1:9022
Found 1 block listing files; launching DataNodes accordingly.
Waiting for DataNodes to connect to NameNode and init storage directories.
2018-12-18 14:38:09,654 [0] - INFO [main:ApplicationMaster@164] - Initializing ApplicationMaster
2018-12-18 14:38:09,981 [327] - INFO [main:ApplicationMaster@229] - Application master for app, appId=43446, clustertimestamp=1545030638014, attemptId=1
2018-12-18 14:38:09,981 [327] - INFO [main:ApplicationMaster@258] - Starting ApplicationMaster
2018-12-18 14:38:10,103 [449] - WARN [main:NativeCodeLoader@62] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-12-18 14:38:10,304 [650] - INFO [main:NMClientAsyncImpl@107] - Upper bound of the thread pool size is 500
2018-12-18 14:38:10,306 [652] - INFO [main:ContainerManagementProtocolProxy@81] - yarn.client.max-cached-nodemanagers-proxies : 0
2018-12-18 14:38:10,510 [856] - INFO [main:ApplicationMaster@300] - Requested NameNode ask: Capability[<memory:2048, vCores:1>]Priority[0]
2018-12-18 14:38:10,518 [864] - INFO [main:ApplicationMaster@306] - Waiting on availability of NameNode information at hdfs://cluster/user/mammut_qa/.dynamometer/application_1545030638014_43446/nn_info.prop
2018-12-18 14:38:11,167 [1513] - WARN [main:DomainSocketFactory@117] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2018-12-18 14:38:12,548 [2894] - INFO [AMRM Heartbeater thread:AMRMClientImpl@360] - Received new token for : hadoop1:45454
2018-12-18 14:38:12,551 [2897] - INFO [AMRM Callback Handler Thread:ApplicationMaster$RMCallbackHandler@483] - Got response from RM for container ask, allocatedCnt=1
2018-12-18 14:38:12,553 [2899] - INFO [AMRM Callback Handler Thread:ApplicationMaster$RMCallbackHandler@511] - Launching NAMENODE on a new container., containerId=container_e105_1545030638014_43446_01_000002, containerNode=hadoop1:45454, containerNodeURI=hadoop1:8042, containerResourceMemory=10240, containerResourceVirtualCores=1
2018-12-18 14:38:12,554 [2900] - INFO [Thread-7:ApplicationMaster$LaunchContainerRunnable@655] - Setting up container launch context for containerid=container_e105_1545030638014_43446_01_000002, isNameNode=true
2018-12-18 14:38:12,620 [2966] - INFO [Thread-7:ApplicationMaster$LaunchContainerRunnable@732] - Completed setting up command for namenode: [./start-component.sh, namenode, hdfs://cluster/user/mammut_qa/.dynamometer/application_1545030638014_43446, 1>
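(Side note for readers hitting the same symptom: the launch output above shows the DataNodes pointed at hadoop1:9022, so a first sanity check is whether that NameNode RPC port is even reachable from the host running the DataNode container. A minimal sketch, with the hostname and port taken from these logs:)

    # Probe the Dynamometer NameNode's RPC port from the DataNode's host.
    nc -vz hadoop1 9022

    # Double-check which NameNode address the client configuration resolves to.
    hdfs getconf -confKey fs.defaultFS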
Thank you for sharing that! Though, the section you have labeled "NameNode logs" is actually the log of the ApplicationMaster, not the NameNode -- you can find the NameNode logs by looking for the log line starting with "Starting NAMENODE; track at ...".
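(For completeness: once the application finishes, the per-container logs, including the NameNode's, can also be fetched with the standard YARN CLI, assuming log aggregation is enabled; the application ID below is the one from this run.)

    # Aggregated logs for every container of the Dynamometer application.
    yarn logs -applicationId application_1545030638014_43446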
One thing I noticed is that you have a lot of blocks:
2018-12-18 14:40:10,002 [120348] - INFO [main:ApplicationMaster@340] - Finished requesting datanode containers
2018-12-18 14:40:10,002 [120348] - INFO [main:DynoInfraUtils@219] - Waiting for 0 DataNodes to register with the NameNode...
2018-12-18 14:40:10,012 [120358] - INFO [main:DynoInfraUtils@355] - Number of live DataNodes = 0.00; above threshold of 0.00; done waiting after 9 ms.
2018-12-18 14:40:10,028 [120374] - INFO [main:DynoInfraUtils@237] - Launching thread to trigger block reports for Datanodes with <38774742 blocks reported
2018-12-18 14:40:10,029 [120375] - INFO [main:DynoInfraUtils@299] - Waiting for MissingBlocks to fall below 1938.7372...
2018-12-18 14:40:10,031 [120377] - INFO [main:DynoInfraUtils@359] - Number of missing blocks: 6527.00
This seems to indicate that you have nearly 40M blocks in the system, which sounds like too many for a single DataNode. Can I suggest that you increase the number of DataNodes you launch, and increase their total memory allocation?
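As a rough sketch of the first suggestion: the number of simulated DataNodes is fixed at block-list generation time, so you would re-run the block generation step with a larger -num_datanodes (the flag names match generate-block-lists.sh as used later in this thread; the paths and counts below are illustrative only):

    # Hypothetical example: spread the blocks across 100 simulated DataNodes
    # instead of 1, so each one carries a manageable share.
    ./generate-block-lists.sh \
      -fsimage_input_path hdfs://cluster/user/qa/dyno/fsimage/fsimage.xml \
      -block_image_output_dir hdfs://cluster/user/qa/dyno/blocks \
      -num_reducers 10 \
      -num_datanodes 100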
You may also want to adjust the values of the configs dyno.infra.ready.datanode-min-fraction (default 0.99) and dyno.infra.ready.missing-blocks-max-fraction (default 0.001). To make sure all DataNodes report and that there are no missing blocks, you can set these to 1.0 and 0.0, respectively -- by default they allow for a little bit of leeway.
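A minimal sketch of pinning both thresholds to their strictest values; I'm assuming here that they go inside the <configuration> element of one of the Hadoop *.xml files in the directory passed via --conf_path (the exact file placement is an assumption on my part; the property names are the ones above):

    <property>
      <name>dyno.infra.ready.datanode-min-fraction</name>
      <value>1.0</value>
    </property>
    <property>
      <name>dyno.infra.ready.missing-blocks-max-fraction</name>
      <value>0.0</value>
    </property>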
@xkrogen I'm sorry to take so long to reply. The issue has been solved: the fsimage file did not match the running Hadoop version. Thanks for your help!
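(For later readers: one way to avoid this class of mismatch is to regenerate the XML dump of the fsimage with the same Hadoop version that Dynamometer will run, using the standard offline image viewer; the file name below is illustrative, taken from the command later in this thread.)

    # Re-dump the fsimage to XML using the target Hadoop version's binaries.
    hdfs oiv -p XML -i fsimage_0000000000282000135 -o fsimage_0000000000282000135.xml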
Great to hear @seanshaogh !
@xkrogen, good afternoon! The NameNode reported all blocks as missing and no DataNode registered when using the manual workload launch. These commands were used:

1. Execute the block generation job:

    ./generate-block-lists.sh -fsimage_input_path hdfs://cluster/user/qa/dyno/fsimage/fsimage_0000000000282000135.xml -block_image_output_dir hdfs://cluster/user/qa/dyno/blocks -num_reducers 1 -num_datanodes 1

2. Manual workload launch:

    ./start-dynamometer-cluster.sh --hadoop_binary_path hadoop-2.7.3-1.2.7.tar.gz --conf_path /home/hdfs/Dynamometer/dynamometer-0.1.0-SNAPSHOT/bin/hadoop --fs_image_dir hdfs://cluster/user/qa/dyno/fsimage --block_list_path hdfs://cluster/user/qa/dyno/blocks
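(Note for debugging, sketched against the paths in the commands above: two quick checks that rule out the most common causes seen in this thread are confirming the block generation job actually produced listing files, and confirming the fsimage directory holds the image Dynamometer expects; if I recall the setup docs correctly, Dynamometer also expects the NameNode's VERSION file alongside the fsimage there.)

    # 1. Confirm the block generation job produced block listing files.
    hdfs dfs -ls hdfs://cluster/user/qa/dyno/blocks

    # 2. Confirm the fsimage directory contents (fsimage XML, VERSION, etc.).
    hdfs dfs -ls hdfs://cluster/user/qa/dyno/fsimage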