hortonworks / ansible-hortonworks

Ansible playbooks for deploying Hortonworks Data Platform and DataFlow using Ambari Blueprints
Apache License 2.0

failing to start namenode -- namenode not formatted #60

Closed madiot closed 6 years ago

madiot commented 6 years ago

Hi,

I've walked through all the playbooks successfully, with a topology of 2 NN and 4 DN (2 'hdp-worker-zk' and 2 'hdp-worker'), as per below.

Now, when I try to start all services in Ambari, the NameNodes are not starting and the log shows:

2018-10-01 12:33:00,455 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(716)) - Encountered exception loading fsimage java.io.IOException: NameNode is not formatted.

Has anyone encountered this issue? I see that /hadoop/hdfs/namenode is completely empty.

What would be the recommended course of action to get both NameNodes up and running? Should one be started with 'hdfs namenode -bootstrapStandby' and then the other one formatted?

The Hadoop version used is Hadoop 3.1.1.3.0.1.0-187, and hdfs getconf -namenodes returns the expected 2 nodes.
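For what it's worth, outside of Ambari the manual HA bootstrap sequence I would normally try is roughly the one below (run as the hdfs user, with the JournalNodes already up); I'm not sure whether doing this by hand conflicts with what Ambari / the blueprint install expects:

    # On the first NameNode, format the HDFS metadata (JournalNodes must be running):
    sudo -u hdfs hdfs namenode -format

    # Start that NameNode (normally via Ambari), then on the second NameNode
    # pull a copy of the freshly formatted metadata from the active one:
    sudo -u hdfs hdfs namenode -bootstrapStandby

    # Initialise the automatic-failover state in ZooKeeper (once, from either NameNode):
    sudo -u hdfs hdfs zkfc -formatZK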

One last thing. In core-site I have ha.zookeeper.quorum pointing to 4 nodes (2 NN and 2 DN) listening on port 2181. When I check each of these nodes, one of the DNs is not listening. That same host, which was supposed to be a ZooKeeper server, is apparently also missing its JournalNode. Could this be related to the NameNode formatting issue? Should I clean up the HDFS config by removing the missing node from the following advanced configs?

  • ha.zookeeper.quorum
  • dfs.namenode.shared.edits.dir

If so, what would be the suggested actions to take to get the NameNodes formatted and the cluster up and running?
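For reference, this is roughly how I'm checking each quorum member (the hostname below is just a placeholder; ruok is one of the standard ZooKeeper four-letter commands):

    # What HDFS currently thinks the quorum is:
    hdfs getconf -confKey ha.zookeeper.quorum

    # Quick liveness check against one quorum member on the client port:
    echo ruok | nc <zk-host> 2181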

For reference, here are the relevant snippets of the host_groups definition for the blueprint template, from the playbook/group_vars/all file:

blueprint_name: '{{ cluster_name }}_blueprint'
blueprint_file: 'blueprint_dynamic.j2'
blueprint_dynamic:

  • host_group: "hdp-master1"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • ZOOKEEPER_SERVER
    • NAMENODE
    • ZKFC
    • JOURNALNODE
    • RESOURCEMANAGER
    • APP_TIMELINE_SERVER
    • TIMELINE_READER
    • YARN_REGISTRY_DNS
    • HISTORYSERVER
    • SPARK2_JOBHISTORYSERVER
    • ZEPPELIN_MASTER
    • HIVE_SERVER
    • HIVE_METASTORE
    • HBASE_MASTER
    • HST_SERVER
    • ACTIVITY_ANALYZER
    • ACTIVITY_EXPLORER
    • HST_AGENT
    • METRICS_MONITOR
  • host_group: "hdp-master2"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • AMBARI_SERVER
    • INFRA_SOLR
    • ZOOKEEPER_SERVER
    • NAMENODE
    • ZKFC
    • JOURNALNODE
    • HIVE_SERVER
    • HIVE_METASTORE
    • OOZIE_SERVER
    • ACTIVITY_ANALYZER
    • KNOX_GATEWAY
    • HST_AGENT
    • METRICS_COLLECTOR
    • METRICS_GRAFANA
    • METRICS_MONITOR
  • host_group: "hdp-worker-zk"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • ZOOKEEPER_SERVER
    • JOURNALNODE
    • DATANODE
    • NODEMANAGER
    • HBASE_REGIONSERVER
    • ACTIVITY_ANALYZER
    • HST_AGENT
    • METRICS_MONITOR
    • SOLR_SERVER
  • host_group: "hdp-worker"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • DATANODE
    • NODEMANAGER
    • HBASE_REGIONSERVER
    • HST_AGENT
    • METRICS_MONITOR
    • SOLR_SERVER

alexandruanghel commented 6 years ago

Without actually doing a full test, I would say that yes, your topology looks strange. Normally you should have 3 or 5 ZooKeepers and 3 JournalNodes in a cluster. Your setup tries to run 4 x ZooKeeper and 4 x JournalNode. I have no idea how Ambari behaves in this scenario, but I assume nothing good, judging by your output and errors.

The hdp-worker-zk group is a special role for a 2-master-node cluster, designed to be used by a single node: the one worker node that also runs the 2 additional master services that are required, ZooKeeper and JournalNode.

Just re-install your cluster with 1 x hdp-worker-zk and 3 x hdp-worker.
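That would leave you with 3 x ZooKeeper and 3 x JournalNode (the two masters plus the single hdp-worker-zk node). Roughly, a static inventory along these lines should do it (hostnames are just placeholders), with the group names matching the host_group names in your blueprint_dynamic:

    [hdp-master1]
    master01.example.com

    [hdp-master2]
    master02.example.com

    [hdp-worker-zk]
    worker01.example.com

    [hdp-worker]
    worker02.example.com
    worker03.example.com
    worker04.example.com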

alexandruanghel commented 6 years ago

Do you still have this problem, @madiot? Can we close this?