hortonworks / ansible-hortonworks

Ansible playbooks for deploying Hortonworks Data Platform and DataFlow using Ambari Blueprints
Apache License 2.0

failing to start namenode -- namenode not formatted #60

Closed madiot closed 6 years ago

madiot commented 6 years ago

Hi,

I've walked through all the playbooks successfully, with a topology of 2 NN and 4 DN (2 'hdp-worker-zk' and 2 'hdp-worker'), as per below.

Now, when I try to start all services in Ambari, the NameNodes are not starting and the log shows:

2018-10-01 12:33:00,455 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(716)) - Encountered exception loading fsimage java.io.IOException: NameNode is not formatted.

Has anyone encountered this issue? I see that /hadoop/hdfs/namenode is completely empty.

What would be the recommended course of action to get both NameNodes up and running? Should one be started with 'hdfs namenode -bootstrapStandby' and then the other one formatted?

The Hadoop version used is Hadoop 3.1.1.3.0.1.0-187, and hdfs getconf -namenodes returns the expected 2 nodes.
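For what it's worth, outside of Ambari the manual HA bootstrap sequence I would normally try is roughly the one below (run as the hdfs user, with the JournalNodes already up); I'm not sure whether doing this by hand conflicts with what Ambari / the blueprint install expects:

    # On the first NameNode, format the HDFS metadata (JournalNodes must be running):
    sudo -u hdfs hdfs namenode -format

    # Start that NameNode (normally via Ambari), then on the second NameNode
    # pull a copy of the freshly formatted metadata from the active one:
    sudo -u hdfs hdfs namenode -bootstrapStandby

    # Initialise the automatic-failover state in ZooKeeper (once, from either NameNode):
    sudo -u hdfs hdfs zkfc -formatZK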

One last thing. In core-site I have ha.zookeeper.quorum pointing to 4 nodes (2 NN and 2 DN) listening on port 2181. When I check each of these nodes, one of the DNs is not listening. That same host, which was supposed to be a ZooKeeper server, is apparently also missing its JournalNode. Could this be related to the NameNode formatting issue? Should I clean up the HDFS config by removing the missing node from the following advanced configs?

  • ha.zookeeper.quorum
  • dfs.namenode.shared.edits.dir

If so, what would be the suggested actions to take to get the NameNodes formatted and the cluster up and running?
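For reference, this is roughly how I'm checking each quorum member (the hostname below is just a placeholder; ruok is one of the standard ZooKeeper four-letter commands):

    # What HDFS currently thinks the quorum is:
    hdfs getconf -confKey ha.zookeeper.quorum

    # Quick liveness check against one quorum member on the client port:
    echo ruok | nc <zk-host> 2181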

For reference, here are the relevant snippets of the host_groups definition for the blueprint template, from the playbook/group_vars/all file:

blueprint_name: '{{ cluster_name }}_blueprint'
blueprint_file: 'blueprint_dynamic.j2'
blueprint_dynamic:

  • host_group: "hdp-master1"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • ZOOKEEPER_SERVER
    • NAMENODE
    • ZKFC
    • JOURNALNODE
    • RESOURCEMANAGER
    • APP_TIMELINE_SERVER
    • TIMELINE_READER
    • YARN_REGISTRY_DNS
    • HISTORYSERVER
    • SPARK2_JOBHISTORYSERVER
    • ZEPPELIN_MASTER
    • HIVE_SERVER
    • HIVE_METASTORE
    • HBASE_MASTER
    • HST_SERVER
    • ACTIVITY_ANALYZER
    • ACTIVITY_EXPLORER
    • HST_AGENT
    • METRICS_MONITOR
  • host_group: "hdp-master2"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • AMBARI_SERVER
    • INFRA_SOLR
    • ZOOKEEPER_SERVER
    • NAMENODE
    • ZKFC
    • JOURNALNODE
    • HIVE_SERVER
    • HIVE_METASTORE
    • OOZIE_SERVER
    • ACTIVITY_ANALYZER
    • KNOX_GATEWAY
    • HST_AGENT
    • METRICS_COLLECTOR
    • METRICS_GRAFANA
    • METRICS_MONITOR
  • host_group: "hdp-worker-zk"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • ZOOKEEPER_SERVER
    • JOURNALNODE
    • DATANODE
    • NODEMANAGER
    • HBASE_REGIONSERVER
    • ACTIVITY_ANALYZER
    • HST_AGENT
    • METRICS_MONITOR
    • SOLR_SERVER
  • host_group: "hdp-worker"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
    services:
    • DATANODE
    • NODEMANAGER
    • HBASE_REGIONSERVER
    • HST_AGENT
    • METRICS_MONITOR
    • SOLR_SERVER

alexandruanghel commented 6 years ago

Without actually doing a full test, I would say that yes, your topology looks strange. Normally you should have 3 or 5 ZooKeepers and 3 JournalNodes in a cluster. Your setup tries to run 4 x ZooKeeper and 4 x JournalNode. I have no idea how Ambari behaves in this scenario, but I assume nothing good, judging by your output and errors.

The hdp-worker-zk group is a special role for a 2-master-node cluster, designed to be used by a single node: the one worker node that also runs the 2 additional master services that are required, ZooKeeper and JournalNode.

Just re-install your cluster with 1 x hdp-worker-zk and 3 x hdp-worker.
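That would leave you with 3 x ZooKeeper and 3 x JournalNode (the two masters plus the single hdp-worker-zk node). Roughly, a static inventory along these lines should do it (hostnames are just placeholders), with the group names matching the host_group names in your blueprint_dynamic:

    [hdp-master1]
    master01.example.com

    [hdp-master2]
    master02.example.com

    [hdp-worker-zk]
    worker01.example.com

    [hdp-worker]
    worker02.example.com
    worker03.example.com
    worker04.example.com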

alexandruanghel commented 6 years ago

Do you still have this problem, @madiot? Can we close this?