Closed: joshj1806 closed this issue 6 years ago
Hi, sorry about my late reply!
You can of course use another fencing method and set up SSH private keys. However, the default is the "dummy" /bin/true fencing method, which should work fine and rarely gets changed.
I have tested the example-hdp-ha-3-masters-with-storm-kafka
blueprint before and it worked fine with the default fencing method, so I find it strange that you got this error; it should not happen.
Can you give me more information about your configuration (essentially all the changes you made in the all
file)? There could be a strange combination of settings that triggers this issue.
Many thanks!
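For reference, switching away from the default /bin/true shell fencer would mean setting the hdfs-site fencing properties in the blueprint template to something like this (the private key path is only an illustration):

```json
"dfs.ha.fencing.methods" : "sshfence",
"dfs.ha.fencing.ssh.private-key-files" : "/root/.ssh/id_rsa"
```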
---
###########################
## cluster configuration ##
###########################
cluster_name: 'TEST'
ambari_version: '2.6.2.2' # must be the 4-part full version number
hdp_version: '2.6.5.0' # must be the 4-part full version number
hdp_build_number: 'auto' # the HDP build number from docs.hortonworks.com (if set to 'auto', Ansible will try to get it from the repository)
hdf_version: '3.1.2.0' # must be the 4-part full version number
hdf_build_number: 'auto' # the HDF build number from docs.hortonworks.com (if set to 'auto', Ansible will try to get it from the repository)
hdpsearch_version: '3.0.0' # must be the full version number
hdpsearch_build_number: '100' # the HDP Search build number from docs.hortonworks.com (hardcoded to 100 for the moment)
repo_base_url: 'http://public-repo-1.hortonworks.com' # change this if using a Local Repository
###########################
## general configuration ##
###########################
external_dns: no # set to yes to use the existing DNS (when no, it will update the /etc/hosts file - must be set to 'no' when using Azure)
disable_firewall: yes # set to yes to disable the existing local firewall service (iptables, firewalld, ufw)
########################
## java configuration ##
########################
java: 'embedded' # can be set to 'embedded', 'openjdk' or 'oraclejdk'
oraclejdk_options: # only used when java is set to 'oraclejdk'
  base_folder: '/usr/java' # the folder where the Java package should be unpacked to
  tarball_location: '/tmp/jdk-8u171-linux-x64.tar.gz' # the location of the tarball on the remote system or on the Ansible controller
  jce_location: '/tmp/jce_policy-8.zip' # the location of the JCE package on the remote system or on the Ansible controller
  remote_files: no # set to yes to indicate the files are already on the remote systems, otherwise they will be copied by Ansible from the Ansible controller
############################
## database configuration ##
############################
database: 'embedded' # can be set to 'embedded', 'postgres', 'mysql' or 'mariadb'
database_options:
  external_hostname: '' # if this is empty, Ansible will install and prepare the databases on the ambari-server node
  ambari_db_name: 'ambari'
  ambari_db_username: 'ambari'
  ambari_db_password: 'bigdata'
  hive_db_name: 'hive'
  hive_db_username: 'hive'
  hive_db_password: 'hive'
  oozie_db_name: 'oozie'
  oozie_db_username: 'oozie'
  oozie_db_password: 'oozie'
  druid_db_name: 'druid'
  druid_db_username: 'druid'
  druid_db_password: 'druid'
  superset_db_name: 'superset'
  superset_db_username: 'superset'
  superset_db_password: 'superset'
  rangeradmin_db_name: 'ranger'
  rangeradmin_db_username: 'ranger'
  rangeradmin_db_password: 'ranger'
  rangerkms_db_name: 'rangerkms'
  rangerkms_db_username: 'rangerkms'
  rangerkms_db_password: 'rangerkms'
  registry_db_name: 'registry'
  registry_db_username: 'registry'
  registry_db_password: 'registry'
  streamline_db_name: 'streamline'
  streamline_db_username: 'streamline'
  streamline_db_password: 'streamline'
#####################################
## kerberos security configuration ## # useful if blueprint is dynamic, but can also be used to deploy the MIT KDC
#####################################
security: 'none' # can be set to 'none', 'mit-kdc' or 'active-directory'
security_options:
  external_hostname: '' # if this is empty, Ansible will install and prepare the MIT KDC on the Ambari node
  realm: 'EXAMPLE.COM'
  admin_principal: 'admin' # the Kerberos principal that has the permissions to create new users (don't append the realm)
  admin_password: "{{ default_password }}"
  kdc_master_key: "{{ default_password }}" # only used when security is set to 'mit-kdc'
  ldap_url: 'ldaps://ad.example.com:636' # only used when security is set to 'active-directory'
  container_dn: 'OU=hadoop,DC=example,DC=com' # only used when security is set to 'active-directory'
  http_authentication: yes # set to yes to enable HTTP authentication (SPNEGO)
##########################
## ranger configuration ## # only useful if blueprint is dynamic
##########################
ranger_options: # only used if RANGER_ADMIN is part of the blueprint stack
  enable_plugins: yes # set to 'yes' if the plugins should be enabled for all of the installed services
ranger_security_options: # only used if RANGER_ADMIN is part of the blueprint stack
  ranger_admin_password: "{{ default_password }}" # the password for the Ranger admin users (both admin and amb_ranger_admin)
  ranger_keyadmin_password: "{{ default_password }}" # the password for the Ranger keyadmin user (will only be set in HDP3, in HDP2 it will remain the default keyadmin)
  kms_master_key_password: "{{ default_password }}" # password used for encrypting the Master Key
##################################
## other security configuration ## # only useful if blueprint is dynamic
##################################
ambari_admin_password: 'admin' # the password for the Ambari admin user
default_password: 'AsdQwe123456' # a default password for all required passwords which are not specified in the blueprint
atlas_security_options:
  admin_password: "{{ default_password }}" # the password for the Atlas admin user
knox_security_options:
  master_secret: "{{ default_password }}" # Knox Master Secret
nifi_security_options:
  encrypt_password: "{{ default_password }}" # the password used to encrypt raw configuration values
  sensitive_props_key: "{{ default_password }}" # the password used to encrypt any sensitive property values that are configured in processors
superset_security_options:
  secret_key: "{{ default_password }}"
  admin_password: "{{ default_password }}" # the password for the Superset admin user
smartsense_security_options:
  admin_password: "{{ default_password }}" # password for the Activity Explorer's Zeppelin admin user
logsearch_security_options:
  admin_password: "{{ default_password }}" # the password for the Logsearch admin user
##########################
## ambari configuration ##
##########################
ambari_admin_user: 'admin'
ambari_admin_default_password: 'admin' # no need to change this (unless the Ambari default changes)
config_recommendation_strategy: 'NEVER_APPLY' # choose between 'NEVER_APPLY', 'ONLY_STACK_DEFAULTS_APPLY', 'ALWAYS_APPLY', 'ALWAYS_APPLY_DONT_OVERRIDE_CUSTOM_VALUES'
smartsense: # Hortonworks subscription details (can be left empty if there is no subscription)
  id: ''
  account_name: ''
  customer_email: ''
wait: true # wait for the cluster to finish installing
wait_timeout: 3600 # 60 minutes
accept_gpl: yes # set to yes to allow Ambari to install GPL licensed libraries
cluster_template_file: 'cluster_template.j2' # the cluster creation template file
#############################
## blueprint configuration ##
#############################
blueprint_name: '{{ cluster_name }}_blueprint' # the name of the blueprint as it will be stored in Ambari
blueprint_file: 'blueprint_dynamic.j2' # the blueprint JSON file - 'blueprint_dynamic.j2' is a Jinja2 template that generates the required JSON
blueprint_dynamic: # properties for the dynamic blueprint - these are only used by the 'blueprint_dynamic.j2' template to generate the JSON
  - host_group: "management"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'HCAT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT']
    services:
      - ZOOKEEPER_SERVER
      - JOURNALNODE
      - AMBARI_SERVER
      - INFRA_SOLR
      - ZEPPELIN_MASTER
      - APP_TIMELINE_SERVER
      - SPARK2_JOBHISTORYSERVER
      - HISTORYSERVER
      - HST_SERVER
      - HST_AGENT
      - METRICS_COLLECTOR
      - METRICS_GRAFANA
      - METRICS_MONITOR
  - host_group: "namenode01"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'HCAT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT']
    services:
      - ZOOKEEPER_SERVER
      - NAMENODE
      - ZKFC
      - JOURNALNODE
      - RESOURCEMANAGER
      - HIVE_SERVER
      - HIVE_METASTORE
      - NIMBUS
      - DRPC_SERVER
      - STORM_UI_SERVER
      - HST_AGENT
      - METRICS_MONITOR
  - host_group: "namenode02"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'HCAT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT']
    services:
      - ZOOKEEPER_SERVER
      - NAMENODE
      - ZKFC
      - JOURNALNODE
      - RESOURCEMANAGER
      - HIVE_SERVER
      - HIVE_METASTORE
      - WEBHCAT_SERVER
      - HST_AGENT
      - METRICS_MONITOR
  - host_group: "datanode01"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'HCAT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT']
    services:
      - DATANODE
      - NODEMANAGER
      - HST_AGENT
      - METRICS_MONITOR
  - host_group: "datanode02"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'HCAT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT']
    services:
      - KAFKA_BROKER
      - DATANODE
      - NODEMANAGER
      - SUPERVISOR
      - HST_AGENT
      - METRICS_MONITOR
############################
## helper variables ## # don't change these unless there is a good reason
############################
hdp_minor_version: "{{ hdp_version | regex_replace('.[0-9]+.[0-9]+[0-9_-]*$','') }}"
hdp_major_version: "{{ hdp_minor_version.split('.').0 }}"
hdf_minor_version: "{{ hdf_version | regex_replace('.[0-9]+.[0-9]+[0-9_-]*$','') }}"
hdf_major_version: "{{ hdf_minor_version.split('.').0 }}"
utils_version: "{{ '1.1.0.20' if hdp_minor_version is version_compare('2.5', '<') else ('1.1.0.21' if hdp_version is version_compare('2.6.4', '<') else '1.1.0.22' ) }}"
hdfs_ha_name: "{{ cluster_name | regex_replace('_','-') }}"
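For reference, with the versions set at the top of this file, those helper expressions should evaluate roughly as follows:

```yaml
# hdp_version '2.6.5.0' -> hdp_minor_version: '2.6', hdp_major_version: '2'
# hdf_version '3.1.2.0' -> hdf_minor_version: '3.1', hdf_major_version: '3'
# utils_version: '1.1.0.22'   (2.6 is not < 2.5 and 2.6.5.0 is not < 2.6.4)
# hdfs_ha_name: 'TEST'        (cluster_name contains no underscores to replace)
```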
I installed a 5-node cluster.
Thanks,
Hi, I've just tested with your all
file and it works fine (default CentOS 7 AMI on AWS).
So my guess is that it's an environmental issue (probably /bin/true
doesn't exist or it's not accessible to the hdfs user, which would be very strange).
Are you using any standard AMIs from the clouds, or is this a deployment on a custom OS image? Can you check if su - hdfs -c /bin/true; echo $?
works and shows 0?
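A quick way to run that check on each NameNode host might look like this (just a sketch, run as root):

```sh
# confirm the binary exists and that the hdfs user can execute it
ls -l /bin/true
su - hdfs -c /bin/true; echo $?   # should print 0
```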
Thanks for testing!
I tested on my own cluster with CentOS 7 images.
Yes, su - hdfs -c /bin/true; echo $?
prints 0.
I checked this on both of my NameNodes as well as on the host that runs the Ansible script.
I suspect that somehow your NameNodes cannot communicate with each other to retrieve the namespace.
I tried to use this example file to deploy a high-availability HDFS cluster: https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/group_vars/example-hdp-ha-3-masters-with-storm-kafka , but it didn't work.
I got the error message:
Unable to fetch namespace information from active NN
After I changed https://github.com/hortonworks/ansible-hortonworks/blob/35785c7e88d069a920e581c90192da181e1496fe/playbooks/roles/ambari-blueprint/templates/blueprint_dynamic.j2#L443 to
"dfs.ha.fencing.methods" : "sshfence",
added "dfs.ha.fencing.ssh.private-key-files" : "/root/.ssh/id_rsa" on the line below,
and set up SSH key access between the two NameNodes (Active and Standby), it started working.
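For anyone hitting the same error, the key exchange between the two NameNodes could be done roughly like this (the hostname is a placeholder; the key path follows the example above):

```sh
# on the first NameNode, as root (the key is read from /root/.ssh/id_rsa)
ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa   # skip if the key already exists
ssh-copy-id root@namenode02                    # authorize the key on the other NameNode
# then repeat from the second NameNode towards the first one
```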