These Ansible playbooks will build a Hortonworks cluster (Hortonworks Data Platform and / or Hortonworks DataFlow) using Ambari Blueprints. For a full list of supported features check below.
Tested with: HDP 3.0 -> 3.1, HDP 2.4 -> 2.6.5, HDP Search 3.0 -> 4.0, HDF 2.0 -> 3.4, Ambari 2.4 -> 2.7 (the versions must be matched as per the support matrix).
This includes building the Cloud infrastructure (optional) and taking care of the prerequisites.
The aim is to first build the nodes in a Cloud environment, prepare them (OS settings, database, KDC, etc.) and then install Ambari and create the cluster using Ambari Blueprints.
It can use a static blueprint or a dynamically generated one based on the components from the Ansible variables file (for example, if `NAMENODE` is set, there must also be a `SECONDARY_NAMENODE`).

These Ansible playbooks offer a specialised way of deploying Ambari-managed Hortonworks clusters. To use these playbooks you'll need to have a good understanding of both Ansible and Ambari Blueprints.
This is not a Hortonworks product and these playbooks are not officially supported by Hortonworks.
For a fully Hortonworks-supported and user-friendly way of deploying Ambari-managed Hortonworks clusters, please check Cloudbreak first.
- Ansible 2.5+
- Expects CentOS/RHEL, Ubuntu, Amazon Linux or SLES hosts
The core concept of these playbooks is the `host_groups` field in the Ambari Blueprint. This is an essential piece of Ambari Blueprints that maps the topology components to the actual servers.

The `host_groups` field in the Ambari Blueprint logically groups the components, while the `host_groups` field in the Cluster Creation Template maps these logical groups to the actual servers that will run the components.

Therefore, these Ansible playbooks try to take advantage of the Blueprint's `host_groups` and map the Ansible inventory groups to the `host_groups` using a Jinja2 template: `cluster_template.j2`.
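To make the mapping concrete, here is a minimal sketch (host group names, component lists and FQDNs are purely illustrative; the real Blueprint and Cluster Creation Template are JSON documents sent to the Ambari API, rendered here as YAML only for readability):

```yaml
# Sketch only: the Blueprint's host_groups describe which components run
# together, without naming any servers...
blueprint:
  host_groups:
    - name: hdp-master
      components:
        - name: ZOOKEEPER_SERVER
        - name: NAMENODE
        - name: RESOURCEMANAGER
    - name: hdp-worker
      components:
        - name: DATANODE
        - name: NODEMANAGER

# ...while the Cluster Creation Template's host_groups map each of those
# names to the actual servers that will run the components.
cluster_creation_template:
  blueprint: my-blueprint
  host_groups:
    - name: hdp-master
      hosts:
        - fqdn: master01.example.com
    - name: hdp-worker
      hosts:
        - fqdn: worker01.example.com
        - fqdn: worker02.example.com
```

The `cluster_template.j2` template generates the second part by filling in the hosts of the Ansible inventory group whose name matches each `host_group`.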
- When using a dynamically generated blueprint, the `host_groups` are defined in the variable file and they need to match the Ansible inventory groups that will run those components.
- When using a static blueprint, the `host_groups` are defined in the blueprint itself and they need to match the Ansible inventory groups that will run those components.
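Whichever flavour is used, the rule is the same: every `host_group` name needs an identically named Ansible inventory group. A minimal sketch of such an inventory in YAML form, with hypothetical group and host names:

```yaml
# Hypothetical inventory: the group names (hdp-master, hdp-worker) must be
# identical to the host_group names used in the blueprint so that
# cluster_template.j2 can place these servers into the right host_groups.
all:
  children:
    hdp-master:
      hosts:
        master01.example.com:
    hdp-worker:
      hosts:
        worker01.example.com:
        worker02.example.com:
```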
A special mention should be given when using a Cloud environment and / or a dynamic Ansible inventory. In this case, building the Cloud environment is decoupled from building the Ambari cluster, and there needs to be a way to tie things together - the Cloud nodes to the Blueprint layout (e.g. on which Cloud node the `NAMENODE` should run).
This is done using a feature that exists in all (or most) Clouds: Tags. The Ansible dynamic inventory takes advantage of this Tag information and creates an Ansible inventory group for each Tag.
If these playbooks are also used to build the Cloud environment, the nodes need to be grouped together in the Cloud inventory variables file. This information is then used to set the Tags when building the nodes.
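As an illustration only (the exact variable layout is defined by the Cloud-specific inventory files, not by this sketch), grouping the nodes might look like the following, with the `group` value becoming the Tag on each node:

```yaml
# Illustrative sketch: nodes grouped in the Cloud inventory variables file.
# The group name is applied to the nodes as a Tag and must match a blueprint
# host_group (and therefore an Ansible inventory group).
nodes:
  - group: hdp-master
    count: 1
  - group: hdp-worker
    count: 3
```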
Then, using the Ansible dynamic inventory for the specific Cloud, the helper `add_{{ cloud_name }}_nodes` playbooks create the Ansible inventory groups that the rest of the playbooks expect. Each Cloud's dynamic inventory prefixes these Tag-based groups differently (AWS, for example, uses `tag_Group_` while OpenStack uses `meta-Group_`), and the helper `add_{{ cloud_name }}_nodes` playbooks are the solution to make this work for all Clouds.
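Conceptually, a helper playbook of this kind just copies hosts from the Tag-derived group into the plain group name the other playbooks expect; the sketch below is a simplified illustration with a hypothetical Tag-derived group name, not the repo's actual code:

```yaml
# Simplified sketch of what an add_<cloud>_nodes helper effectively does:
# take the hosts that the dynamic inventory put into a Cloud-specific,
# Tag-derived group and add them to the plain group (hdp-worker) that the
# rest of the playbooks expect.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Add tagged Cloud nodes to the expected inventory group
      add_host:
        name: "{{ item }}"
        groups: hdp-worker
      loop: "{{ groups['tag_Group_hdp_worker'] | default([]) }}"
```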
Currently, these playbooks are divided into the following parts:

1. (Optional) Build the Cloud nodes

   Run the `build_cloud.sh` script to build the Cloud nodes. Refer to the Cloud specific INSTALL guides for more information.
2. Install the cluster

   Run the `install_cluster.sh` script that will install the HDP and / or HDF cluster using Blueprints while taking care of the necessary prerequisites.
...or, alternatively, run each step separately (also useful for replaying a specific part in case of failure):
1. (Optional) Build the Cloud nodes

   Run the `build_cloud.sh` script to build the Cloud nodes. Refer to the Cloud specific INSTALL guides for more information.
2. Prepare the Cloud nodes

   Run the `prepare_nodes.sh` script to prepare the nodes. This installs the required OS packages, applies the recommended OS settings and prepares the database and / or the local MIT-KDC.
3. Install Ambari

   Run the `install_ambari.sh` script to install Ambari on the nodes. This adds the Ambari repo, installs the Ambari Agent and Server packages and configures the Ambari Server with the required Java and database options.
4. Configure Ambari

   Run the `configure_ambari.sh` script to configure Ambari. This further configures Ambari with some settings, changes the admin password and adds the repository information needed by the cluster build.
5. Apply Blueprint

   Run the `apply_blueprint.sh` script to install HDP and / or HDF based on an Ambari Blueprint. This uploads the blueprint to Ambari and applies it, and Ambari then creates and installs the cluster.
6. Post Install

   Run the `post_install.sh` script to execute any actions after the cluster is built.
The components that will be installed are only those defined in the `blueprint_dynamic` variable.
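A hedged sketch of what such a definition can look like (key names and components here are illustrative; check the repo's example variable files for the authoritative layout):

```yaml
# Sketch only: each entry pairs a host_group with the Ambari components it
# should run; components not listed anywhere are simply not installed.
blueprint_dynamic:
  - host_group: "hdp-master"
    services:
      - ZOOKEEPER_SERVER
      - NAMENODE
      - RESOURCEMANAGER
  - host_group: "hdp-worker"
    services:
      - DATANODE
      - NODEMANAGER
```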
Supported in this case means that all prerequisites (databases, passwords, required configs) are taken care of and the component is deployed successfully on the chosen `host_group`.
- [x] HDP Services: `HDFS`, `YARN + MapReduce2`, `Hive`, `HBase`, `Accumulo`, `Oozie`, `ZooKeeper`, `Storm`, `Atlas`, `Kafka`, `Knox`, `Log Search`, `Ranger`, `Ranger KMS`, `SmartSense`, `Spark2`, `Zeppelin`, `Druid`, `Superset`
- [x] HDF Services: `NiFi`, `NiFi Registry`, `Schema Registry`, `Streaming Analytics Manager`, `ZooKeeper`, `Storm`, `Kafka`, `Knox`, `Ranger`, `Log Search`
- [x] HA Configuration: NameNode, ResourceManager, Hive, HBase, Ranger KMS, Druid
- [x] Secure clusters with MIT KDC (Ambari managed)
- [x] Secure clusters with Microsoft AD (Ambari managed)
- [x] Install Ranger and enable all plugins
- [x] Ranger KMS
- [ ] Ranger AD integration
- [ ] Hadoop SSL
- [ ] Hadoop AD integration
- [ ] NiFi SSL
- [ ] NiFi AD integration
- [ ] Basic memory settings tuning
- [ ] Make use of additional storage for HDP workers
- [ ] Make use of additional storage for master services
- [ ] Configure additional storage for NiFi