hortonworks / ansible-hortonworks

Ansible playbooks for deploying Hortonworks Data Platform and DataFlow using Ambari Blueprints
Apache License 2.0

ansible-hortonworks

These Ansible playbooks build a Hortonworks cluster (Hortonworks Data Platform and / or Hortonworks DataFlow) using Ambari Blueprints. For the full list of supported features, see the Features section below.

DISCLAIMER

These Ansible playbooks offer a specialised way of deploying Ambari-managed Hortonworks clusters. To use these playbooks you'll need to have a good understanding of both Ansible and Ambari Blueprints.

This is not a Hortonworks product and these playbooks are not officially supported by Hortonworks.

For a fully Hortonworks-supported and user-friendly way of deploying Ambari-managed Hortonworks clusters, please check Cloudbreak first.

Installation Instructions

Requirements

Concepts

The core concept of these playbooks is the host_groups field in the Ambari Blueprint. This essential piece of Ambari Blueprints is how the components of the cluster topology get mapped to the actual servers.

The host_groups field in the Ambari Blueprint logically groups the components, while the host_groups field in the Cluster Creation Template maps these logical groups to the actual servers that will run the components.

Therefore, these Ansible playbooks take advantage of the Blueprint host_groups and map the Ansible inventory groups to them using a Jinja2 template: cluster_template.j2.
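As a rough illustration of that mapping, the Python sketch below builds a Cluster Creation Template from inventory groups. It is not the contents of cluster_template.j2. It assumes, hypothetically, that every Ansible inventory group has exactly the same name as a Blueprint host_group. The blueprint name, host names and password are placeholders.

```python
import json


def build_cluster_template(blueprint_name, inventory_groups, default_password):
    """Build an Ambari Cluster Creation Template as a dict.

    inventory_groups maps an Ansible inventory group name to a list of host FQDNs.
    Assumption (hypothetical): each inventory group name is also the name of a
    host_group in the Blueprint, so the group name is reused verbatim.
    """
    host_groups = []
    # Sort by group name so the generated template is deterministic.
    for group_name, hosts in sorted(inventory_groups.items()):
        host_groups.append({
            "name": group_name,
            # Each host in the group is listed by its fully qualified domain name.
            "hosts": [{"fqdn": fqdn} for fqdn in hosts],
        })
    return {
        "blueprint": blueprint_name,
        "default_password": default_password,
        "host_groups": host_groups,
    }


if __name__ == "__main__":
    # Hypothetical inventory: one master node and two worker nodes.
    inventory = {
        "hdp-worker": ["worker01.example.com", "worker02.example.com"],
        "hdp-master": ["master01.example.com"],
    }
    template = build_cluster_template("mycluster_blueprint", inventory, "changeme")
    print(json.dumps(template, indent=2))
```

The result has one host_groups entry per inventory group, each listing its hosts by fqdn. Ambari expects this shape in a Cluster Creation Template.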

Cloud inventory

Special care is needed when using a Cloud environment and / or a dynamic Ansible inventory.

In this case, building the Cloud environment is decoupled from building the Ambari cluster, so there needs to be a way to tie the two together: the Cloud nodes must be mapped to the Blueprint layout (e.g. which Cloud node should run the NAMENODE).

This is done using a feature that exists in all (or most) Clouds: Tags. The Ansible dynamic inventory takes advantage of this Tag information and creates an Ansible inventory group for each Tag.

If these playbooks are also used to build the Cloud environment, the nodes need to be grouped together in the Cloud inventory variables file. This information is then used to set the Tags when building the nodes.

Then, using the Ansible dynamic inventory for the specific Cloud, the helper add_{{ cloud_name }}_nodes playbooks create the Ansible inventory groups that the rest of the playbooks expect.
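The tag-to-group step can be sketched as follows. This is a hypothetical simplification, not the add_{{ cloud_name }}_nodes playbooks. It assumes each node is described by a dict with a hostname and a tags dict. It also assumes that a tag named Group (an assumed key, chosen only for illustration) holds the name of the inventory group the node belongs to.

```python
from collections import defaultdict


def groups_from_tags(nodes, tag_key="Group"):
    """Group cloud nodes into Ansible-style inventory groups by a tag value.

    nodes: iterable of dicts with a "hostname" and an optional "tags" dict.
    tag_key: the tag whose value names the target group ("Group" is a
    hypothetical choice). Nodes without that tag are skipped.
    """
    groups = defaultdict(list)
    for node in nodes:
        group = node.get("tags", {}).get(tag_key)
        if group is None:
            continue  # untagged node: not part of the Blueprint layout
        groups[group].append(node["hostname"])
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical nodes as a dynamic inventory might report them.
    nodes = [
        {"hostname": "10.0.0.11", "tags": {"Group": "hdp-master"}},
        {"hostname": "10.0.0.12", "tags": {"Group": "hdp-worker"}},
        {"hostname": "10.0.0.13", "tags": {"Group": "hdp-worker"}},
        {"hostname": "10.0.0.99", "tags": {}},
    ]
    print(groups_from_tags(nodes))
    # {'hdp-master': ['10.0.0.11'], 'hdp-worker': ['10.0.0.12', '10.0.0.13']}
```

Nodes that lack the tag are ignored, so unrelated machines in the same Cloud account do not end up in the cluster layout.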

Parts

Currently, these playbooks are divided into the following parts:

  1. (Optional) Build the Cloud nodes

    Run the build_cloud.sh script to build the Cloud nodes. Refer to the Cloud specific INSTALL guides for more information.

  2. Install the cluster

    Run the install_cluster.sh script, which installs the HDP and / or HDF cluster using Blueprints and takes care of the necessary prerequisites.

...or, alternatively, run each step separately (also useful for replaying a specific part in case of failure):

  1. (Optional) Build the Cloud nodes

    Run the build_cloud.sh script to build the Cloud nodes. Refer to the Cloud specific INSTALL guides for more information.

  2. Prepare the Cloud nodes

    Run the prepare_nodes.sh script to prepare the nodes.

    This installs the required OS packages, applies the recommended OS settings and prepares the database and / or the local MIT-KDC.

  3. Install Ambari

    Run the install_ambari.sh script to install Ambari on the nodes.

    This adds the Ambari repo, installs the Ambari Agent and Server packages and configures the Ambari Server with the required Java and database options.

  4. Configure Ambari

    Run the configure_ambari.sh script to configure Ambari.

    This applies further Ambari settings, changes the admin password and adds the repository information needed to build the cluster.

  5. Apply Blueprint

    Run the apply_blueprint.sh script to install HDP and / or HDF based on an Ambari Blueprint.

    This uploads the blueprint to Ambari and applies it. Ambari then creates and installs the cluster.

  6. Post Install

    Run the post_install.sh script to execute any actions after the cluster is built.
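Replaying from a failed step, as described above, can be sketched with a small Python driver. It assumes the five step scripts sit in the current directory and can be run with bash. The optional build_cloud.sh step is left out. The runner parameter is a hypothetical hook that lets the sequencing be exercised without a real cluster.

```python
import subprocess

# The step scripts, in the order listed above (the optional build_cloud.sh is omitted).
STEPS = [
    "prepare_nodes.sh",
    "install_ambari.sh",
    "configure_ambari.sh",
    "apply_blueprint.sh",
    "post_install.sh",
]


def run_with_bash(script):
    """Run one step script; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(["bash", script], check=True)


def steps_from(start):
    """Return the steps to run when resuming at `start`, e.g. after a failure."""
    if start not in STEPS:
        raise ValueError(f"unknown step: {start}")
    return STEPS[STEPS.index(start):]


def run_steps(start=STEPS[0], runner=run_with_bash):
    """Run the remaining steps in order; an exception from runner stops the sequence."""
    for script in steps_from(start):
        print(f"==> {script}")
        runner(script)
```

For example, if apply_blueprint.sh failed, run_steps("apply_blueprint.sh") re-runs only that step and post_install.sh. Earlier steps are left untouched.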

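What the Apply Blueprint step does on the Ambari side can be approximated with two Ambari REST API calls. The first registers the Blueprint under /api/v1/blueprints/<name>. The second creates the cluster under /api/v1/clusters/<name> from a Cluster Creation Template. The sketch below only builds the requests. The Ambari URL, credentials and names are placeholders, and it does not claim to be how the playbooks themselves issue these calls.

```python
import base64
import json
import urllib.request


def build_blueprint_requests(ambari_url, user, password,
                             blueprint_name, blueprint,
                             cluster_name, cluster_template):
    """Build (without sending) the Ambari REST requests for a Blueprint install.

    Returns two POST requests: the first registers the Blueprint, the second
    creates the cluster from a Cluster Creation Template.
    """
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {credentials}",
        # Ambari's CSRF protection rejects modifying calls without this header.
        "X-Requested-By": "ambari",
    }
    api = ambari_url.rstrip("/") + "/api/v1"
    register_blueprint = urllib.request.Request(
        f"{api}/blueprints/{blueprint_name}",
        data=json.dumps(blueprint).encode(),
        headers=headers,
        method="POST",
    )
    create_cluster = urllib.request.Request(
        f"{api}/clusters/{cluster_name}",
        data=json.dumps(cluster_template).encode(),
        headers=headers,
        method="POST",
    )
    return register_blueprint, create_cluster
```

Passing each request to urllib.request.urlopen in order would send it. Ambari accepts the cluster creation request and runs the install asynchronously, so progress has to be followed separately, for example in the Ambari UI.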
Features

Infrastructure support

OS support

Prerequisites done

Cluster build supported features

Dynamic blueprint supported features

The components that will be installed are only those defined in the blueprint_dynamic variable.

  • Supported in this case means all prerequisites (databases, passwords, required configs) are taken care of and the component is deployed successfully on the chosen host_group.
  • [x] HDP Services: HDFS, YARN + MapReduce2, Hive, HBase, Accumulo, Oozie, ZooKeeper, Storm, Atlas, Kafka, Knox, Log Search, Ranger, Ranger KMS, SmartSense, Spark2, Zeppelin, Druid, Superset
  • [x] HDF Services: NiFi, NiFi Registry, Schema Registry, Streaming Analytics Manager, ZooKeeper, Storm, Kafka, Knox, Ranger, Log Search
  • [x] HA Configuration: NameNode, ResourceManager, Hive, HBase, Ranger KMS, Druid
  • [x] Secure clusters with MIT KDC (Ambari managed)
  • [x] Secure clusters with Microsoft AD (Ambari managed)
  • [x] Install Ranger and enable all plugins
  • [x] Ranger KMS
  • [ ] Ranger AD integration
  • [ ] Hadoop SSL
  • [ ] Hadoop AD integration
  • [ ] NiFi SSL
  • [ ] NiFi AD integration
  • [ ] Basic memory settings tuning
  • [ ] Make use of additional storage for HDP workers
  • [ ] Make use of additional storage for master services
  • [ ] Configure additional storage for NiFi
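The rule that only components listed in blueprint_dynamic get installed can be sketched as a translation into the host_groups section of an Ambari Blueprint. The input shape is an assumption for illustration: entries with host_group, services and clients keys. The default stack version is also assumed. The result is not a complete, installable blueprint, and the extra configurations the playbooks generate for prerequisites are not reproduced.

```python
import json


def blueprint_from_dynamic(blueprint_dynamic, stack_name="HDP", stack_version="3.1"):
    """Translate a blueprint_dynamic-style list into an Ambari Blueprint dict.

    Assumed (hypothetical) input shape: a list of dicts, each with a
    "host_group" name plus optional "services" and "clients" lists of
    Ambari component names. Only the listed components end up in the output.
    """
    host_groups = []
    for entry in blueprint_dynamic:
        # Services and clients are both Ambari components of the host_group;
        # dict.fromkeys drops duplicates while keeping the declared order.
        names = list(dict.fromkeys(entry.get("services", []) + entry.get("clients", [])))
        host_groups.append({
            "name": entry["host_group"],
            "components": [{"name": name} for name in names],
        })
    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "configurations": [],
        "host_groups": host_groups,
    }


if __name__ == "__main__":
    # Hypothetical two-group layout.
    layout = [
        {"host_group": "hdp-master",
         "services": ["ZOOKEEPER_SERVER", "NAMENODE", "RESOURCEMANAGER"],
         "clients": ["ZOOKEEPER_CLIENT", "HDFS_CLIENT"]},
        {"host_group": "hdp-worker",
         "services": ["DATANODE", "NODEMANAGER"],
         "clients": ["HDFS_CLIENT"]},
    ]
    print(json.dumps(blueprint_from_dynamic(layout), indent=2))
```

Because components are taken only from the declared lists, leaving a service out of blueprint_dynamic is enough to keep it out of the generated blueprint.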