hortonworks / ansible-hortonworks

Ansible playbooks for deploying Hortonworks Data Platform and DataFlow using Ambari Blueprints
Apache License 2.0

Multinode cluster hangs in apply_blueprint phase #113

Closed markokole closed 5 years ago

markokole commented 5 years ago

I'm trying to install HDP 2.6.5 on CentOS 7 on AWS. I provision 3 servers: 1 Ambari, 1 Namenode and 1 Datanode, with a minimal set of services. These are the host_groups in my ansible_hosts file:

[ambari-server]
[hdp-master]
[hdp-slave]

These are the services I define for each server:

  ambari_services: AMBARI_SERVER, METRICS_COLLECTOR, METRICS_MONITOR

  master_clients: ZOOKEEPER_CLIENT, HDFS_CLIENT
  master_services: ZOOKEEPER_SERVER, NAMENODE, SECONDARY_NAMENODE, METRICS_COLLECTOR, METRICS_MONITOR

  slave_clients: ZOOKEEPER_CLIENT, HDFS_CLIENT
  slave_services: DATANODE, METRICS_COLLECTOR, METRICS_MONITOR

The provisioning of the cluster goes well until the install components phase in apply_blueprint starts. When I check in Ambari, HDFS has the green check, but ZooKeeper is not found (yellow icon with a question mark). There are no parameters in the Config tab under any of the services.

I also tried to provision the HDP cluster manually: I ran all the Ansible playbooks up to apply_blueprint, then installed HDP manually from Ambari, and it worked like a charm.

There is no hdp folder on the servers designated as Namenode or Datanode. Ambari is up and running as it should.

Is there anything I am missing? A certain service in my service configuration?

alexandruanghel commented 5 years ago

Hi @markokole, you'd have to check the ambari-server.log to see why this is happening. It's usually waiting for something and the deployment hasn't started yet (which is why you see a strange cluster layout that might not be the one you asked for, and why no packages have been installed in /usr/hdp).

The blueprint was accepted by Ambari, but there might still be rare instances when the blueprint triggers a certain behaviour (like waiting for other nodes to join the cluster before the blueprint can be applied successfully). In your variables I see at least 1 problem: multiple METRICS_COLLECTOR. I've never seen such a blueprint so I don't know how Ambari would react. Anyway, I don't believe multiple METRICS_COLLECTOR is supported, so it's best to keep it on the ambari-server group only.
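A quick pre-flight check along these lines could catch the duplicated component before the blueprint is submitted. This is only a sketch: the service lists are copied from the variables quoted earlier in this thread, while the `SINGLETONS` set and the `find_duplicated_singletons` helper are hypothetical names introduced here, and treating METRICS_COLLECTOR as a single-instance component is an assumption based on the comment above rather than on Ambari documentation.

```python
from collections import defaultdict

# Service lists copied from the variables quoted earlier in this thread.
group_services = {
    "ambari_services": "AMBARI_SERVER, METRICS_COLLECTOR, METRICS_MONITOR",
    "master_services": "ZOOKEEPER_SERVER, NAMENODE, SECONDARY_NAMENODE, METRICS_COLLECTOR, METRICS_MONITOR",
    "slave_services": "DATANODE, METRICS_COLLECTOR, METRICS_MONITOR",
}

# Assumption: components expected to appear in only one variable.
SINGLETONS = {"METRICS_COLLECTOR"}


def find_duplicated_singletons(groups, singletons=SINGLETONS):
    """Return {component: [variables]} for singleton components listed more than once."""
    seen = defaultdict(list)
    for variable, services in groups.items():
        for component in (s.strip() for s in services.split(",")):
            if component in singletons:
                seen[component].append(variable)
    return {c: v for c, v in seen.items() if len(v) > 1}


duplicates = find_duplicated_singletons(group_services)
print(duplicates)
```

With the variables as originally posted, the check flags METRICS_COLLECTOR in all three lists; once it is kept only in `ambari_services`, the result is empty.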

markokole commented 5 years ago

Hi @alexandruanghel! Regarding METRICS_COLLECTOR: I've moved it now, so it's only in the ambari-server group. Once the installation starts, the ambari-server.log keeps giving me this line: INFO [pool-20-thread-1] ConfigureClusterTask:75 - Some host groups require more hosts, cluster configuration cannot begin

In Ambari, this is the operation name: Logical Request: Provision Cluster 'hdp-hdfs-only'. This is where it is stuck.

Thanks for the assistance!

markokole commented 5 years ago

Some more details from ambari-server.log: The servers for Namenode and Datanode are mentioned here:

13 Dec 2018 14:50:25,677  INFO [pool-4-thread-1] HostRequest:147 - HostRequest: marking host request 2 for ip-10-0-0-60.ec2.internal as FAILED due to Some host groups require more hosts, cluster configuration cannot begin
13 Dec 2018 14:50:25,681  INFO [pool-4-thread-1] HostRequest:147 - HostRequest: marking host request 1 for ip-10-0-0-240.ec2.internal as FAILED due to Some host groups require more hosts, cluster configuration cannot begin

When I tried to manually install the cluster, I used the private DNS names and it worked, so this cannot be the problem?
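To see at a glance which hosts Ambari gave up on and why, the log excerpt can be summarised with a small script. This is a minimal sketch: the regular expression is derived only from the two log lines quoted above, so other Ambari versions may word the message differently, and `failed_host_requests` is a hypothetical helper name.

```python
import re

# Two lines copied from the ambari-server.log excerpt above.
log_text = """\
13 Dec 2018 14:50:25,677  INFO [pool-4-thread-1] HostRequest:147 - HostRequest: marking host request 2 for ip-10-0-0-60.ec2.internal as FAILED due to Some host groups require more hosts, cluster configuration cannot begin
13 Dec 2018 14:50:25,681  INFO [pool-4-thread-1] HostRequest:147 - HostRequest: marking host request 1 for ip-10-0-0-240.ec2.internal as FAILED due to Some host groups require more hosts, cluster configuration cannot begin
"""

# Pattern based only on the lines above; treat it as an assumption for other log formats.
FAILED_HOST = re.compile(
    r"marking host request (?P<request>\d+) for (?P<host>\S+) as FAILED due to (?P<reason>.+)$"
)


def failed_host_requests(text):
    """Return (request id, host, reason) for every host request marked FAILED."""
    results = []
    for line in text.splitlines():
        match = FAILED_HOST.search(line)
        if match:
            results.append((int(match["request"]), match["host"], match["reason"].strip()))
    return results


failures = failed_host_requests(log_text)
for request_id, host, reason in failures:
    print(f"request {request_id}: {host} -> {reason}")
```

Both hosts fail with the same reason, which points at the host group mapping rather than at the individual hosts or their DNS names.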

alexandruanghel commented 5 years ago

Uh, it's a bug caused by https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/roles/ambari-blueprint/templates/cluster_template.j2#L32

These playbooks build an Ansible group called ambari-server, which is why it's excluded in the cluster creation template: https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/set_variables.yml#L63

But you also use a legitimate group called ambari-server.

Will fix this soon.
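The effect of that name clash can be modelled in a few lines. This is a hypothetical, simplified sketch and not the actual logic of cluster_template.j2: it assumes the blueprint declares one host group per inventory group while the cluster creation template skips any group whose name is in an exclusion list. The host names, `EXCLUDED_GROUPS`, the helper functions, and the replacement group name `ambari-node` are all placeholders introduced for illustration.

```python
# Hypothetical, simplified model of the bug; not the real cluster_template.j2 logic.
# Assumption: the cluster creation template skips groups named in EXCLUDED_GROUPS.
EXCLUDED_GROUPS = {"ambari-server"}

# Inventory groups from the original report; the host names are placeholders.
inventory = {
    "ambari-server": ["ambari-host"],
    "hdp-master": ["master-host"],
    "hdp-slave": ["slave-host"],
}


def cluster_template_host_groups(groups, excluded=EXCLUDED_GROUPS):
    """Map hosts to host groups, skipping excluded group names."""
    return {name: hosts for name, hosts in groups.items() if name not in excluded}


def groups_without_hosts(blueprint_groups, template_groups):
    """Blueprint host groups that receive no hosts from the cluster template."""
    return [name for name in blueprint_groups if not template_groups.get(name)]


mapped = cluster_template_host_groups(inventory)
missing = groups_without_hosts(inventory.keys(), mapped)
print(missing)

# Renaming the user-defined group (placeholder name) avoids the clash.
renamed = {("ambari-node" if n == "ambari-server" else n): h for n, h in inventory.items()}
missing_after_rename = groups_without_hosts(renamed.keys(), cluster_template_host_groups(renamed))
print(missing_after_rename)
```

In this model the ambari-server host group never receives a host, which matches the "Some host groups require more hosts" message in the log; renaming the group leaves no host group empty.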

markokole commented 5 years ago

Nice catch! :) I changed the name of the Ambari server host group and now it's installing as it should.

Thanks for the quick response!

alexandruanghel commented 5 years ago

There's no need for those groups to be excluded anymore; it was something left over from a previous version of that template, so I just removed them. It should be good now: https://github.com/hortonworks/ansible-hortonworks/commit/1fbad844ceff3f0b143923b287d1937458e7f627