sequenceiq / docker-ambari

Docker image with Ambari
291 stars · 200 forks

Cannot install hdp-multinode-default #15

Closed leebrooks0 closed 9 years ago

leebrooks0 commented 10 years ago

Hi

I have been using your nifty shell functions successfully with your default settings when creating a cluster, e.g. amb-deploy-cluster 4.

Now I am trying to use the hdp-multinode-default blueprint, but installation fails at around 12%.

Here is the command that I am running: amb-deploy-cluster 6 hdp-multinode-default

This is what Ambari shows:

screenshot - 09102014 - 11 13 13

screenshot - 09102014 - 11 14 28

screenshot - 09102014 - 11 17 12

screenshot - 09102014 - 11 18 52

Here is the detailed message from stdout:

2014-10-09 04:53:31,154 - Package['unzip'] {}
2014-10-09 04:53:32,234 - Skipping installing existent package unzip
2014-10-09 04:53:32,234 - Package['curl'] {}
2014-10-09 04:53:32,257 - Skipping installing existent package curl
2014-10-09 04:53:32,257 - Package['net-snmp-utils'] {}
2014-10-09 04:53:32,279 - Skipping installing existent package net-snmp-utils
2014-10-09 04:53:32,279 - Package['net-snmp'] {}
2014-10-09 04:53:32,309 - Skipping installing existent package net-snmp
2014-10-09 04:53:32,310 - Execute['mkdir -p /tmp/HDP-artifacts/ ;   curl -kf   --retry 10 http://amb0.mycorp.kom:8080/resources//jdk-7u45-linux-x64.tar.gz -o /tmp/HDP-artifacts//jdk-7u45-linux-x64.tar.gz'] {'environment': ..., 'not_if': 'test -e /usr/jdk64/jdk1.7.0_45/bin/java', 'path': ['/bin', '/usr/bin/']}
2014-10-09 04:53:32,326 - Skipping Execute['mkdir -p /tmp/HDP-artifacts/ ;   curl -kf   --retry 10 http://amb0.mycorp.kom:8080/resources//jdk-7u45-linux-x64.tar.gz -o /tmp/HDP-artifacts//jdk-7u45-linux-x64.tar.gz'] due to not_if
2014-10-09 04:53:32,327 - Execute['mkdir -p /usr/jdk64 ; cd /usr/jdk64 ; tar -xf /tmp/HDP-artifacts//jdk-7u45-linux-x64.tar.gz > /dev/null 2>&1'] {'not_if': 'test -e /usr/jdk64/jdk1.7.0_45/bin/java', 'path': ['/bin', '/usr/bin/']}
2014-10-09 04:53:32,344 - Skipping Execute['mkdir -p /usr/jdk64 ; cd /usr/jdk64 ; tar -xf /tmp/HDP-artifacts//jdk-7u45-linux-x64.tar.gz > /dev/null 2>&1'] due to not_if
2014-10-09 04:53:32,345 - Execute['mkdir -p /tmp/HDP-artifacts/;     curl -kf --retry 10     http://amb0.mycorp.kom:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /tmp/HDP-artifacts//UnlimitedJCEPolicyJDK7.zip'] {'environment': ..., 'not_if': 'test -e /tmp/HDP-artifacts//UnlimitedJCEPolicyJDK7.zip', 'ignore_failures': True, 'path': ['/bin', '/usr/bin/']}
2014-10-09 04:53:32,448 - Group['hadoop'] {}
2014-10-09 04:53:32,451 - Modifying group hadoop
2014-10-09 04:53:32,824 - Group['users'] {}
2014-10-09 04:53:32,825 - Modifying group users
2014-10-09 04:53:33,114 - Group['users'] {}
2014-10-09 04:53:33,115 - Modifying group users
2014-10-09 04:53:33,445 - User['ambari-qa'] {'gid': 'hadoop', 'groups': [u'users']}
2014-10-09 04:53:33,445 - Adding user User['ambari-qa']
2014-10-09 04:53:36,186 - File['/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2014-10-09 04:53:36,288 - Writing File['/tmp/changeUid.sh'] because it doesn't exist
2014-10-09 04:53:36,309 - Changing permission for /tmp/changeUid.sh from 644 to 555
2014-10-09 04:53:36,311 - Execute['/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2014-10-09 04:53:36,872 - User['hbase'] {'gid': 'hadoop', 'groups': [u'hadoop']}
2014-10-09 04:53:36,872 - Adding user User['hbase']
2014-10-09 04:53:38,064 - File['/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2014-10-09 04:53:38,066 - Execute['/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase 2>/dev/null'] {'not_if': 'test $(id -u hbase) -gt 1000'}
2014-10-09 04:53:38,084 - Skipping Execute['/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase 2>/dev/null'] due to not_if
2014-10-09 04:53:38,084 - Group['nagios'] {}
2014-10-09 04:53:38,085 - Adding group Group['nagios']
2014-10-09 04:53:38,759 - User['nagios'] {'gid': 'nagios'}
2014-10-09 04:53:38,760 - Adding user User['nagios']
2014-10-09 04:53:39,367 - User['oozie'] {'gid': 'hadoop'}
2014-10-09 04:53:39,368 - Adding user User['oozie']
2014-10-09 04:53:40,064 - User['hcat'] {'gid': 'hadoop'}
2014-10-09 04:53:40,064 - Adding user User['hcat']
2014-10-09 04:53:40,641 - User['hcat'] {'gid': 'hadoop'}
2014-10-09 04:53:40,641 - Modifying user hcat
2014-10-09 04:53:40,655 - User['hive'] {'gid': 'hadoop'}
2014-10-09 04:53:40,655 - Adding user User['hive']
2014-10-09 04:53:41,369 - User['yarn'] {'gid': 'hadoop'}
2014-10-09 04:53:41,369 - Modifying user yarn
2014-10-09 04:53:41,663 - Group['nobody'] {}
2014-10-09 04:53:41,664 - Modifying group nobody
2014-10-09 04:53:41,998 - Group['nobody'] {}
2014-10-09 04:53:41,998 - Modifying group nobody
2014-10-09 04:53:42,307 - User['nobody'] {'gid': 'hadoop', 'groups': [u'nobody']}
2014-10-09 04:53:42,308 - Modifying user nobody
2014-10-09 04:53:43,027 - User['nobody'] {'gid': 'hadoop', 'groups': [u'nobody']}
2014-10-09 04:53:43,027 - Modifying user nobody
2014-10-09 04:53:43,044 - User['hdfs'] {'gid': 'hadoop', 'groups': [u'hadoop']}
2014-10-09 04:53:43,044 - Modifying user hdfs
2014-10-09 04:53:43,278 - User['mapred'] {'gid': 'hadoop', 'groups': [u'hadoop']}
2014-10-09 04:53:43,279 - Modifying user mapred
2014-10-09 04:53:43,515 - User['zookeeper'] {'gid': 'hadoop'}
2014-10-09 04:53:43,515 - Modifying user zookeeper
2014-10-09 04:53:43,840 - Repository['HDP-2.1'] {'action': ['create'], 'mirror_list': None, 'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.2.0/', 'components': ['HDP', 'main'], 'repo_file_name': 'HDP'}
2014-10-09 04:53:43,854 - File['/etc/yum.repos.d/HDP.repo'] {'content': InlineTemplate(...)}
2014-10-09 04:53:43,856 - Writing File['/etc/yum.repos.d/HDP.repo'] because it doesn't exist
2014-10-09 04:53:43,858 - Package['libganglia-3.5.0-99'] {}
2014-10-09 04:53:43,889 - Installing package libganglia-3.5.0-99 ('/usr/bin/yum -d 0 -e 0 -y install libganglia-3.5.0-99')
2014-10-09 04:55:28,428 - Package['ganglia-devel-3.5.0-99'] {}
2014-10-09 04:55:28,450 - Installing package ganglia-devel-3.5.0-99 ('/usr/bin/yum -d 0 -e 0 -y install ganglia-devel-3.5.0-99')
2014-10-09 04:55:39,627 - Package['ganglia-gmetad-3.5.0-99'] {}
2014-10-09 04:55:39,656 - Installing package ganglia-gmetad-3.5.0-99 ('/usr/bin/yum -d 0 -e 0 -y install ganglia-gmetad-3.5.0-99')
2014-10-09 04:56:36,408 - Package['ganglia-web-3.5.7-99.noarch'] {}
2014-10-09 04:56:36,438 - Installing package ganglia-web-3.5.7-99.noarch ('/usr/bin/yum -d 0 -e 0 -y install ganglia-web-3.5.7-99.noarch')

I put the shell output below:

[root@lee lee]# amb-deploy-cluster 6 hdp-multinode-default
starting an ambari cluster with: 6 nodes
[DEBUG] docker run -d --dns 127.0.0.1 --entrypoint /usr/local/serf/bin/start-serf-agent.sh -e KEYCHAIN= --name amb0 -h amb0.mycorp.kom sequenceiq/ambari:1.6.0 --tag ambari-server=true
19b4ed2a9363a1203302358b0f6bc3e89da937d15d6cb68643d405dee71a4a2e
[DEBUG] docker run -d -e SERF_JOIN_IP=172.17.0.8 --dns 127.0.0.1 --entrypoint /usr/local/serf/bin/start-serf-agent.sh -e KEYCHAIN= --name amb1 -h amb1.mycorp.kom sequenceiq/ambari:1.6.0 --log-level debug
a9ba0f6e5622d1a37dfb43a7cfb9e84326d70ca3b8fb8701cee841508babd138
[DEBUG] docker run -d -e SERF_JOIN_IP=172.17.0.8 --dns 127.0.0.1 --entrypoint /usr/local/serf/bin/start-serf-agent.sh -e KEYCHAIN= --name amb2 -h amb2.mycorp.kom sequenceiq/ambari:1.6.0 --log-level debug
881b1402ed65db3ec21bb358132e0f7e1cca602a3f256d383503ffc2a2771976
[DEBUG] docker run -d -e SERF_JOIN_IP=172.17.0.8 --dns 127.0.0.1 --entrypoint /usr/local/serf/bin/start-serf-agent.sh -e KEYCHAIN= --name amb3 -h amb3.mycorp.kom sequenceiq/ambari:1.6.0 --log-level debug
f90a012ba61d532adf7d181ac4ad536088fd414d73f4a2b59028fc5f0a8aaafa
[DEBUG] docker run -d -e SERF_JOIN_IP=172.17.0.8 --dns 127.0.0.1 --entrypoint /usr/local/serf/bin/start-serf-agent.sh -e KEYCHAIN= --name amb4 -h amb4.mycorp.kom sequenceiq/ambari:1.6.0 --log-level debug
9ac53b4a2f337797ab9dc83db80006e40a0e03571b53728b891526f4cdf3f678
[DEBUG] docker run -d -e SERF_JOIN_IP=172.17.0.8 --dns 127.0.0.1 --entrypoint /usr/local/serf/bin/start-serf-agent.sh -e KEYCHAIN= --name amb5 -h amb5.mycorp.kom sequenceiq/ambari:1.6.0 --log-level debug
6cd606f7b201ef35bcccef16836bd20ea3ace50d9cb6af42f8ed8a316041885d
[DEBUG] docker run -it --rm -e EXPECTED_HOST_COUNT=6 -e BLUEPRINT=hdp-multinode-default --link amb0:ambariserver --entrypoint /bin/sh sequenceiq/ambari:1.6.0 -c /tmp/install-cluster.sh
AMBARI_HOST=172.17.0.8
[DEBUG] waits for ambari server: 172.17.0.8 RUNNING ...
...........
[DEBUG] waits until 6 hosts connected to server ...
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 0
[DEBUG] connected hosts: 6
    _                _                   _  ____   _            _  _ 
   / \    _ __ ___  | |__    __ _  _ __ (_)/ ___| | |__    ___ | || |
  / _ \  | '_ ` _ \ | '_ \  / _` || '__|| |\___ \ | '_ \  / _ \| || |
 / ___ \ | | | | | || |_) || (_| || |   | | ___) || | | ||  __/| || |
/_/   \_\|_| |_| |_||_.__/  \__,_||_|   |_||____/ |_| |_| \___||_||_|

Welcome to Ambari Shell. For command and param completion press TAB, for assistance type 'hint'.
ambari-shell>blueprint defaults
Default blueprints added
ambari-shell>cluster build --blueprint hdp-multinode-default
  HOSTNAME         STATE
  ---------------  -------
  amb1.mycorp.kom  UNKNOWN
  amb0.mycorp.kom  UNKNOWN
  amb4.mycorp.kom  UNKNOWN
  amb3.mycorp.kom  UNKNOWN
  amb2.mycorp.kom  UNKNOWN
  amb5.mycorp.kom  UNKNOWN

  HOSTGROUP  COMPONENT
  ---------  ------------------
  master_4   OOZIE_SERVER
  master_4   ZOOKEEPER_SERVER
  master_4   GANGLIA_MONITOR
  slave_1    NODEMANAGER
  slave_1    HBASE_REGIONSERVER
  slave_1    GANGLIA_MONITOR
  slave_1    DATANODE
  gateway    YARN_CLIENT
  gateway    HIVE_CLIENT
  gateway    HDFS_CLIENT
  gateway    SQOOP
  gateway    GANGLIA_SERVER
  gateway    HBASE_CLIENT
  gateway    OOZIE_CLIENT
  gateway    AMBARI_SERVER
  gateway    PIG
  gateway    ZOOKEEPER_CLIENT
  gateway    GANGLIA_MONITOR
  gateway    MAPREDUCE2_CLIENT
  gateway    NAGIOS_SERVER
  gateway    HCAT
  master_2   YARN_CLIENT
  master_2   HIVE_CLIENT
  master_2   HDFS_CLIENT
  master_2   HIVE_SERVER
  master_2   HIVE_METASTORE
  master_2   HISTORYSERVER
  master_2   ZOOKEEPER_CLIENT
  master_2   WEBHCAT_SERVER
  master_2   GANGLIA_MONITOR
  master_2   MYSQL_SERVER
  master_2   SECONDARY_NAMENODE
  master_3   ZOOKEEPER_SERVER
  master_3   RESOURCEMANAGER
  master_3   GANGLIA_MONITOR
(the asynchronous "Installation: FAILED" message and the shell prompt were printed over the table at this point)
  master_1   YARN_CLIENT
  master_1   HDFS_CLIENT
  master_1   NAMENODE
  master_1   GANGLIA_SERVER
  master_1   HBASE_MASTER
  master_1   ZOOKEEPER_SERVER
  master_1   GANGLIA_MONITOR
  master_1   HCAT

CLUSTER_BUILD:hdp-multinode-default>cluster autoAssign
  HOSTGROUP  HOST
  ---------  ---------------
  master_4   amb0.mycorp.kom
  gateway    amb1.mycorp.kom
  master_2   amb2.mycorp.kom
  master_3   amb3.mycorp.kom
  master_1   amb4.mycorp.kom
  slave_1    amb5.mycorp.kom

CLUSTER_BUILD:hdp-multinode-default>cluster create --exitOnFinish true
Successfully created the cluster
keyki commented 10 years ago

Unfortunately, Ambari often fails to install Ganglia due to network issues. I suggest trying again or removing that service from the blueprint.
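If retrying doesn't help, the Ganglia components can be stripped from the blueprint before adding it. A minimal sketch, assuming the standard Ambari blueprint JSON layout (`host_groups`, each with a `components` list); the file handling in the comment is illustrative:

```python
import json

# Components belonging to the Ganglia service in HDP 2.1-era blueprints.
GANGLIA_COMPONENTS = {"GANGLIA_SERVER", "GANGLIA_MONITOR"}

def strip_ganglia(blueprint):
    """Return a deep copy of the blueprint with all Ganglia components removed."""
    stripped = json.loads(json.dumps(blueprint))  # cheap deep copy
    for group in stripped.get("host_groups", []):
        group["components"] = [
            c for c in group["components"]
            if c.get("name") not in GANGLIA_COMPONENTS
        ]
    return stripped

# Usage sketch (file names are hypothetical):
# with open("hdp-multinode-default.json") as f:
#     bp = json.load(f)
# with open("no-ganglia.json", "w") as f:
#     json.dump(strip_ganglia(bp), f, indent=2)
```

The resulting file can then be registered with `blueprint add --file <file_location>`.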

leebrooks0 commented 10 years ago

I have tried quite a few times, so I am going to strip it from the blueprint and see what happens.


keyki commented 10 years ago

Use the amb-shell function and you'll be able to issue the commands manually, similar to this, except using a custom blueprint with blueprint add --url <url_to_the_blueprint> or blueprint add --file <file_location>.
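For reference, the name the shell registers a blueprint under comes from the `Blueprints/blueprint_name` field of the JSON itself. A small sketch of pulling that name out (field path per the standard Ambari blueprint format; the REST note in the comment is an untested sketch):

```python
import json

def blueprint_name(blueprint_json):
    """Name Ambari will register the blueprint under."""
    return json.loads(blueprint_json)["Blueprints"]["blueprint_name"]

# Registering it directly over REST would look roughly like:
#   POST http://<ambari-host>:8080/api/v1/blueprints/<name>
#   with basic auth, an X-Requested-By header, and the blueprint JSON as body.
```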

leebrooks0 commented 10 years ago

Thanks, that is what I have been doing. A quick question: what is the best way to create 6 containers with no blueprint?

I have been doing this: amb-deploy-cluster 6 bogus, so that it creates the containers and then fails on cluster creation, whereupon I switch to ambari-shell for the rest.

keyki commented 10 years ago

amb-start-first
amb-start-node 1
amb-start-node 2
...
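That is: start the server container first, then join the agents one by one. A sketch that only builds the corresponding command lines (amb-start-first and amb-start-node are the repo's shell functions; nothing here actually invokes Docker):

```python
def cluster_commands(node_count):
    """Command lines for one Ambari server plus node_count agent containers."""
    cmds = [["amb-start-first"]]
    cmds += [["amb-start-node", str(i)] for i in range(1, node_count + 1)]
    return cmds
```

These could be run in sequence with `subprocess` or simply typed into a shell that has sourced the repo's ambari-functions.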

leebrooks0 commented 10 years ago

Thanks. Another thing has cropped up.

I am trying to run this blueprint (stock minus Ganglia and Nagios): https://gist.githubusercontent.com/leebrooks0/3b107c58e85506e6499d/raw/0fe2cb1cf3aa783d5322d60b5ba4b0b87f756cd4/lee-blueprint

CLUSTER_BUILD:no-monitoring>cluster autoAssign
Command failed java.lang.reflect.UndeclaredThrowableException

Here is what my shell looks like:

ambari-shell>blueprint add --url https://gist.githubusercontent.com/leebrooks0/3b107c58e85506e6499d/raw/0fe2cb1cf3aa783d5322d60b5ba4b0b87f756cd4/lee-blueprint
Blueprint: 'no-monitoring' has been added
ambari-shell>cluster build --blueprint no-monitoring
  HOSTNAME         STATE
  ---------------  -------
  amb1.mycorp.kom  HEALTHY
  amb0.mycorp.kom  HEALTHY
  amb4.mycorp.kom  HEALTHY
  amb3.mycorp.kom  HEALTHY
  amb2.mycorp.kom  HEALTHY
  amb5.mycorp.kom  HEALTHY

  HOSTGROUP  COMPONENT
  ---------  ------------------
  master_3   ZOOKEEPER_SERVER
  master_3   RESOURCEMANAGER
  master_2   YARN_CLIENT
  master_2   HIVE_CLIENT
  master_2   HDFS_CLIENT
  master_2   HIVE_SERVER
  master_2   HIVE_METASTORE
  master_2   HISTORYSERVER
  master_2   ZOOKEEPER_CLIENT
  master_2   WEBHCAT_SERVER
  master_2   MYSQL_SERVER
  master_2   SECONDARY_NAMENODE
  slave      NODEMANAGER
  slave      HBASE_REGIONSERVER
  slave      DATANODE
  master_1   YARN_CLIENT
  master_1   HDFS_CLIENT
  master_1   NAMENODE
  master_1   HBASE_MASTER
  master_1   ZOOKEEPER_SERVER
  master_1   HCAT
  gateway    YARN_CLIENT
  gateway    HIVE_CLIENT
  gateway    HDFS_CLIENT
  gateway    SQOOP
  gateway    HBASE_CLIENT
  gateway    OOZIE_CLIENT
  gateway    AMBARI_SERVER
  gateway    PIG
  gateway    ZOOKEEPER_CLIENT
  gateway    MAPREDUCE2_CLIENT
  gateway    HCAT
  master_4   OOZIE_SERVER
  master_4   ZOOKEEPER_SERVER

CLUSTER_BUILD:no-monitoring>cluster autoAssign
Command failed java.lang.reflect.UndeclaredThrowableException
keyki commented 10 years ago

The reason for the exception is not shown properly; I'll fix that. The problem is that to use the auto assignment, at least one host group's name must start with slave_, e.g. slave_1.
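That constraint is easy to check before calling autoAssign. A sketch of the validation as described above (based on this comment, not the actual Ambari Shell source):

```python
def can_auto_assign(blueprint):
    """True if at least one host group name starts with 'slave_'."""
    return any(
        g.get("name", "").startswith("slave_")
        for g in blueprint.get("host_groups", [])
    )
```

Note that the no-monitoring gist names its slave group `slave` rather than `slave_1`, which is exactly what this check would flag.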

leebrooks0 commented 10 years ago

Is the auto assignment part of the Ambari api?

keyki commented 10 years ago

No, it isn't, but they plan to add such functionality.

keyki commented 10 years ago

We implemented the auto assignment mostly for our needs, but you can always assign manually.
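For manual assignment the end result is just a host-group-to-host mapping like the autoAssign table earlier in the thread. A sketch of a one-host-per-group pairing (illustrative only; in the shell you would issue the individual assignment commands yourself):

```python
def assign_hosts(host_groups, hosts):
    """Pair each host group with a distinct host, one host per group."""
    if len(hosts) < len(host_groups):
        raise ValueError("not enough hosts for the host groups")
    return dict(zip(host_groups, hosts))
```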

leebrooks0 commented 10 years ago

I am now trying this blueprint https://gist.githubusercontent.com/leebrooks0/3b107c58e85506e6499d/raw/db437778c6ddec3bf86f0733d5127ee7cc472728/lee-blueprint which is stock less Nagios, Ganglia, HBase, and Oozie.

I get this error with Hive:

screenshot - 09102014 - 15 27 39

This is the error I get: "Python script has been killed due to timeout".

Originally the above blueprint had included HBase and Oozie, but I removed them as they were getting this error too.

Am I doing something wrong or is the blueprint API not ready for use yet?