riptano / ComboAMI

The AMI takes a set of input parameters via the EC2 user-data to install, RAID, ring, and launch a DataStax Enterprise/Community cluster.

AMI launch stuck in a loop when the instance is created in a VPC and has no public IP assigned to it #51

Closed simonso closed 10 years ago

simonso commented 10 years ago

I set up an OpenSwan firewall/VPC between 2 regions.

Now I am ready to launch a DataStax Community Cassandra AMI.

However, "sudo apt-get update" seems to go into a loop for a long time:

ds2_configure.py (https://github.com/riptano/ComboAMI/blob/2.5/ds2_configure.py) also breaks on this "unable to resolve host" case, starting at line 315:

logger.exe('sudo apt-get update')
while True:
    output = logger.exe('sudo apt-get update')
    if not output[1] and not 'err' in output[0].lower() and not 'failed' in output[0].lower():
        break

Workaround:

-- Put 127.0.1.1 ip-my-private-ip in /etc/hosts, so that the output[1] check in the while True logger.exe("sudo apt-get update") loop passes.

Otherwise, the loop never breaks in this "unable to resolve host" case.
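To illustrate the failure mode, here is a minimal Python 3 sketch of a bounded version of that loop. It is not the code in ds2_configure.py: `run`, `looks_clean`, `update_with_retries`, and the `ignorable` list are hypothetical names, and it assumes output[1] holds stderr and that the sudo warning is what lands there.

```python
import subprocess
import time


def run(command):
    """Run a shell command and return (stdout, stderr) as text."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.stdout, proc.stderr


def looks_clean(stdout, stderr, ignorable=("unable to resolve host",)):
    """Return True when the output shows no real failure.

    stderr lines containing an ignorable marker (a hypothetical list) are
    dropped first; the rest mirrors the checks in the loop quoted above.
    """
    remaining = [
        line for line in stderr.splitlines()
        if line.strip() and not any(marker in line for marker in ignorable)
    ]
    lowered = stdout.lower()
    return not remaining and "err" not in lowered and "failed" not in lowered


def update_with_retries(command="sudo apt-get update", attempts=5, delay=10, runner=run):
    """Retry the command at most `attempts` times instead of looping forever."""
    for attempt in range(1, attempts + 1):
        stdout, stderr = runner(command)
        if looks_clean(stdout, stderr):
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

With a cap on attempts, a host that cannot resolve its own name would produce a visible failure instead of a hang, and the harmless sudo warning alone would no longer count as an error.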

joaquincasares commented 10 years ago

This has been fixed with a new function that runs a basic sudo ls command and checks whether it produces any stderr output. If it does, the function modifies /etc/hosts accordingly. This patch should not affect launches in non-VPC environments.

https://github.com/riptano/ComboAMI/blob/4e1fd2347659bfeab2ece766269ed29c198d5964/ds2_configure.py#L161
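For readers who do not want to open the link, the idea can be sketched roughly as follows in Python 3. This is an illustration of the approach described above, not the function from the commit: the names `sudo_cannot_resolve_host`, `hosts_with_entry`, and `fix_hosts` are hypothetical, and the 127.0.1.1 address is taken from the workaround in the original report.

```python
import socket
import subprocess


def sudo_cannot_resolve_host():
    """Return True if a trivial sudo command prints the resolution warning."""
    # -n keeps sudo from prompting for a password if one were required.
    proc = subprocess.run(["sudo", "-n", "ls"], capture_output=True, text=True)
    return "unable to resolve host" in proc.stderr


def hosts_with_entry(hosts_text, hostname, address="127.0.1.1"):
    """Return hosts_text with `address hostname` appended unless hostname is already mapped."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{address} {hostname}\n"


def fix_hosts(path="/etc/hosts"):
    """Add the local hostname to the hosts file when sudo complains.

    Writing to /etc/hosts needs root, so this would have to run as root.
    """
    if not sudo_cannot_resolve_host():
        return False
    with open(path) as handle:
        current = handle.read()
    updated = hosts_with_entry(current, socket.gethostname())
    if updated == current:
        return False
    with open(path, "w") as handle:
        handle.write(updated)
    return True
```

Because the hosts file is only touched when sudo actually emits the warning, instances that resolve their hostname normally are left alone.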

Bekbolatov commented 10 years ago

I just tried the install in a VPC and tested both Thrift and the DataStax Java driver against it. It looks great, thanks for the quick fix!

One problem is still there: OpsCenter failed to start, which seems related to the missing public DNS name. Here is a snippet from the logs:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.2.48  ?          256     71.5%             752dd5f0-b1d6-46b5-af3a-efa6e2bf729b  2c
UN  10.0.2.49  40.98 KB   256     64.1%             d51edb1d-df4c-4d26-ad53-b663b3139609  2c
UN  10.0.2.50  40.97 KB   256     64.4%             1d2b7771-9043-4da8-a84d-6984dd19d509  2c

Opscenter: http://:8888/
    Please wait 60 seconds if this is the cluster's first start...

joaquincasares commented 10 years ago

Glad your cluster was able to boot!

Thanks for the report. Can you email me your ~/datastax_ami/ami.log please?

That is more of a cosmetic issue with the motd. You should still be able to reach OpsCenter by using the IP for that node and port 8888.

Thanks!

Bekbolatov commented 10 years ago

Getting the log. In the meantime, nothing seems to be listening on 8888:

ubuntu@ip-10-0-2-50:~$ netstat -an | grep LISTEN
tcp        0      0 0.0.0.0:9042            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.2.50:7000          0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:48027           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:9160            0.0.0.0:*               LISTEN
tcp6       0      0 :::61621                :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
unix  2      [ ACC ]     STREAM     LISTENING     8452     /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     7573     @/com/ubuntu/upstart
unix  2      [ ACC ]     STREAM     LISTENING     9606     /var/run/acpid.socket
unix  2      [ ACC ]     SEQPACKET  LISTENING     7672     /run/udev/control

Denn0 commented 9 years ago

Hi @joaquincasares

I'm wondering if this workaround is now available in the public AMIs. I'm using this AMI: DataStax Auto-Clustering AMI 2.5.1-hvm (ami-7f33cd08) (I'm working in the eu-west/Ireland region).

It tries to install and in the process deletes ds2_configure.py, so I can't see whether your workaround was in there or not... Judging by the hosts file, I think not, because the line the script would add is not there.

So I'm using the AMI mentioned above to spin up an OpsCenter node. It doesn't work though... I'm having a hard time installing DataStax on these EC2 servers: the AMI doesn't work, the repo is not available since I'm on a VPC, and the tarball doesn't start either due to some stubborn RMI error.

Can you help me out? Is there some way to get this AMI working? Thanks!

My ami.log:

[ERROR] 03/31/15-14:01:27 git pull:
error: Failed connect to github.com:443; Connection timed out while accessing https://github.com/riptano/ComboAMI.git/info/refs
fatal: HTTP request failed

[EXEC] 03/31/15-14:01:27 git reset --hard:
HEAD is now at 5f722d6 Update AMI ids for new bake

[EXEC:E] 03/31/15-14:01:28 gpg --import /home/ubuntu/datastax_ami/repo_keys/DataStax_AMI.xxx.key:
gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/secring.gpg' created
gpg: keyring `/root/.gnupg/pubring.gpg' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key 7123CDFD: public key "Joaquin Casares (DataStax AMI) <joaquin@datastax.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)

[EXEC] 03/31/15-14:01:28 git log --pretty="format:%G?" --show-signature HEAD^..HEAD:
gpg: Signature made Mon Mar 24 23:02:46 2014 UTC using RSA key ID xxx
gpg: Good signature from "Joaquin Casares (DataStax AMI) <joaquin@datastax.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.

G
[ERROR] 03/31/15-14:01:28 sudo rm ds2_configure.py:
sudo: unable to resolve host ip-10-129-0-171

[INFO] Deleting ds2_configure.py now. This AMI will never change any configs after this first run.
[EXEC] 03/31/15-14:01:28 sudo apt-key add /home/ubuntu/datastax_ami/repo_keys/Launchpad_VLC.X.key:
OK

[EXEC] 03/31/15-14:01:28 sudo apt-key add /home/ubuntu/datastax_ami/repo_keys/Ubuntu_Archive.X.key:
OK

[ERROR] 03/31/15-14:01:28 sudo rm -rf /etc/motd:
sudo: unable to resolve host ip-10-129-0-171

[ERROR] 03/31/15-14:01:28 sudo touch /etc/motd:
sudo: unable to resolve host ip-10-129-0-171

[INFO] Started with user data set to:
[INFO] --opscenteronly
[INFO] Using instance type: m1.large
[INFO] meta-data:instance-type: m1.large
[INFO] meta-data:local-ipv4: 10.129.0.171
[INFO] meta-data:public-hostname: <same as local-ipv4>
[INFO] meta-data:ami-launch-index: 0
[INFO] meta-data:reservation-id: r-6a03da8c
[ERROR] Exception seen in ds1_launcher.py:
Traceback (most recent call last):
  File "/home/ubuntu/datastax_ami/ds1_launcher.py", line 22, in initial_configurations
    ds2_configure.run()
  File "/home/ubuntu/datastax_ami/ds2_configure.py", line 997, in run
  File "/home/ubuntu/datastax_ami/ds2_configure.py", line 221, in parse_ec2_userdata
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

joaquincasares commented 9 years ago

Hello @Denn0 ,

Thanks for your ami.log.

The workaround is available in all instances launched after the commit was pushed. However, since your VPC settings are blocking outgoing requests, your git pull command failed to execute, as the log shows. Because of this, the AMI is running a highly outdated version of the code.

Please ensure the AMIs have outgoing access and try again. If you still see errors, please include your ami.log again.
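One quick way to check outgoing access from an instance is a plain TCP connection test. The following Python 3 sketch is only an illustration: `can_connect` and `unreachable` are hypothetical helper names, and github.com:443 is used because it is the endpoint the failed git pull in the log tried to reach. Any package repository hosts would need to be added to the list.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def unreachable(endpoints, probe=can_connect):
    """Return the (host, port) pairs from endpoints that the probe cannot reach."""
    return [(host, port) for host, port in endpoints if not probe(host, port)]


if __name__ == "__main__":
    # github.com:443 comes from the failed `git pull` in the ami.log above.
    for host, port in unreachable([("github.com", 443)]):
        print(f"no outbound access to {host}:{port}")
```

The same helper could be pointed at a node's private IP and port 8888 to confirm whether OpsCenter is actually listening.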

Thanks, Joaquin

Denn0 commented 9 years ago

Hi @joaquincasares ,

Does that explain this failure? Unfortunately, that is not an option. Company policy does not allow these servers to have outbound connectivity other than to our own data center, so they run in a VPC without outbound connectivity. Pity. I will try another way.

Thanks though

joaquincasares commented 9 years ago

Hey @Denn0 ,

Yes, that is definitely the issue. Unfortunately, the AMIs require outbound access not only to update the AMI code, but also to download and install Java, DSE, and OpsCenter, since this software is not baked into the AMI by default.

I hope that clarifies things.

Thanks, Joaquin

Bekbolatov commented 9 years ago

@Denn0 it is definitely not closed-VPC friendly, but you can bake your own image with the most recent version (e.g. with aminator).