jtriley / StarCluster

StarCluster is an open source cluster-computing toolkit for Amazon's Elastic Compute Cloud (EC2).
http://star.mit.edu/cluster
GNU Lesser General Public License v3.0

Issues launching clusters in region eu-west-1 #265

Closed cl-s closed 11 years ago

cl-s commented 11 years ago

When attempting to launch instances in region eu-west-1 with StarCluster v0.9999, starcluster start hangs at the step detailed below:


$ starcluster start testcluster1
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: testcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 5-node cluster...
>>> Creating security group @sc-testcluster1...
Reservation:r-3da81077
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for instances to activate...| 

On checking the EC2 Management Console it is clear that the nodes specified in the config file have been started but are not properly configured.

This issue has been tested with all of the following AMIs:


32bit Images:
-------------
ami-a5447bd1 eu-west-1 starcluster-base-ubuntu-11.10-x86 (EBS)

64bit Images:
--------------
ami-543a2f20 eu-west-1 starcluster-base-ubuntu-12.04-x86_64-hvm (HVM-EBS)
ami-6c3a2f18 eu-west-1 starcluster-base-ubuntu-12.04-x86_64 (EBS)
ami-ab447bdf eu-west-1 starcluster-base-ubuntu-11.10-x86_64 (EBS)
ami-38d5ff4c eu-west-1 starcluster-base-ubuntu-9.10-x86_64-rc4

The loaded config file is as follows:


####################################
## StarCluster Configuration File ##
####################################
[global]
# Configure the default cluster template to use when starting a cluster
# defaults to 'smallcluster' defined below. This template should be usable
# out-of-the-box provided you've configured your keypair correctly
DEFAULT_TEMPLATE=testcluster
# enable experimental features for this release
#ENABLE_EXPERIMENTAL=True
# number of seconds to wait when polling instances (default: 30s)
#REFRESH_INTERVAL=15
# specify a web browser to launch when viewing spot history plots
#WEB_BROWSER=chromium
# split the config into multiple files
#INCLUDE=~/.starcluster/aws, ~/.starcluster/keys, ~/.starcluster/vols

#############################################
## AWS Credentials and Connection Settings ##
#############################################
[aws info]
# This is the AWS credentials section (required).
# These settings apply to all clusters
# replace these with your AWS keys
AWS_ACCESS_KEY_ID = #my aws access key id
AWS_SECRET_ACCESS_KEY = #my aws secret access key
# replace this with your account number
AWS_USER_ID= # my account number
# Uncomment to specify a different Amazon AWS region  (OPTIONAL)
# (defaults to us-east-1 if not specified)
# NOTE: AMIs have to be migrated!
AWS_REGION_NAME = eu-west-1
AWS_REGION_HOST = ec2.eu-west-1.amazonaws.com
# Uncomment these settings when creating an instance-store (S3) AMI (OPTIONAL)
#EC2_CERT = /path/to/your/cert-asdf0as9df092039asdfi02089.pem
#EC2_PRIVATE_KEY = /path/to/your/pk-asdfasd890f200909.pem
# Uncomment these settings to use a proxy host when connecting to AWS
#AWS_PROXY = your.proxyhost.com
#AWS_PROXY_PORT = 8080
#AWS_PROXY_USER = yourproxyuser
#AWS_PROXY_PASS = yourproxypass

###########################
## Defining EC2 Keypairs ##
###########################
# Sections starting with "key" define your keypairs. See "starcluster createkey
# --help" for instructions on how to create a new keypair. Section name should
# match your key name e.g.:
[key eu-test-key]
KEY_LOCATION=/home/ubuntu/.ssh/eu-test-key.rsa

# You can of course have multiple keypair sections
# [key myotherkey]
# KEY_LOCATION=~/.ssh/myotherkey.rsa

################################
## Defining Cluster Templates ##
################################
# Sections starting with "cluster" represent a cluster template. These
# "templates" are a collection of settings that define a single cluster
# configuration and are used when creating and configuring a cluster. You can
# change which template to use when creating your cluster using the -c option
# to the start command:
#
#     $ starcluster start -c mediumcluster mycluster
#
# If a template is not specified then the template defined by DEFAULT_TEMPLATE
# in the [global] section above is used. Below is the "default" template named
# "smallcluster". You can rename it but dont forget to update the
# DEFAULT_TEMPLATE setting in the [global] section above. See the next section
# on defining multiple templates.

[cluster testcluster]
# change this to the name of one of the keypair sections defined above
KEYNAME = eu-test-key
# number of ec2 instances to launch
CLUSTER_SIZE = 5
# create the following user on the cluster
CLUSTER_USER = sgeadmin
# optionally specify shell (defaults to bash)
# (options: tcsh, zsh, csh, bash, ksh)
CLUSTER_SHELL = bash
# AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
# Use the 'listpublic' command to list StarCluster AMIs in other regions
# The base i386 StarCluster AMI is ami-899d49e0
# The base x86_64 StarCluster AMI is ami-999d49f0
# The base HVM StarCluster AMI is ami-4583572c
NODE_IMAGE_ID = ami-a5447bd1
# instance type for all cluster nodes
# (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro, cc1.4xlarge, m1.medium, cc2.8xlarge, m1.large, m1.xlarge, m2.4xlarge, m2.2xlarge)
NODE_INSTANCE_TYPE = m1.small
# Uncomment to disable installing/configuring a queueing system on the
# cluster (SGE)
#DISABLE_QUEUE=True
# Uncomment to specify a different instance type for the master node (OPTIONAL)
# (defaults to NODE_INSTANCE_TYPE if not specified)
#MASTER_INSTANCE_TYPE = m1.small
# Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
# (defaults to NODE_IMAGE_ID if not specified)
#MASTER_IMAGE_ID = ami-899d49e0 (OPTIONAL)
# availability zone to launch the cluster in (OPTIONAL)
# (automatically determined based on volumes (if any) or
# selected by Amazon if not specified)
#AVAILABILITY_ZONE = us-east-1c
# list of volumes to attach to the master node (OPTIONAL)
# these volumes, if any, will be NFS shared to the worker nodes
# see "Configuring EBS Volumes" below on how to define volume sections
#VOLUMES = oceandata, biodata
# list of plugins to load after StarCluster's default setup routines (OPTIONAL)
# see "Configuring StarCluster Plugins" below on how to define plugin sections
#PLUGINS = myplugin, myplugin2
# list of permissions (or firewall rules) to apply to the cluster's security
# group (OPTIONAL).
#PERMISSIONS = ssh, http
# Uncomment to always create a spot cluster when creating a new cluster from
# this template. The following example will place a $0.50 bid for each spot
# request.
#SPOT_BID = 0.50

###########################################
## Defining Additional Cluster Templates ##
###########################################
# You can also define multiple cluster templates. You can either supply all
# configuration options as with smallcluster above, or create an
# EXTENDS=<cluster_name> variable in the new cluster section to use all
# settings from <cluster_name> as defaults. Below are example templates that
# use the EXTENDS feature:

# [cluster mediumcluster]
# Declares that this cluster uses smallcluster as defaults
# EXTENDS=smallcluster
# This section is the same as smallcluster except for the following settings:
# KEYNAME=myotherkey
# NODE_INSTANCE_TYPE = c1.xlarge
# CLUSTER_SIZE=8
# VOLUMES = biodata2

# [cluster largecluster]
# Declares that this cluster uses mediumcluster as defaults
# EXTENDS=mediumcluster
# This section is the same as mediumcluster except for the following variables:
# CLUSTER_SIZE=16

#############################
## Configuring EBS Volumes ##
#############################
# StarCluster can attach one or more EBS volumes to the master and then
# NFS_share these volumes to all of the worker nodes. A new [volume] section
# must be created for each EBS volume you wish to use with StarCluster. The
# section name is a tag for your volume. This tag is used in the VOLUMES
# setting of a cluster template to declare that an EBS volume is to be mounted
# and nfs shared on the cluster. (see the commented VOLUMES setting in the
# example 'smallcluster' template above) Below are some examples of defining
# and configuring EBS volumes to be used with StarCluster:

# Sections starting with "volume" define your EBS volumes
# [volume biodata]
# attach vol-c9999999 to /home on master node and NFS-share to worker nodes
# VOLUME_ID = vol-c999999
# MOUNT_PATH = /home

# Same volume as above, but mounts to different location
# [volume biodata2]
# VOLUME_ID = vol-c999999
# MOUNT_PATH = /opt/

# Another volume example
# [volume oceandata]
# VOLUME_ID = vol-d7777777
# MOUNT_PATH = /mydata

# By default StarCluster will attempt first to mount the entire volume device,
# failing that it will try the first partition. If you have more than one
# partition you will need to set the PARTITION number, e.g.:
# [volume oceandata]
# VOLUME_ID = vol-d7777777
# MOUNT_PATH = /mydata
# PARTITION = 2

############################################
## Configuring Security Group Permissions ##
############################################
# Sections starting with "permission" define security group rules to
# automatically apply to newly created clusters. PROTOCOL in the following
# examples can be: tcp, udp, or icmp. CIDR_IP defaults to 0.0.0.0/0 or
# "open to the world"

# open port 80 on the cluster to the world
# [permission http]
# PROTOCOL = tcp
# FROM_PORT = 80
# TO_PORT = 80

# open https on the cluster to the world
# [permission https]
# PROTOCOL = tcp
# FROM_PORT = 443
# TO_PORT = 443

# open port 80 on the cluster to an ip range using CIDR_IP
# [permission http]
# PROTOCOL = tcp
# FROM_PORT = 80
# TO_PORT = 80
# CIDR_IP = 18.0.0.0/8

# restrict ssh access to a single ip address ()
# [permission ssh]
# PROTOCOL = tcp
# FROM_PORT = 22
# TO_PORT = 22
# CIDR_IP = /32

#####################################
## Configuring StarCluster Plugins ##
#####################################
# Sections starting with "plugin" define a custom python class which performs
# additional configurations to StarCluster's default routines. These plugins
# can be assigned to a cluster template to customize the setup procedure when
# starting a cluster from this template (see the commented PLUGINS setting in
# the 'smallcluster' template above). Below is an example of defining a user
# plugin called 'myplugin':

# [plugin myplugin]
# NOTE: myplugin module must either live in ~/.starcluster/plugins or be
# on your PYTHONPATH
# SETUP_CLASS = myplugin.SetupClass
# extra settings are passed as __init__ arguments to your plugin:
# SOME_PARAM_FOR_MY_PLUGIN = 1
# SOME_OTHER_PARAM = 2

######################
## Built-in Plugins ##
######################
# The following plugins ship with StarCluster and should work out-of-the-box.
# Uncomment as needed. Don't forget to update your PLUGINS list!
# See http://web.mit.edu/star/cluster/docs/latest/plugins for plugin details.
#
# Use this plugin to install one or more packages on all nodes
# [plugin pkginstaller]
# SETUP_CLASS = starcluster.plugins.pkginstaller.PackageInstaller
# # list of apt-get installable packages
# PACKAGES = mongodb, python-pymongo
#
# Use this plugin to create one or more cluster users and download all user ssh
# keys to $HOME/.starcluster/user_keys/-.tar.gz
# [plugin createusers]
# SETUP_CLASS = starcluster.plugins.users.CreateUsers
# NUM_USERS = 30
# # you can also comment out NUM_USERS and specify exact usernames, e.g.
# # usernames = linus, tux, larry
# DOWNLOAD_KEYS = True
#
# Use this plugin to configure the Condor queueing system
# [plugin condor]
# SETUP_CLASS = starcluster.plugins.condor.CondorPlugin
#
# The SGE plugin is enabled by default and not strictly required. Only use this
# if you want to tweak advanced settings in which case you should also set
# DISABLE_QUEUE=TRUE in your cluster template. See the plugin doc for more
# details.
# [plugin sge]
# SETUP_CLASS = starcluster.plugins.sge.SGEPlugin
# MASTER_IS_EXEC_HOST = False
#
# The IPCluster plugin configures a parallel IPython cluster with optional
# web notebook support. This allows you to run Python code in parallel with low
# latency message passing via ZeroMQ.
# [plugin ipcluster]
# SETUP_CLASS = starcluster.plugins.ipcluster.IPCluster
# ENABLE_NOTEBOOK = True
# #set a password for the notebook for increased security
# NOTEBOOK_PASSWD = a-secret-password
#
# Use this plugin to create a cluster SSH "dashboard" using tmux. The plugin
# creates a tmux session on the master node that automatically connects to all
# the worker nodes over SSH. Attaching to the session shows a separate window
# for each node and each window is logged into the node via SSH.
# [plugin tmux]
# SETUP_CLASS = starcluster.plugins.tmux.TmuxControlCenter
#
# Use this plugin to change the default MPI implementation on the
# cluster from OpenMPI to MPICH2.
# [plugin mpich2]
# SETUP_CLASS = starcluster.plugins.mpich2.MPICH2Setup
#
# Configure a hadoop cluster. (includes dumbo setup)
# [plugin hadoop]
# SETUP_CLASS = starcluster.plugins.hadoop.Hadoop
#
# Configure a distributed MySQL Cluster
# [plugin mysqlcluster]
# SETUP_CLASS = starcluster.plugins.mysql.MysqlCluster
# NUM_REPLICAS = 2
# DATA_MEMORY = 80M
# INDEX_MEMORY = 18M
# DUMP_FILE = test.sql
# DUMP_INTERVAL = 60
# DEDICATED_QUERY = True
# NUM_DATA_NODES = 2
#
# Install and setup an Xvfb server on each cluster node
# [plugin xvfb]
# SETUP_CLASS = starcluster.plugins.xvfb.XvfbSetup

When the region is changed back to the default region (with the relevant modifications to AMI image IDs, etc.), the above config file allows clusters to be created without issue.

jtriley commented 11 years ago

Please clarify what makes it 'clear' that they're not configured properly.

Also, I just successfully started a 12.04 cluster in eu-west-1 with the following command:

$ starcluster -r eu-west-1 start -s 1 -i t1.micro -n ami-6c3a2f18 -k myeukey eutest

I'll try to permanently set the region in the config and see if I can reproduce your issue.

jtriley commented 11 years ago

I just updated my config to use the eu-west-1 region by default and I was able to start a 1-node t1.micro cluster using 12.04 AMI ami-6c3a2f18 with a simple start command:

$ starcluster start eutest2

At this point I'm interested to hear more details about your findings in AWS console...

cl-s commented 11 years ago

When attempting to create a new cluster using the configuration file I pasted in my first post starcluster start halts at the line:

>>> Waiting for instances to activate...\

At this stage I can see in the EC2 Management Console that the cluster instances have been created:

[screenshot: selection_055]

But when one logs into the machines it is clear that they are not configured properly as commands like qstat do not work.

So far I have been running all nodes with NODE_INSTANCE_TYPE = m1.small rather than the micro instances you have been using. I have also tried switching to t1.micro instances, but the problem still persists. The problem also persists when I have only one node in the cluster.

jtriley commented 11 years ago

@cl-s What does listclusters say?

$ starcluster listclusters
izharw commented 11 years ago

I'm having the same problem with us-west-{1,2}. See my post to the mailing list http://star.mit.edu/cluster/mlarchives/1754.html

cl-s commented 11 years ago

When the start function stalls $ starcluster listclusters shows the following:

-------------------------------------
eutest (security group: @sc-eutest)
-------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A
cl-s commented 11 years ago

@izharw I can confirm that I also seem to have the same problem when attempting to start clusters in the us-west-1 and us-west-2 regions.

I have tested in the us-west-1 region with the following AMIs:

32bit Images:
-------------
ami-06674b43 us-west-1 starcluster-base-ubuntu-12.04-x86 (EBS)
64bit Images:
--------------
ami-02674b47 us-west-1 starcluster-base-ubuntu-12.04-x86_64 (EBS)

and in us-west-2 with the following AMIs:

32bit Images:
-------------
ami-0a6afe3a us-west-2 starcluster-base-ubuntu-12.04-x86 (EBS)
64bit Images:
--------------
ami-706afe40 us-west-2 starcluster-base-ubuntu-12.04-x86_64 (EBS)

Again the instances show up as created in the Amazon EC2 Management Console yet the process stalls at >>> Waiting for instances to activate... and $ starcluster listclusters shows the following:

-------------------------------------------
uswesttest (security group: @sc-uswesttest)
-------------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A
jtriley commented 11 years ago

This is really strange. What does listinstances report?

$ starcluster listinstances
cl-s commented 11 years ago

Having started a one-node t1.micro cluster in zone us-west-2a with ami-0a6afe3a, I get the following output from running $ starcluster listinstances:

$ starcluster listinstances
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
id: i-00313535
dns_name: ec2-54-218-89-230.us-west-2.compute.amazonaws.com
private_dns_name: ip-172-31-23-222.us-west-2.compute.internal
state: running
public_ip: 54.218.89.230
private_ip: 172.31.23.222
zone: us-west-2a
ami: ami-0a6afe3a
virtualization: paravirtual
type: t1.micro
groups: 
keypair: us2-test-key
uptime: 0 days, 00:01:02
tags: N/A
Total: 1

Running $ starcluster listclusters, I still get:

---------------------------------
test1 (security group: @sc-test1)
---------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A

And $ starcluster start test1 remains stuck at the following:

$ starcluster start test1
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Using default cluster template: testcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Creating security group @sc-test1...
Reservation:r-1c8d3929
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for instances to activate...-
izharw commented 11 years ago

I'm launching the cluster with:

izharw@izharw-VirtualBox:~/Desktop/EC2$ starcluster -d -r us-west-2 start -s1 -i t1.micro test
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

2013-06-28 09:52:22,424 config.py:548 - DEBUG - Loading config
2013-06-28 09:52:22,424 config.py:119 - DEBUG - Loading file: /home/izharw/.starcluster/config
2013-06-28 09:52:22,428 config.py:303 - DEBUG - enable_experimental setting not specified. Defaulting to False
2013-06-28 09:52:22,428 config.py:303 - DEBUG - include setting not specified. Defaulting to []
2013-06-28 09:52:22,429 config.py:303 - DEBUG - web_browser setting not specified. Defaulting to None
2013-06-28 09:52:22,429 config.py:303 - DEBUG - refresh_interval setting not specified. Defaulting to 30
2013-06-28 09:52:22,429 config.py:303 - DEBUG - enable_experimental setting not specified. Defaulting to False
2013-06-28 09:52:22,429 config.py:303 - DEBUG - include setting not specified. Defaulting to []
2013-06-28 09:52:22,431 config.py:303 - DEBUG - web_browser setting not specified. Defaulting to None
2013-06-28 09:52:22,435 config.py:303 - DEBUG - refresh_interval setting not specified. Defaulting to 30
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_proxy_pass setting not specified. Defaulting to None
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_ec2_path setting not specified. Defaulting to /
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_s3_path setting not specified. Defaulting to /
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_proxy_user setting not specified. Defaulting to None
2013-06-28 09:52:22,437 config.py:303 - DEBUG - aws_is_secure setting not specified. Defaulting to True
2013-06-28 09:52:22,437 config.py:303 - DEBUG - aws_s3_host setting not specified. Defaulting to None
2013-06-28 09:52:22,437 config.py:303 - DEBUG - aws_port setting not specified. Defaulting to None
2013-06-28 09:52:22,437 config.py:303 - DEBUG - ec2_private_key setting not specified. Defaulting to None
2013-06-28 09:52:22,439 config.py:303 - DEBUG - ec2_cert setting not specified. Defaulting to None
2013-06-28 09:52:22,439 config.py:303 - DEBUG - aws_proxy setting not specified. Defaulting to None
2013-06-28 09:52:22,440 config.py:303 - DEBUG - aws_proxy_port setting not specified. Defaulting to None
2013-06-28 09:52:22,440 config.py:303 - DEBUG - ip_protocol setting not specified. Defaulting to tcp
2013-06-28 09:52:22,440 config.py:303 - DEBUG - cidr_ip setting not specified. Defaulting to 0.0.0.0/0
2013-06-28 09:52:22,441 config.py:303 - DEBUG - disable_queue setting not specified. Defaulting to False
2013-06-28 09:52:22,441 config.py:303 - DEBUG - volumes setting not specified. Defaulting to []
2013-06-28 09:52:22,441 config.py:303 - DEBUG - availability_zone setting not specified. Defaulting to None
2013-06-28 09:52:22,441 config.py:303 - DEBUG - spot_bid setting not specified. Defaulting to None
2013-06-28 09:52:22,443 config.py:303 - DEBUG - disable_cloudinit setting not specified. Defaulting to False
2013-06-28 09:52:22,444 config.py:303 - DEBUG - force_spot_master setting not specified. Defaulting to False
2013-06-28 09:52:22,444 config.py:303 - DEBUG - extends setting not specified. Defaulting to None
2013-06-28 09:52:22,444 config.py:303 - DEBUG - master_image_id setting not specified. Defaulting to None
2013-06-28 09:52:22,444 config.py:303 - DEBUG - userdata_scripts setting not specified. Defaulting to []
2013-06-28 09:52:22,445 config.py:303 - DEBUG - plugins setting not specified. Defaulting to []
2013-06-28 09:52:22,452 awsutils.py:55 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-west-2, 'port': None}
2013-06-28 09:52:22,889 awsutils.py:55 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-west-2, 'port': None}
>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
2013-06-28 09:52:24,092 cluster.py:780 - DEBUG - Userdata size in KB: 0.45
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
2013-06-28 09:52:24,095 cluster.py:1013 - DEBUG - Launching master (ami: ami-706afe40, type: t1.micro)
>>> Creating security group @sc-test...
2013-06-28 09:52:26,308 cluster.py:780 - DEBUG - Userdata size in KB: 0.45
2013-06-28 09:52:26,449 awsutils.py:350 - DEBUG - Removing ephemeral drive /dev/sdb1 from runtime block device mapping (already mapped by AMI: ami-706afe40)
Reservation:r-22883c17
>>> Waiting for cluster to come up... (updating every 30s)
2013-06-28 09:52:27,300 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:52:27,304 cluster.py:706 - DEBUG - returning self._nodes = []
>>> Waiting for instances to activate...|2013-06-28 09:52:57,497 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:52:57,497 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:53:27,677 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:53:27,680 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:53:57,859 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:53:57,863 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:54:28,005 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:54:28,016 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:54:58,172 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:54:58,173 cluster.py:706 - DEBUG - returning self._nodes = []

Then, listinstances actually shows me the instance...

izharw@izharw-VirtualBox:~/Desktop/EC2$ starcluster listinstances
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

id: i-b83c388d
dns_name: ec2-54-218-96-170.us-west-2.compute.amazonaws.com
private_dns_name: ip-172-31-44-191.us-west-2.compute.internal
state: running
public_ip: 54.218.96.170
private_ip: 172.31.44.191
zone: us-west-2b
ami: ami-706afe40
virtualization: paravirtual
type: t1.micro
groups: 
keypair: chematria_oregon
uptime: 0 days, 00:03:35
tags: N/A

Total: 1
jtriley commented 11 years ago

OK, listinstances is showing that the instance doesn't belong to any security group, which is strange and should be impossible. Take a look at the AWS EC2 console and see if it lists the instance's security group. Also, just so we're on the same page, are you using the latest version of the develop branch in the main repo, or are you using someone else's fork of StarCluster?

cl-s commented 11 years ago

OK, I have tested again with the EU servers so I can show you what's listed in the AWS EC2 Console for a given test run.

In this example I have run: $ starcluster start eutestcluster with the following config file:

####################################
## StarCluster Configuration File ##
####################################
[global]
# Configure the default cluster template to use when starting a cluster
# defaults to 'smallcluster' defined below. This template should be usable
# out-of-the-box provided you've configured your keypair correctly
DEFAULT_TEMPLATE=testcluster
# enable experimental features for this release
#ENABLE_EXPERIMENTAL=True
# number of seconds to wait when polling instances (default: 30s)
#REFRESH_INTERVAL=15
# specify a web browser to launch when viewing spot history plots
#WEB_BROWSER=chromium
# split the config into multiple files
#INCLUDE=~/.starcluster/aws, ~/.starcluster/keys, ~/.starcluster/vols
#############################################
## AWS Credentials and Connection Settings ##
#############################################
[aws info]
# This is the AWS credentials section (required).
# These settings apply to all clusters
# replace these with your AWS keys
AWS_ACCESS_KEY_ID = MY ACCESS KEY
AWS_SECRET_ACCESS_KEY = MY SECRET ACCESS KEY 
# replace this with your account number
AWS_USER_ID= MY USERID
# Uncomment to specify a different Amazon AWS region  (OPTIONAL)
# (defaults to us-east-1 if not specified)
# NOTE: AMIs have to be migrated!
AWS_REGION_NAME = eu-west-1
AWS_REGION_HOST = ec2.eu-west-1.amazonaws.com
#AWS_REGION_NAME = us-west-2
#AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
# Uncomment these settings when creating an instance-store (S3) AMI (OPTIONAL)
#EC2_CERT = /path/to/your/cert-asdf0as9df092039asdfi02089.pem
#EC2_PRIVATE_KEY = /path/to/your/pk-asdfasd890f200909.pem
# Uncomment these settings to use a proxy host when connecting to AWS
#AWS_PROXY = your.proxyhost.com
#AWS_PROXY_PORT = 8080
#AWS_PROXY_USER = yourproxyuser
#AWS_PROXY_PASS = yourproxypass
###########################
## Defining EC2 Keypairs ##
###########################
# Sections starting with "key" define your keypairs. See "starcluster createkey
# --help" for instructions on how to create a new keypair. Section name should
# match your key name e.g.:
[key eu-test-key]
KEY_LOCATION=/home/remotehomes/callum/.ssh/eu-test-key.rsa
# You can of course have multiple keypair sections
# [key myotherkey]
# KEY_LOCATION=~/.ssh/myotherkey.rsa
################################
## Defining Cluster Templates ##
################################
# Sections starting with "cluster" represent a cluster template. These
# "templates" are a collection of settings that define a single cluster
# configuration and are used when creating and configuring a cluster. You can
# change which template to use when creating your cluster using the -c option
# to the start command:
#
#     $ starcluster start -c mediumcluster mycluster
#
# If a template is not specified then the template defined by DEFAULT_TEMPLATE
# in the [global] section above is used. Below is the "default" template named
# "smallcluster". You can rename it but dont forget to update the
# DEFAULT_TEMPLATE setting in the [global] section above. See the next section
# on defining multiple templates.
[cluster testcluster]
# change this to the name of one of the keypair sections defined above
KEYNAME = eu-test-key
# number of ec2 instances to launch
CLUSTER_SIZE = 1
# create the following user on the cluster
CLUSTER_USER = sgeadmin
# optionally specify shell (defaults to bash)
# (options: tcsh, zsh, csh, bash, ksh)
CLUSTER_SHELL = bash
# AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
# Use the 'listpublic' command to list StarCluster AMIs in other regions
# The base i386 StarCluster AMI is ami-899d49e0
# The base x86_64 StarCluster AMI is ami-999d49f0
# The base HVM StarCluster AMI is ami-4583572c
NODE_IMAGE_ID = ami-ab447bdf
# instance type for all cluster nodes
# (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro, cc1.4xlarge, m1.medium, cc2.8xlarge, m1.large, m1.xlarge, m2.4xlarge, m2.2xlarge)
NODE_INSTANCE_TYPE = t1.micro
# Uncomment to disable installing/configuring a queueing system on the
# cluster (SGE)
#DISABLE_QUEUE=True
# Uncomment to specify a different instance type for the master node (OPTIONAL)
# (defaults to NODE_INSTANCE_TYPE if not specified)
#MASTER_INSTANCE_TYPE = m1.small
# Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
# (defaults to NODE_IMAGE_ID if not specified)
#MASTER_IMAGE_ID = ami-899d49e0 (OPTIONAL)
# availability zone to launch the cluster in (OPTIONAL)
# (automatically determined based on volumes (if any) or
# selected by Amazon if not specified)
#AVAILABILITY_ZONE = us-east-1c
# list of volumes to attach to the master node (OPTIONAL)
# these volumes, if any, will be NFS shared to the worker nodes
# see "Configuring EBS Volumes" below on how to define volume sections
#VOLUMES = oceandata, biodata
# list of plugins to load after StarCluster's default setup routines (OPTIONAL)
# see "Configuring StarCluster Plugins" below on how to define plugin sections
#PLUGINS = myplugin, myplugin2
# list of permissions (or firewall rules) to apply to the cluster's security
# group (OPTIONAL).
#PERMISSIONS = ssh, http
# Uncomment to always create a spot cluster when creating a new cluster from
# this template. The following example will place a $0.50 bid for each spot
# request.
#SPOT_BID = 0.50

As before, the start command halts at the following stage:

$ starcluster start eutestcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Using default cluster template: testcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Creating security group @sc-eutestcluster...
Reservation:r-c1c9448b
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for instances to activate...-

And $ starcluster listclusters gives the following output:

StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
-------------------------------------------------
eutestcluster (security group: @sc-eutestcluster)
-------------------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A

And $ starcluster listinstances gives the following output:

$ starcluster listinstances
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
id: i-d606729b
dns_name: N/A
private_dns_name: ip-172-31-38-175.eu-west-1.compute.internal
state: stopped (User initiated (2013-06-25 16:40:39 GMT))
public_ip: N/A
private_ip: 172.31.38.175
zone: eu-west-1c
ami: ami-3d160149
virtualization: paravirtual
type: m1.medium
groups: 
keypair: solr-eu-instance-01
uptime: N/A
tags: name=solr-eu-instance-01
id: i-686d1b25
dns_name: ec2-54-229-59-144.eu-west-1.compute.amazonaws.com
private_dns_name: ip-172-31-17-101.eu-west-1.compute.internal
state: running
public_ip: 54.229.59.144
private_ip: 172.31.17.101
zone: eu-west-1b
ami: ami-3d160149
virtualization: paravirtual
type: t1.micro
groups: 
keypair: quorate-web-demo
uptime: 17 days, 07:10:12
tags: N/A
id: i-885102c5
dns_name: ec2-54-229-19-113.eu-west-1.compute.amazonaws.com
private_dns_name: ip-172-31-8-11.eu-west-1.compute.internal
state: running
public_ip: 54.229.19.113
private_ip: 172.31.8.11
zone: eu-west-1a
ami: ami-3d160149
virtualization: paravirtual
type: t1.micro
groups: 
keypair: starclustertest
uptime: 5 days, 06:24:26
tags: Name=StarclusterTest
id: i-a43313e9
dns_name: ec2-54-229-70-28.eu-west-1.compute.amazonaws.com
private_dns_name: ip-172-31-46-21.eu-west-1.compute.internal
state: running
public_ip: 54.229.70.28
private_ip: 172.31.46.21
zone: eu-west-1c
ami: ami-ab447bdf
virtualization: paravirtual
type: t1.micro
groups: 
keypair: eu-test-key
uptime: 0 days, 00:19:26
tags: N/A
Total: 4

In the EC2 Management Console under Security Groups I can see the following: [screenshot: selection_056]

Finally, in terms of the build, $ starcluster --version shows the following:

starcluster --version
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
0.9999

And I can confirm I built the code from the main branch of the repository, checking out the code with git clone https://github.com/jtriley/StarCluster.git and running $ sudo python setup.py install. I also get the same behaviour if I install StarCluster via sudo easy_install StarCluster.

jtriley commented 11 years ago

@cl-s Sorry, I meant check the EC2 Management Console to see what security group the instance is assigned to. You need to look at the list of instances to get this info, not the list of security groups.

I can see that StarCluster is able to create the security group for you as expected but for some reason instances are not being assigned to that security group so I'm curious which group the instance is actually being assigned to.
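
For reference, a minimal boto 2.x sketch for making that check outside of StarCluster (purely illustrative; the region name is an example and the credentials are assumed to be available to boto) would print each reservation's instances together with the security groups they were launched into:

# Minimal sketch, assuming boto 2.x and AWS credentials configured for boto.
# The region name below is an example; adjust as needed.
import boto.ec2

conn = boto.ec2.connect_to_region('eu-west-1')

# Each reservation records the security groups its instances were launched into.
for reservation in conn.get_all_instances():
    group_names = [g.name for g in reservation.groups]
    for instance in reservation.instances:
        print("%s  state=%s  groups=%s" % (instance.id, instance.state, group_names))

If the groups list comes back empty here even though the console shows a group, that matches the empty "groups:" field in the listinstances output above.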

jtriley commented 11 years ago

By the way, I'm on #starcluster on freenode IRC if you'd like to up the bandwidth.

cl-s commented 11 years ago

I have just rerun the above configuration, but now with the name eutestcluster1 rather than eutestcluster as before. Again I get all the same behaviour, but I can now show you the instance description from the EC2 Management Console:

[screenshot: selection_057]

I should also be on your IRC channel now as picoutputcls if you want me to confirm or test any more details.

cl-s commented 11 years ago

An interesting side note (which may or may not help track down the issue here):

Running the following command for me results in the same behaviour as described above:

$ starcluster createvolume --name=rshared 50 eu-west-1a --image-id=ami-6c3a2f18
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> No keypair specified, picking one from config...
>>> Using keypair: access-key-pair
>>> Creating security group @sc-volumecreator...
>>> No instance in group @sc-volumecreator for zone eu-west-1a, launching one now.
Reservation:r-4d59d007
>>> Waiting for volume host to come up... (updating every 30s)
>>> Waiting for instances to activate...\

I wonder if it's something to do with the fact that I am a UK-based user of AWS, or perhaps with the way my account is configured? Either way, is there any extra information in that regard that you think might help us track down the problem?

By contrast, carrying out the following command does work as it is supposed to when I reconfigure the config file to use the us-east AWS resources instead of those in the EU:

starcluster createvolume --name=rshared 10 us-east-1c
syoyo commented 11 years ago

I've faced a similar problem in the Tokyo region (ap-northeast-1), and the problem is the filter used by boto when listing running instances.

Here's my fix.

https://github.com/syoyo/StarCluster/commit/4117c7d4ef57a640077ca68d0defa2fc1e7cc5df

Hope this helps.
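
In essence, the commit swaps the boto filter key used when looking up a cluster's running instances. A purely illustrative boto 2.x sketch of the before/after (not StarCluster's exact code; the security group name is hypothetical):

# Illustration only, assuming boto 2.x; '@sc-mycluster' is a made-up group name.
import boto.ec2

conn = boto.ec2.connect_to_region('ap-northeast-1')

# Before: the 'group-name' filter on DescribeInstances matches EC2-Classic
# instances only, so nodes launched into a (default) VPC are never returned
# and the "Waiting for instances to activate..." loop spins forever.
classic_only = conn.get_all_instances(filters={'group-name': '@sc-mycluster'})

# After: 'instance.group-name' also matches instances running inside a VPC.
vpc_aware = conn.get_all_instances(filters={'instance.group-name': '@sc-mycluster'})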

izharw commented 11 years ago

Thanks! It works for me (us-west-2).

Izhar


cl-s commented 11 years ago

Thanks @syoyo, making those changes seems to solve the problem for me too.

syoyo commented 11 years ago

Cool.

Would it be better if I submitted this patch as a pull request?

jtriley commented 11 years ago

@syoyo @cl-s @izharw Glad that fixed it for you, but it still makes no sense to me and I'm not merging that PR just yet. That filter is for VPC groups only, which StarCluster does not yet support (although there is an active PR for VPC support that I plan to merge).

Looking at the instance metadata in @cl-s's AWS Console, I can see the instance has a VPC id AND it belongs to security group @sc-eutestcluster1, which is a security group name that is not compatible with VPC.

Is there some sort of default VPC policy for your account? I'm completely baffled how these instances are being launched into a vpc group using the official development branch of StarCluster...

jtriley commented 11 years ago

BTW, this 'fix' would almost certainly break things in the same way for everyone else, which is why I can't merge it as is.

jtriley commented 11 years ago

I'm thinking this is related to your account having a 'default' VPC setup:

http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/default-vpc.html

syoyo commented 11 years ago

Yes, it depends on how StarCluster (or boto?) handles VPC.

FYI, VPC is now the default in EC2 for newer users, or for the first instance launched in non-US(?) regions.

jtriley commented 11 years ago

@syoyo OK good to know. I'm looking into a better fix for this now.

jtriley commented 11 years ago

@syoyo Sorry, do you mean that any new EC2 user is now automatically signed up for VPC and a default VPC is automatically configured for them? I'm wondering if this is still VPC-specific or if it is now an issue for all new EC2 users.

cl-s commented 11 years ago

I am guessing this is what @syoyo is referring to: http://aws.typepad.com/aws/2013/03/amazon-ec2-update-virtual-private-clouds-for-everyone.html

Interestingly for me it seems to cause issues with the default configuration of StarCluster in all regions except us-east-1.

My account is only around a month or so old so I am guessing that is probably why I've encountered these issues with the default VPC configuration.

jtriley commented 11 years ago

@cl-s @syoyo Yep, that makes sense. I'm working on a fix now. I need to find a way to query whether the user has a default VPC before constructing the filter, because this default VPC situation is clearly not the same for all users (new and existing), and it seems the group-name and instance.group-name filters are mutually exclusive.
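
One way such a query could look with boto 2.x (a sketch under those assumptions only, not the fix that was eventually committed; the region and variable names are illustrative):

# Sketch: check whether the account has a default VPC in a given region.
# 'isDefault' is the DescribeVpcs filter for the account's default VPC.
import boto.vpc

def has_default_vpc(region_name):
    conn = boto.vpc.connect_to_region(region_name)
    return len(conn.get_all_vpcs(filters={'isDefault': 'true'})) > 0

# Hypothetical use: pick the instance filter key based on the answer.
if has_default_vpc('eu-west-1'):
    filter_key = 'instance.group-name'  # VPC-aware filter
else:
    filter_key = 'group-name'           # EC2-Classic-only filter

As the later comments show, this extra query turned out to be unnecessary, since instance.group-name works in both cases.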

jtriley commented 11 years ago

Please test this and reopen if the patch doesn't fix the issue.

syoyo commented 11 years ago

Thanks for the fix.

Unfortunately I still get an infinite loop during the start operation in the Tokyo region.

The problem is that cluster_group.vpc_id returns None during the start operation. After Ctrl-C, a subsequent restart or terminate correctly reports the vpc_id:

$ starcluster start mycluster
# Infinite loop since cluster_group.vpc_id = None
^C
$ starcluster restart mycluster
# cluster_group.vpc_id = vpc-XXXXXXXX so this operation works well.
jtriley commented 11 years ago

OK, this seems like an eventual consistency issue. I have another approach I want to try that should fix this issue along with #270.
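
For context, the symptom above is that, immediately after the security group is created, a describe call does not yet report its vpc_id. A purely illustrative workaround sketch (boto 2.x assumed; not the approach that was committed) would be to poll until the attribute becomes visible:

# Sketch: poll until the newly created security group reports a vpc_id,
# giving up after max_tries. The group name is hypothetical.
import time
import boto.ec2

def wait_for_vpc_id(conn, group_name, interval=5, max_tries=12):
    for _ in range(max_tries):
        groups = conn.get_all_security_groups(filters={'group-name': group_name})
        if groups and groups[0].vpc_id:
            return groups[0].vpc_id
        time.sleep(interval)
    return None  # still not visible after max_tries

conn = boto.ec2.connect_to_region('ap-northeast-1')
print(wait_for_vpc_id(conn, '@sc-mycluster'))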

jtriley commented 11 years ago

@syoyo I decided to step back and double-check that the instance.group-name filter indeed does not work for EC2-Classic before continuing to try to fix this in other ways. It turns out the instance.group-name filter is not VPC-only, so your fix worked everywhere all along! My apologies for not double-checking this earlier. I'm committing a patch now that simply updates the group-name filter to instance.group-name everywhere, which should also fix #270.

syoyo commented 11 years ago

@jtriley Thank you! It finally fixes #265, but #270 still fails...