Closed by cl-s 11 years ago
Please clarify what makes it 'clear' that they're not configured properly.
Also, I just successfully started a 12.04 cluster in eu-west-1 with the following command:
$ starcluster -r eu-west-1 start -s 1 -i t1.micro -n ami-6c3a2f18 -k myeukey eutest
I'll try to permanently set the region in the config and see if I can reproduce your issue.
I just updated my config to use the eu-west-1 region by default and I was able to start a 1-node t1.micro cluster using 12.04 AMI ami-6c3a2f18 with a simple start command:
$ starcluster start eutest2
At this point I'm interested to hear more details about your findings in the AWS console...
When attempting to create a new cluster using the configuration file I pasted in my first post starcluster start halts at the line:
>>> Waiting for instances to activate...\
At this stage I can see in the EC2 Management Console that the cluster instances have been created (screenshot omitted), but when one logs into the machines it is clear that they are not configured properly, as commands like qstat do not work.
So far I have been running all nodes with node_instance_type = m1.small rather than the micro instances you have been using. I have also tried switching to t1.micro instances, but the problem persists, and it also persists when the cluster has only one node.
@cl-s What does listclusters say?
$ starcluster listclusters
I'm having the same problem with us-west-{1,2}. See my post to the mailing list http://star.mit.edu/cluster/mlarchives/1754.html
When the start command stalls, $ starcluster listclusters shows the following:
-------------------------------------
eutest (security group: @sc-eutest)
-------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A
@izharw I can confirm that I also seem to have the same problem when attempting to start clusters in the us-west-1 and us-west-2 regions.
I have tested in the us-west-1 region with the following AMIs:
32bit Images:
-------------
ami-06674b43 us-west-1 starcluster-base-ubuntu-12.04-x86 (EBS)
64bit Images:
--------------
ami-02674b47 us-west-1 starcluster-base-ubuntu-12.04-x86_64 (EBS)
and in us-west-2 with the following AMIs:
32bit Images:
-------------
ami-0a6afe3a us-west-2 starcluster-base-ubuntu-12.04-x86 (EBS)
64bit Images:
--------------
ami-706afe40 us-west-2 starcluster-base-ubuntu-12.04-x86_64 (EBS)
Again the instances show up as created in the Amazon EC2 Management Console yet the process stalls at >>> Waiting for instances to activate...
and $ starcluster listclusters shows the following:
-------------------------------------------
uswesttest (security group: @sc-uswesttest)
-------------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A
This is really strange. What does listinstances report?
$ starcluster listinstances
Having started a one-node t1.micro cluster in zone us-west-2a with AMI ami-0a6afe3a, I get the following output from running $ starcluster listinstances:
$ starcluster listinstances
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
id: i-00313535
dns_name: ec2-54-218-89-230.us-west-2.compute.amazonaws.com
private_dns_name: ip-172-31-23-222.us-west-2.compute.internal
state: running
public_ip: 54.218.89.230
private_ip: 172.31.23.222
zone: us-west-2a
ami: ami-0a6afe3a
virtualization: paravirtual
type: t1.micro
groups:
keypair: us2-test-key
uptime: 0 days, 00:01:02
tags: N/A
Total: 1
Running $ starcluster listclusters, I still get:
---------------------------------
test1 (security group: @sc-test1)
---------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A
And $ starcluster start test1 remains stuck at the following:
$ starcluster start test1
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Using default cluster template: testcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Creating security group @sc-test1...
Reservation:r-1c8d3929
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for instances to activate...-
I'm launching the cluster with:
izharw@izharw-VirtualBox:~/Desktop/EC2$ starcluster -d -r us-west-2 start -s1 -i t1.micro test
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
2013-06-28 09:52:22,424 config.py:548 - DEBUG - Loading config
2013-06-28 09:52:22,424 config.py:119 - DEBUG - Loading file: /home/izharw/.starcluster/config
2013-06-28 09:52:22,428 config.py:303 - DEBUG - enable_experimental setting not specified. Defaulting to False
2013-06-28 09:52:22,428 config.py:303 - DEBUG - include setting not specified. Defaulting to []
2013-06-28 09:52:22,429 config.py:303 - DEBUG - web_browser setting not specified. Defaulting to None
2013-06-28 09:52:22,429 config.py:303 - DEBUG - refresh_interval setting not specified. Defaulting to 30
2013-06-28 09:52:22,429 config.py:303 - DEBUG - enable_experimental setting not specified. Defaulting to False
2013-06-28 09:52:22,429 config.py:303 - DEBUG - include setting not specified. Defaulting to []
2013-06-28 09:52:22,431 config.py:303 - DEBUG - web_browser setting not specified. Defaulting to None
2013-06-28 09:52:22,435 config.py:303 - DEBUG - refresh_interval setting not specified. Defaulting to 30
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_proxy_pass setting not specified. Defaulting to None
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_ec2_path setting not specified. Defaulting to /
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_s3_path setting not specified. Defaulting to /
2013-06-28 09:52:22,436 config.py:303 - DEBUG - aws_proxy_user setting not specified. Defaulting to None
2013-06-28 09:52:22,437 config.py:303 - DEBUG - aws_is_secure setting not specified. Defaulting to True
2013-06-28 09:52:22,437 config.py:303 - DEBUG - aws_s3_host setting not specified. Defaulting to None
2013-06-28 09:52:22,437 config.py:303 - DEBUG - aws_port setting not specified. Defaulting to None
2013-06-28 09:52:22,437 config.py:303 - DEBUG - ec2_private_key setting not specified. Defaulting to None
2013-06-28 09:52:22,439 config.py:303 - DEBUG - ec2_cert setting not specified. Defaulting to None
2013-06-28 09:52:22,439 config.py:303 - DEBUG - aws_proxy setting not specified. Defaulting to None
2013-06-28 09:52:22,440 config.py:303 - DEBUG - aws_proxy_port setting not specified. Defaulting to None
2013-06-28 09:52:22,440 config.py:303 - DEBUG - ip_protocol setting not specified. Defaulting to tcp
2013-06-28 09:52:22,440 config.py:303 - DEBUG - cidr_ip setting not specified. Defaulting to 0.0.0.0/0
2013-06-28 09:52:22,441 config.py:303 - DEBUG - disable_queue setting not specified. Defaulting to False
2013-06-28 09:52:22,441 config.py:303 - DEBUG - volumes setting not specified. Defaulting to []
2013-06-28 09:52:22,441 config.py:303 - DEBUG - availability_zone setting not specified. Defaulting to None
2013-06-28 09:52:22,441 config.py:303 - DEBUG - spot_bid setting not specified. Defaulting to None
2013-06-28 09:52:22,443 config.py:303 - DEBUG - disable_cloudinit setting not specified. Defaulting to False
2013-06-28 09:52:22,444 config.py:303 - DEBUG - force_spot_master setting not specified. Defaulting to False
2013-06-28 09:52:22,444 config.py:303 - DEBUG - extends setting not specified. Defaulting to None
2013-06-28 09:52:22,444 config.py:303 - DEBUG - master_image_id setting not specified. Defaulting to None
2013-06-28 09:52:22,444 config.py:303 - DEBUG - userdata_scripts setting not specified. Defaulting to []
2013-06-28 09:52:22,445 config.py:303 - DEBUG - plugins setting not specified. Defaulting to []
2013-06-28 09:52:22,452 awsutils.py:55 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-west-2, 'port': None}
2013-06-28 09:52:22,889 awsutils.py:55 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-west-2, 'port': None}
>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
2013-06-28 09:52:24,092 cluster.py:780 - DEBUG - Userdata size in KB: 0.45
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
2013-06-28 09:52:24,095 cluster.py:1013 - DEBUG - Launching master (ami: ami-706afe40, type: t1.micro)
>>> Creating security group @sc-test...
2013-06-28 09:52:26,308 cluster.py:780 - DEBUG - Userdata size in KB: 0.45
2013-06-28 09:52:26,449 awsutils.py:350 - DEBUG - Removing ephemeral drive /dev/sdb1 from runtime block device mapping (already mapped by AMI: ami-706afe40)
Reservation:r-22883c17
>>> Waiting for cluster to come up... (updating every 30s)
2013-06-28 09:52:27,300 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:52:27,304 cluster.py:706 - DEBUG - returning self._nodes = []
>>> Waiting for instances to activate...|2013-06-28 09:52:57,497 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:52:57,497 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:53:27,677 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:53:27,680 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:53:57,859 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:53:57,863 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:54:28,005 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:54:28,016 cluster.py:706 - DEBUG - returning self._nodes = []
|2013-06-28 09:54:58,172 cluster.py:690 - DEBUG - existing nodes: {}
2013-06-28 09:54:58,173 cluster.py:706 - DEBUG - returning self._nodes = []
Then, listinstances actually shows me the instance...
izharw@izharw-VirtualBox:~/Desktop/EC2$ starcluster listinstances
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
id: i-b83c388d
dns_name: ec2-54-218-96-170.us-west-2.compute.amazonaws.com
private_dns_name: ip-172-31-44-191.us-west-2.compute.internal
state: running
public_ip: 54.218.96.170
private_ip: 172.31.44.191
zone: us-west-2b
ami: ami-706afe40
virtualization: paravirtual
type: t1.micro
groups:
keypair: chematria_oregon
uptime: 0 days, 00:03:35
tags: N/A
Total: 1
OK, listinstances is showing that the instance doesn't belong to any security group, which is strange/should be impossible. Take a look at the AWS EC2 console and see if it lists the instance's security group. Also, just so we're on the same page, are you using the latest version of the develop branch in the main repo, or are you using someone else's fork of StarCluster?
OK, I have tested again with the EU servers so I can show you what's listed in the AWS EC2 Console for a given test run.
In this example I have run: $ starcluster start eutestcluster
with the following config file:
####################################
## StarCluster Configuration File ##
####################################
[global]
# Configure the default cluster template to use when starting a cluster
# defaults to 'smallcluster' defined below. This template should be usable
# out-of-the-box provided you've configured your keypair correctly
DEFAULT_TEMPLATE=testcluster
# enable experimental features for this release
#ENABLE_EXPERIMENTAL=True
# number of seconds to wait when polling instances (default: 30s)
#REFRESH_INTERVAL=15
# specify a web browser to launch when viewing spot history plots
#WEB_BROWSER=chromium
# split the config into multiple files
#INCLUDE=~/.starcluster/aws, ~/.starcluster/keys, ~/.starcluster/vols
#############################################
## AWS Credentials and Connection Settings ##
#############################################
[aws info]
# This is the AWS credentials section (required).
# These settings apply to all clusters
# replace these with your AWS keys
AWS_ACCESS_KEY_ID = MY ACCESS KEY
AWS_SECRET_ACCESS_KEY = MY SECRET ACCESS KEY
# replace this with your account number
AWS_USER_ID= MY USERID
# Uncomment to specify a different Amazon AWS region (OPTIONAL)
# (defaults to us-east-1 if not specified)
# NOTE: AMIs have to be migrated!
AWS_REGION_NAME = eu-west-1
AWS_REGION_HOST = ec2.eu-west-1.amazonaws.com
#AWS_REGION_NAME = us-west-2
#AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
# Uncomment these settings when creating an instance-store (S3) AMI (OPTIONAL)
#EC2_CERT = /path/to/your/cert-asdf0as9df092039asdfi02089.pem
#EC2_PRIVATE_KEY = /path/to/your/pk-asdfasd890f200909.pem
# Uncomment these settings to use a proxy host when connecting to AWS
#AWS_PROXY = your.proxyhost.com
#AWS_PROXY_PORT = 8080
#AWS_PROXY_USER = yourproxyuser
#AWS_PROXY_PASS = yourproxypass
###########################
## Defining EC2 Keypairs ##
###########################
# Sections starting with "key" define your keypairs. See "starcluster createkey
# --help" for instructions on how to create a new keypair. Section name should
# match your key name e.g.:
[key eu-test-key]
KEY_LOCATION=/home/remotehomes/callum/.ssh/eu-test-key.rsa
# You can of course have multiple keypair sections
# [key myotherkey]
# KEY_LOCATION=~/.ssh/myotherkey.rsa
################################
## Defining Cluster Templates ##
################################
# Sections starting with "cluster" represent a cluster template. These
# "templates" are a collection of settings that define a single cluster
# configuration and are used when creating and configuring a cluster. You can
# change which template to use when creating your cluster using the -c option
# to the start command:
#
# $ starcluster start -c mediumcluster mycluster
#
# If a template is not specified then the template defined by DEFAULT_TEMPLATE
# in the [global] section above is used. Below is the "default" template named
# "smallcluster". You can rename it but dont forget to update the
# DEFAULT_TEMPLATE setting in the [global] section above. See the next section
# on defining multiple templates.
[cluster testcluster]
# change this to the name of one of the keypair sections defined above
KEYNAME = eu-test-key
# number of ec2 instances to launch
CLUSTER_SIZE = 1
# create the following user on the cluster
CLUSTER_USER = sgeadmin
# optionally specify shell (defaults to bash)
# (options: tcsh, zsh, csh, bash, ksh)
CLUSTER_SHELL = bash
# AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
# Use the 'listpublic' command to list StarCluster AMIs in other regions
# The base i386 StarCluster AMI is ami-899d49e0
# The base x86_64 StarCluster AMI is ami-999d49f0
# The base HVM StarCluster AMI is ami-4583572c
NODE_IMAGE_ID = ami-ab447bdf
# instance type for all cluster nodes
# (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro, cc1.4xlarge, m1.medium, cc2.8xlarge, m1.large, m1.xlarge, m2.4xlarge, m2.2xlarge)
NODE_INSTANCE_TYPE = t1.micro
# Uncomment to disable installing/configuring a queueing system on the
# cluster (SGE)
#DISABLE_QUEUE=True
# Uncomment to specify a different instance type for the master node (OPTIONAL)
# (defaults to NODE_INSTANCE_TYPE if not specified)
#MASTER_INSTANCE_TYPE = m1.small
# Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
# (defaults to NODE_IMAGE_ID if not specified)
#MASTER_IMAGE_ID = ami-899d49e0 (OPTIONAL)
# availability zone to launch the cluster in (OPTIONAL)
# (automatically determined based on volumes (if any) or
# selected by Amazon if not specified)
#AVAILABILITY_ZONE = us-east-1c
# list of volumes to attach to the master node (OPTIONAL)
# these volumes, if any, will be NFS shared to the worker nodes
# see "Configuring EBS Volumes" below on how to define volume sections
#VOLUMES = oceandata, biodata
# list of plugins to load after StarCluster's default setup routines (OPTIONAL)
# see "Configuring StarCluster Plugins" below on how to define plugin sections
#PLUGINS = myplugin, myplugin2
# list of permissions (or firewall rules) to apply to the cluster's security
# group (OPTIONAL).
#PERMISSIONS = ssh, http
# Uncomment to always create a spot cluster when creating a new cluster from
# this template. The following example will place a $0.50 bid for each spot
# request.
#SPOT_BID = 0.50
As before, the start command halts at the following stage:
$ starcluster start eutestcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Using default cluster template: testcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Creating security group @sc-eutestcluster...
Reservation:r-c1c9448b
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for instances to activate...-
And $ starcluster listclusters gives the following output:
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
-------------------------------------------------
eutestcluster (security group: @sc-eutestcluster)
-------------------------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A
And $ starcluster listinstances gives the following output:
$ starcluster listinstances
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
id: i-d606729b
dns_name: N/A
private_dns_name: ip-172-31-38-175.eu-west-1.compute.internal
state: stopped (User initiated (2013-06-25 16:40:39 GMT))
public_ip: N/A
private_ip: 172.31.38.175
zone: eu-west-1c
ami: ami-3d160149
virtualization: paravirtual
type: m1.medium
groups:
keypair: solr-eu-instance-01
uptime: N/A
tags: name=solr-eu-instance-01
id: i-686d1b25
dns_name: ec2-54-229-59-144.eu-west-1.compute.amazonaws.com
private_dns_name: ip-172-31-17-101.eu-west-1.compute.internal
state: running
public_ip: 54.229.59.144
private_ip: 172.31.17.101
zone: eu-west-1b
ami: ami-3d160149
virtualization: paravirtual
type: t1.micro
groups:
keypair: quorate-web-demo
uptime: 17 days, 07:10:12
tags: N/A
id: i-885102c5
dns_name: ec2-54-229-19-113.eu-west-1.compute.amazonaws.com
private_dns_name: ip-172-31-8-11.eu-west-1.compute.internal
state: running
public_ip: 54.229.19.113
private_ip: 172.31.8.11
zone: eu-west-1a
ami: ami-3d160149
virtualization: paravirtual
type: t1.micro
groups:
keypair: starclustertest
uptime: 5 days, 06:24:26
tags: Name=StarclusterTest
id: i-a43313e9
dns_name: ec2-54-229-70-28.eu-west-1.compute.amazonaws.com
private_dns_name: ip-172-31-46-21.eu-west-1.compute.internal
state: running
public_ip: 54.229.70.28
private_ip: 172.31.46.21
zone: eu-west-1c
ami: ami-ab447bdf
virtualization: paravirtual
type: t1.micro
groups:
keypair: eu-test-key
uptime: 0 days, 00:19:26
tags: N/A
Total: 4
In the EC2 Management Console, under Security Groups, I can see the following: [screenshot omitted]
Finally, in terms of build, $ starcluster --version shows the following:
starcluster --version
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
0.9999
And I can confirm I built the code from the main branch of the repository, checking out the code as follows:
git clone https://github.com/jtriley/StarCluster.git
and running $ sudo python setup.py install. I also get the same behaviour if I install StarCluster via sudo easy_install StarCluster.
@cl-s Sorry I meant check the EC2 management console to see what security group the instance is assigned to. You need to look at the list of instances to get this info not the list of security groups.
I can see that StarCluster is able to create the security group for you as expected but for some reason instances are not being assigned to that security group so I'm curious which group the instance is actually being assigned to.
By the way, I'm on #starcluster on freenode IRC if you'd like to up the bandwidth.
I have just rerun the above configuration, but now with the name eutestcluster1 rather than eutestcluster as before. Again I get all the same behaviour, but I can now show you the instance description from the EC2 Management Console: [screenshot omitted]
I should also be on your IRC channel now as picoutputcls if you want me to confirm/test any more details.
An interesting side note (which may or may not help track down the issue here):
Running the following command for me results in the same behaviour as described above:
$ starcluster createvolume --name=rshared 50 eu-west-1a --image-id=ami-6c3a2f18
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> No keypair specified, picking one from config...
>>> Using keypair: access-key-pair
>>> Creating security group @sc-volumecreator...
>>> No instance in group @sc-volumecreator for zone eu-west-1a, launching one now.
Reservation:r-4d59d007
>>> Waiting for volume host to come up... (updating every 30s)
>>> Waiting for instances to activate...\
I wonder if it might be something to do with the fact that I am a UK-based user of AWS, or perhaps with the way my account is configured. Either way, is there any extra information in that regard that you think might help us track down the problem?
By contrast, the following command does work as it is supposed to when I reconfigure the config file to use the us-east AWS resources instead of those in the EU:
starcluster createvolume --name=rshared 10 us-east-1c
I've faced a similar problem in the Tokyo region (ap-northeast-1), and the cause is the filter used by boto when listing running instances.
Here's my fix.
https://github.com/syoyo/StarCluster/commit/4117c7d4ef57a640077ca68d0defa2fc1e7cc5df
Hope this helps.
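For readers following along, the change in that commit amounts to switching the EC2 filter key used when listing a cluster's instances. A minimal sketch of the idea (the function name and surrounding structure here are illustrative, not the actual StarCluster code; only the filter keys come from the discussion):

```python
def node_filters(cluster_tag, states=('pending', 'running')):
    """Build an EC2 DescribeInstances filter dict for a cluster's nodes."""
    return {
        # 'group-name' matches only EC2-Classic instances; the
        # 'instance.group-name' filter also matches instances launched
        # into a (default) VPC, which is what breaks the regions above.
        'instance.group-name': '@sc-%s' % cluster_tag,
        'instance-state-name': list(states),
    }

# With boto this would be used roughly as:
#   reservations = conn.get_all_instances(filters=node_filters('eutestcluster'))
```

With the old 'group-name' key, instances in a default VPC never match, so the node list stays empty and start loops forever on "Waiting for instances to activate...".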
Thanks! It works for me (us-west-2).
Izhar
Thanks @syoyo, making those changes seems to solve the problem for me too.
Cool. Would it be better to submit this patch as a pull request?
@syoyo @cl-s @izharw Glad that fixed it for you but that still makes no sense and I'm not merging that PR just yet. That filter is for VPC groups only which StarCluster does not yet support (although there is an active PR that I plan to merge for it).
Looking at the instance metadata from @cl-s's AWS Console, I can see the instance has a VPC id AND it belongs to security group @sc-eutestcluster1, a security group name that is not compatible with VPC.
Is there some sort of default VPC policy for your account? I'm completely baffled how these instances are being launched into a vpc group using the official development branch of StarCluster...
BTW, this 'fix' for you would almost certainly break things the same way for everyone else, which is why I can't merge it as is.
I'm thinking this is related to your account having a 'default' VPC setup:
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/default-vpc.html
Yes, it depends on how StarCluster (or boto?) handles VPC.
FYI, VPC is now the default in EC2 for newer users, or for the first instance launched in non-US(?) regions.
@syoyo OK good to know. I'm looking into a better fix for this now.
@syoyo Sorry, do you mean that any new EC2 user is now automatically signed up to VPC, with a default VPC automatically configured for them? I'm wondering if this is still VPC-specific or if this is now an issue for all new EC2 users.
I am guessing this is what @syoyo is referring to: http://aws.typepad.com/aws/2013/03/amazon-ec2-update-virtual-private-clouds-for-everyone.html
Interestingly, for me it seems to cause issues with the default configuration of StarCluster in all regions except us-east-1.
My account is only around a month or so old, so I am guessing that is probably why I've encountered these issues with the default VPC configuration.
@cl-s @syoyo Yep that makes sense. I'm working on a fix now. Need to find a way to query whether the user has a default vpc or not before constructing the filter because clearly this default VPC situation is not the same for all users (new and existing) and it seems the group-name and instance.group-name filters are mutually exclusive.
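The "query whether the user has a default vpc" step could look roughly like the following (a hypothetical sketch, not the actual StarCluster patch; it assumes a connection object like boto's VPCConnection, whose get_all_vpcs() accepts EC2 DescribeVpcs filters such as 'isDefault'):

```python
def has_default_vpc(vpc_conn):
    """Return True if this account/region has a default VPC.

    vpc_conn is assumed to behave like boto's VPCConnection, whose
    get_all_vpcs() accepts EC2 DescribeVpcs filters such as 'isDefault'.
    """
    return len(vpc_conn.get_all_vpcs(filters={'isDefault': 'true'})) > 0


def group_filter_key(vpc_conn):
    # Accounts with a default VPC need the VPC-aware filter name;
    # EC2-Classic accounts keep the old 'group-name' filter.
    return 'instance.group-name' if has_default_vpc(vpc_conn) else 'group-name'
```

The filter key would then be chosen per account before building the DescribeInstances filters, rather than hard-coding one or the other.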
Please test this and reopen if the patch doesn't fix the issue.
Thanks for the fix.
Unfortunately, I still get an infinite loop in the start operation in the Tokyo region.
The problem is that cluster_group.vpc_id returns None during the start operation.
Ctrl-C, then restart or terminate, correctly reports vpc_id:
$ starcluster start mycluster
# Infinite loop since cluster_group.vpc_id = None
^C
$ starcluster restart mycluster
# cluster_group.vpc_id = vpc-XXXXXXXX so this operation works well.
OK seems like an eventual consistency issue. I have another approach I want to try that should fix this issue along with #270.
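One common way to paper over such an eventual-consistency gap is to re-fetch the group until the API reports its vpc_id. This is a hypothetical sketch of that idea, not the approach jtriley ultimately took (the function and parameter names are invented):

```python
import time

def wait_for_vpc_id(refetch_group, retries=10, delay=3.0):
    """Poll a freshly created security group until EC2 reports its vpc_id.

    refetch_group is a callable that re-fetches the group from the API;
    immediately after creation the response may still have vpc_id = None.
    Returns the vpc_id, or None if it never appeared within `retries` polls.
    """
    for _ in range(retries):
        group = refetch_group()
        if group.vpc_id is not None:
            return group.vpc_id
        time.sleep(delay)
    return None
```

This matches syoyo's observation: by the time restart or terminate runs, the API has caught up and vpc_id is populated, so only the first poll after creation sees None.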
@syoyo I decided to step back and double-check that the instance.group-name filter indeed does not work for EC2 Classic before continuing to try to fix this in other ways. It turns out the instance.group-name filter is not VPC-only, so your fix worked everywhere all along! My apologies for not double-checking this earlier. Committing a patch now that simply updates the group-name filter to instance.group-name everywhere, which should also fix #270.
@jtriley Thank you! It finally fixes #265. But #270 still fails...
To summarize the original report: when attempting to launch instances in region eu-west-1 with v0.9999 of StarCluster, starcluster start hangs at the >>> Waiting for instances to activate... step. On checking the EC2 Management Console it is clear that the nodes specified in the config file are started but not properly configured. The issue has been tested with all of the AMIs listed above, using the config file pasted above. When the region is changed back to the default (with the relevant modifications to AMI image IDs, etc.), the same config file allows one to create clusters without issue.