infochimps-labs / ironfan

Chef orchestration layer -- your system diagram come to life. Provision EC2, OpenStack or Vagrant without changes to cookbooks or configuration

Bootstrap/Kick issues with realms #347

Open gwilton opened 10 years ago

gwilton commented 10 years ago

Hi,

Over this past week I have been trying to integrate with Ironfan 6.0.x and have been running into numerous problems along the way. At the moment I can successfully launch instances on EC2, but every other piece of functionality seems to be failing. Here are the problems I am having; maybe someone here can help.

Dependencies

So my Gemfile looks like this. The ironfan_homebase repo seems to be stuck on Ironfan 4, so I did what I could here to integrate with Ironfan 6.

source "http://rubygems.org"

#
# Chef
#
gem 'ironfan',         "= 6.0.6"

gem 'berkshelf',       "= 1.4.2"     # FIXME: pins chef to the 10.16 branch.
gem 'faraday',         "= 0.8.9"     # latest faraday was causing problems with ridley; had to pin it down
gem 'parseconfig'

gem 'spiceweasel'
gem 'chef-rewind'
gem "knife-ec2", "~> 0.6.4"

#
# Test drivers
#

group :test do
  gem 'rake'
  gem 'bundler',       "~> 1"
  gem 'rspec',         "~> 2.5"
  gem 'redcarpet',   "~> 2"
  gem 'cucumber',      "~> 1.1"
  gem 'foodcritic'
end

#
# Development
#

group :development do
  gem 'yard',          "~> 0.6"
  gem 'jeweler'

  gem 'ruby_gntp'

  # FIXME: Commented out until guard-chef stops breaking bundle update
  # gem 'guard',         "~> 1"
  # gem 'guard-process', "~> 1"
  # gem 'guard-chef',    :git => 'git://github.com/infochimps-forks/guard-chef.git'
  # gem 'guard-cucumber'
end

group :support do
  gem 'pry'  # useful in debugging
end

Clusters & Realms Definition

It seems that defining a cluster under ironfan_homebase/clusters is going away and everything is now done in ironfan_homebase/realms. I was able to put something together with the following documentation (https://github.com/infochimps-labs/ironfan/blob/master/NOTES-REALM.md).

I created a realm at ironfan_homebase/realms/q1.rb that looks like this:

Ironfan.realm(:q1) do  

  cluster :control do
    cloud(:ec2) do
      permanent           false
      availability_zones ['us-east-1a']
      flavor              'm1.large'
      backing             'ebs'
      image_name          'ironfan-precise'
      bootstrap_distro    'ubuntu12.04-ironfan'
      chef_client_script  'client.rb'
      mount_ephemerals
    end

    environment           :qa

    role                  :systemwide,    :first
    cloud(:ec2).security_group :systemwide
    role                  :ssh
    cloud(:ec2).security_group(:ssh).authorize_port_range 22..22
    role                  :set_hostname

    recipe                'log_integration::logrotate'

    role                  :volumes
    role                  :package_set,   :last
    role                  :minidash,      :last

    role                  :org_base
    role                  :org_users
    role                  :org_final,     :last

    role                  :tuning,        :last

    facet :worker do
      instances           1
    end

    facet :app do
      instances           1
      cloud(:ec2).flavor        'm1.large'
      recipe              'volumes::build_raid', :first

      # FIXME: This works around https://github.com/infochimps-labs/ironfan/issues/209
      cloud(:ec2).mount_ephemerals(:mountable => false, :in_raid => "md0")
      raid_group(:md0) do
        device            '/dev/md0'
        mount_point       '/raid0'
        level             0
        sub_volumes       [:ephemeral0, :ephemeral1]
      end
    end

    cluster_role.override_attributes({
      })
  end

end

Launching EC2 instance

I am able to launch the instance in EC2 without a problem, but the moment I try to bootstrap the instance I get an ERROR. I tried playing around with many different cluster/realm definitions; the only time I get a different result is when the cluster is named "sandbox" (I know, very strange; see below).

$ knife cluster launch q1-control-worker-0
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, control cluster, worker facet, servers 0
  control:          Loading chef
  control:          Loading ec2
  control:          Reconciling DSL and provider information
  +---------------------+-------+-------------+----------+------------+-----+-------+
  | Name                | Chef? | State       | Flavor   | AZ         | Env | Realm |
  +---------------------+-------+-------------+----------+------------+-----+-------+
  | q1-control-worker-0 | no    | not running | m1.large | us-east-1a | qa  | q1    |
  +---------------------+-------+-------------+----------+------------+-----+-------+
Syncing to chef
Preparing shared resources:
  control:          Loading chef
  control:          Loading ec2
  control:          Reconciling DSL and provider information
Loaded information for 2 computer(s) in cluster control
  q1-control:       creating key pair for q1-control
  control:          creating security groups
  q1-control:         creating q1-control security group
  q1-control-app:     creating q1-control-app security group
  q1-control-worker:      creating q1-control-worker security group
  control:          ensuring security group permissions
  q1-control:         ensuring access from q1-control to q1-control
  ssh:                ensuring tcp access from 0.0.0.0/0 to 22..22

Launching computers
  +---------------------+-------+-------------+----------+------------+-----+-------+
  | Name                | Chef? | State       | Flavor   | AZ         | Env | Realm |
  +---------------------+-------+-------------+----------+------------+-----+-------+
  | q1-control-worker-0 | no    | not running | m1.large | us-east-1a | qa  | q1    |
  +---------------------+-------+-------------+----------+------------+-----+-------+
  q1-control-worker-0:  creating cloud machine
  i-b0bc4891:       waiting for machine to be ready
  i-b0bc4891:       tagging with {"cluster"=>"control", "facet"=>"worker", "index"=>0, "name"=>"q1-control-worker-0", "Name"=>"q1-control-worker-0", "creator"=>"wilton"}
  vol-94fab2e2:     tagging with {"cluster"=>"control", "facet"=>"worker", "index"=>0, "name"=>"q1-control-worker-0-root", "Name"=>"q1-control-worker-0-root", "creator"=>"wilton", "server"=>"q1-control-worker-0", "mount_point"=>"/", "device"=>"/dev/sda1"}
  q1-control-worker-0:  setting termination flag false
  q1-control-worker-0:  syncing EBS volumes
  q1-control-worker-0:  trying ssh
All computers launched correctly
Applying aggregations:
  control:          Loading chef
  control:          Loading ec2
  control:          Reconciling DSL and provider information
Loaded information for 2 computer(s) in cluster control
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+
  | Name                | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP     | Private IP    | Created On |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+
  | q1-control-worker-0 | yes   | running | m1.large | us-east-1a | qa  | q1    | i-b0bc4891 | 50.19.196.221 | 10.225.25.224 | 2014-03-22 |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+
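
Side note, from the launch output above: this is my read of how the realm name gets prepended to everything launch creates (just a sketch of the naming I observe; not taken from the Ironfan source):

# Naming observed in the launch output above (illustration only, not Ironfan source).
realm, cluster, facet, index = 'q1', 'control', 'worker', 0

prefixed_cluster = "#{realm}-#{cluster}"            # "q1-control"          (key pair, cluster security group)
facet_group      = "#{prefixed_cluster}-#{facet}"   # "q1-control-worker"   (facet security group)
node_name        = "#{facet_group}-#{index}"        # "q1-control-worker-0" (Chef node and Name tag)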

Bootstrapping instance

When cluster name is "control"

$ knife cluster bootstrap q1-control-worker-0
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, control cluster, worker facet, servers 0
  control:          Loading chef
  control:          Loading ec2
  control:          Reconciling DSL and provider information
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+-----------+
  | Name                | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP     | Private IP    | Created On | relevant? |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+-----------+
  | q1-control-worker-0 | yes   | running | m1.large | us-east-1a | qa  | q1    | i-b0bc4891 | 50.19.196.221 | 10.225.25.224 | 2014-03-22 | true      |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+-----------+
Preparing shared resources:
  control:          Loading chef
  control:          Loading ec2
  control:          Reconciling DSL and provider information
Loaded information for 2 computer(s) in cluster control
  control:          ensuring security group permissions
  q1-control:         ensuring access from q1-control to q1-control
  ssh:                ensuring tcp access from 0.0.0.0/0 to 22..22

Running bootstrap on q1-control-worker-0...

Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm) Yes

WARNING: Error running #<Ironfan::Broker::Computer(server=#<Ironfan::Dsl::Server(name="0", components=c{  }, run_list_items=c{ role[systemwide], role[ssh], role[nfs_client], role[set_hostname], log_integration::logrotate, role[volumes], role[package_set], role[minidash], role[org_base], role[org_users], role[org_final], role[tuning], role[q1-control-cluster], role[q1-control-worker-facet] }, clouds=c{ ec2 }, volumes=c{  }, security_groups=c{  }, environment=:qa, realm_name="q1", cluster_role=#<Ironfan::Dsl::Role>, facet_role=#<Ironfan::Dsl::Role>, cluster_names={:control=>:control}, cluster_name="control", facet_name="worker")>, resources=c{ client, node, machine, security_group__systemwide, security_group__ssh, security_group__nfs_client }, drives=c{ root, ephemeral0, ephemeral1 }, providers=c{ chef, iaas })>:
WARNING: undefined method `ssh_identity_file' for #<Ironfan::Dsl::Ec2:0x007f9cd4733958>
ERROR: undefined method `ssh_identity_file' for #<Ironfan::Dsl::Ec2:0x007f9cd4733958> (NoMethodError)
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan/broker/computer.rb:209:in `ssh_identity_file'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/chef/knife/ironfan_knife_common.rb:171:in `bootstrapper'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/chef/knife/ironfan_knife_common.rb:181:in `run_bootstrap'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/chef/knife/cluster_bootstrap.rb:62:in `block in perform_execution'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan.rb:114:in `block (3 levels) in parallel'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan.rb:123:in `safely'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan.rb:113:in `block (2 levels) in parallel'
ERROR: /Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan/broker/computer.rb:209:in `ssh_identity_file'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/chef/knife/ironfan_knife_common.rb:171:in `bootstrapper'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/chef/knife/ironfan_knife_common.rb:181:in `run_bootstrap'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/chef/knife/cluster_bootstrap.rb:62:in `block in perform_execution'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan.rb:114:in `block (3 levels) in parallel'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan.rb:123:in `safely'
/Users/wilton/.rvm/gems/ruby-1.9.3-p392@ironchef/gems/ironfan-6.0.6/lib/ironfan.rb:113:in `block (2 levels) in parallel'
Applying aggregations:
  control:          Loading chef
  control:          Loading ec2
  control:          Reconciling DSL and provider information
Loaded information for 2 computer(s) in cluster control

Finished! Current state:
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+
  | Name                | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP     | Private IP    | Created On |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+
  | q1-control-worker-0 | yes   | running | m1.large | us-east-1a | qa  | q1    | i-b0bc4891 | 50.19.196.221 | 10.225.25.224 | 2014-03-22 |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+---------------+---------------+------------+

When cluster name is "sandbox"

$ knife cluster bootstrap q1-sandbox-worker-0
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, sandbox cluster, worker facet, servers 0
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+-----------+
  | Name                | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP   | Private IP   | Created On | relevant? |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+-----------+
  | q1-sandbox-worker-0 | yes   | running | m1.large | us-east-1a | qa  | q1    | i-56b64277 | 54.82.90.20 | 10.96.197.99 | 2014-03-22 | true      |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+-----------+
Preparing shared resources:
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
Loaded information for 2 computer(s) in cluster sandbox
  sandbox:          ensuring security group permissions
  q1-sandbox:         ensuring access from q1-sandbox to q1-sandbox
  ssh:                ensuring tcp access from 0.0.0.0/0 to 22..22

Running bootstrap on q1-sandbox-worker-0...

Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm) Yes

  q1-sandbox-worker-0:  Running bootstrap
Bootstrapping Chef on ec2-54-82-90-20.compute-1.amazonaws.com
Failed to authenticate ubuntu - trying password auth

When cluster name is "sandbox", with -i

When the cluster has the name "sandbox", I get prompted for a password, so it seems that the ssh key is not being set properly. To get past this I use the -i option and provide the generated key under "ironfan_homebase/knife/credentials/ec2_keys/". The system begins to bootstrap but never completes successfully: I get a console prompt window asking me to enter the chef_server_url, which I can't respond to. I do see that chef-client is installed on the instance.

$ knife cluster bootstrap q1-sandbox-worker-0 -i knife/credentials/ec2_keys/q1-sandbox.pem 
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, sandbox cluster, worker facet, servers 0
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+-----------+
  | Name                | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP   | Private IP   | Created On | relevant? |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+-----------+
  | q1-sandbox-worker-0 | yes   | running | m1.large | us-east-1a | qa  | q1    | i-3dae5a1c | 54.82.77.79 | 10.65.144.70 | 2014-03-22 | true      |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+-----------+
Preparing shared resources:
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
Loaded information for 2 computer(s) in cluster sandbox
  sandbox:          ensuring security group permissions
  q1-sandbox:         ensuring access from q1-sandbox to q1-sandbox
  ssh:                ensuring tcp access from 0.0.0.0/0 to 22..22

Running bootstrap on q1-sandbox-worker-0...

Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm) Yes

  q1-sandbox-worker-0:  Running bootstrap
Bootstrapping Chef on ec2-54-82-77-79.compute-1.amazonaws.com
ec2-54-82-77-79.compute-1.amazonaws.com deb http://apt.opscode.com/ precise-0.10 main
ec2-54-82-77-79.compute-1.amazonaws.com gpg: directory `/local/home/ubuntu/.gnupg' created
ec2-54-82-77-79.compute-1.amazonaws.com gpg: new configuration file `/local/home/ubuntu/.gnupg/gpg.conf' created
ec2-54-82-77-79.compute-1.amazonaws.com gpg: 
ec2-54-82-77-79.compute-1.amazonaws.com WARNING: options in `/local/home/ubuntu/.gnupg/gpg.conf' are not yet active during this run
ec2-54-82-77-79.compute-1.amazonaws.com gpg: keyring `/local/home/ubuntu/.gnupg/secring.gpg' created
ec2-54-82-77-79.compute-1.amazonaws.com gpg: keyring `/local/home/ubuntu/.gnupg/pubring.gpg' created
ec2-54-82-77-79.compute-1.amazonaws.com gpg: requesting key 83EF826A from hkp server keys.gnupg.net
ec2-54-82-77-79.compute-1.amazonaws.com gpg: /local/home/ubuntu/.gnupg/trustdb.gpg: trustdb created
ec2-54-82-77-79.compute-1.amazonaws.com gpg: key 83EF826A: public key "Opscode Packages <packages@opscode.com>" imported
ec2-54-82-77-79.compute-1.amazonaws.com gpg: Total number processed: 1
ec2-54-82-77-79.compute-1.amazonaws.com gpg: 
ec2-54-82-77-79.compute-1.amazonaws.com               imported: 1
ec2-54-82-77-79.compute-1.amazonaws.com Sat Mar 22 17:17:26 UTC 2014 
ec2-54-82-77-79.compute-1.amazonaws.com 
ec2-54-82-77-79.compute-1.amazonaws.com **** 
ec2-54-82-77-79.compute-1.amazonaws.com **** apt update:
ec2-54-82-77-79.compute-1.amazonaws.com ****
ec2-54-82-77-79.compute-1.amazonaws.com Preconfiguring packages ...
ec2-54-82-77-79.compute-1.amazonaws.com 

ec2-54-82-77-79.compute-1.amazonaws.com Package configuration                   

      ┌───────────────────────┤ Configuring chef ├───────────────────────┐      
      │  This is the full URI that clients will use to connect to the    │      
      │  server.                                                         │      
      │  .                                                               │      
      │  This will be used in /etc/chef/client.rb as 'chef_server_url'.  │      
      │                                                                  │      
      │ URL of Chef Server (e.g., http://chef.example.com:4000):         │      
      │                                                                  │      
      │ ________________________________________________________________ │      
      │                                                                  │      
      │                              <Ok>                                │      
      │                                                                  │      
      └──────────────────────────────────────────────────────────────────┘      

knife cluster kick

So let's assume the bootstrap above at least installed the chef-client, and let's try a kick. I follow the same method as bootstrapping and provide the -i option. That fails, and I am not sure why it is trying to use my local username 'wilton' for the kick; it should be 'ubuntu'. Ignoring that, I also provide the -x option.

$ knife cluster kick q1-sandbox-worker-0 -i knife/credentials/ec2_keys/q1-sandbox.pem 
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, sandbox cluster, worker facet, servers 0
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+
  | Name                | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP   | Private IP   | Created On |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+
  | q1-sandbox-worker-0 | yes   | running | m1.large | us-east-1a | qa  | q1    | i-3dae5a1c | 54.82.77.79 | 10.65.144.70 | 2014-03-22 |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+
WARNING: Failed to connect to  -- Net::SSH::AuthenticationFailed: Authentication failed for user wilton@ec2-54-82-77-79.compute-1.amazonaws.com@ec2-54-82-77-79.compute-1.amazonaws.com
$ knife cluster kick q1-sandbox-worker-0 -i knife/credentials/ec2_keys/q1-sandbox.pem -x ubuntu
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, sandbox cluster, worker facet, servers 0
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+
  | Name                | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP   | Private IP   | Created On |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+
  | q1-sandbox-worker-0 | yes   | running | m1.large | us-east-1a | qa  | q1    | i-3dae5a1c | 54.82.77.79 | 10.65.144.70 | 2014-03-22 |
  +---------------------+-------+---------+----------+------------+-----+-------+------------+-------------+--------------+------------+
q1-sandbox-worker-0 ****
q1-sandbox-worker-0 
q1-sandbox-worker-0 starting chef-client-nonce service
q1-sandbox-worker-0 
q1-sandbox-worker-0 ****
q1-sandbox-worker-0 
q1-sandbox-worker-0 [2014-03-22T17:31:54+00:00] INFO: *** Chef 10.16.4 ***
q1-sandbox-worker-0 [2014-03-22T17:31:55+00:00] INFO: Run List is [role[systemwide], role[ssh], role[set_hostname], recipe[log_integration::logrotate], role[volumes], role[org_base], role[org_users], role[package_set], role[org_final], role[tuning], role[q1-sandbox-cluster], role[q1-sandbox-worker-facet]]
q1-sandbox-worker-0 [2014-03-22T17:31:55+00:00] INFO: Run List expands to [apt::update_immediately, build-essential, motd, ntp, route53::default, route53::set_hostname, log_integration::logrotate, xfs, volumes::mount, volumes::resize, package_set, tuning::default]
q1-sandbox-worker-0 [2014-03-22T17:31:55+00:00] INFO: HTTP Request Returned 404 Not Found: No routes match the request: //reports/nodes/q1-sandbox-worker-0/runs
q1-sandbox-worker-0 [2014-03-22T17:31:55+00:00] INFO: Starting Chef Run for q1-sandbox-worker-0
q1-sandbox-worker-0 [2014-03-22T17:31:55+00:00] INFO: Running start handlers
q1-sandbox-worker-0 [2014-03-22T17:31:55+00:00] INFO: Start handlers complete.
q1-sandbox-worker-0 [2014-03-22T17:31:56+00:00] INFO: Loading cookbooks [apt, build-essential, log_integration, motd, ntp, package_set, route53, silverware, tuning, volumes, xfs]

knife cluster ssh

Same issue when trying to use knife cluster ssh: I have to provide both the -i and -x options to get a successful authentication.

$ knife cluster ssh q1-sandbox-worker-0 -i knife/credentials/ec2_keys/q1-sandbox.pem uptime
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, sandbox cluster, worker facet, servers 0
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
WARNING: Failed to connect to  -- Net::SSH::AuthenticationFailed: Authentication failed for user wilton@ec2-54-82-77-79.compute-1.amazonaws.com@ec2-54-82-77-79.compute-1.amazonaws.com

$ knife cluster ssh q1-sandbox-worker-0 -i knife/credentials/ec2_keys/q1-sandbox.pem -x ubuntu uptime
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in q1 realm, sandbox cluster, worker facet, servers 0
  sandbox:          Loading chef
  sandbox:          Loading ec2
  sandbox:          Reconciling DSL and provider information
q1-sandbox-worker-0 17:34:05 up 18 min, 1 user, load average: 0.15, 0.15, 0.14

Conclusion

I know this is a lot of information. I have been trying to get this working for a while. I am not familiar enough with the Ironfan internals to dig too deep, but I will give it a shot this weekend. I am at the point where, if I can't figure this out, I will have to go back to using ironfan3/4 with Chef 0.10.x.

I am thinking there are a few issues here...

  1. I am not sure why I can't bootstrap a facet unless the cluster name is "sandbox". This is really weird.
  2. I think #1 is part of a bigger problem around ssh keys for realm clusters; I see a lot of commits over the last few days around keys in general.
  3. Once I get the systems to bootstrap, I think there is an issue with the "ubuntu12.04-ironfan" bootstrap script; it should not prompt you for chef_server_url.

These are the three things I will be looking into this weekend and trying to fix. I would really appreciate some help here. Thank you!

aseever commented 10 years ago

Thanks for the details! Sorry it hasn't gone smoothly this week. It might be a few days before we have a solution, but we'll dig in and see if we can figure out what went wrong.

gwilton commented 10 years ago

Hey man, thanks for the quick follow-up. I enjoy using Ironfan and will do what I can to help; if I figure anything out I will let you know. If anyone is running a working homebase with Ironfan 6, it would be great to see what the Gemfile and knife.rb look like, what Ruby version is being used, and maybe even a gem list. An example of a fully working realm configuration would help too. Thanks.

P.S. Every knife cluster command says... "no realm-specific Gemfile found. using default Gemfile." What does this mean?

gwilton commented 10 years ago

I was able to gather more information, at least for the bootstrapping issue. The reason I am able to bootstrap a cluster named "sandbox" while nothing else works is that I have an EC2 key pair with the name "sandbox". You will see in the resource data object below that during the bootstrap process the realm looks for an identity_file named #{cluster}.pem when it should be #{realm}-#{cluster}.pem. If #{cluster}.pem happens to exist you will not get an ERROR, but you still have to provide the proper identity_file and username with the -i and -x options.
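
To make the problem concrete, here is a sketch of what the identity-file lookup appears to be doing during bootstrap versus what the realm-qualified key pair created at launch implies it should do (the helper name is mine and this is not the actual Ironfan source, just an illustration; the directory layout matches my homebase):

# Illustrative sketch only -- not the Ironfan source.
def bootstrap_identity_file(realm_name, cluster_name,
                            keys_dir = 'knife/credentials/ec2_keys')
  # What the behavior above suggests happens today: only the cluster name is
  # used, so a cluster named "sandbox" works by accident because I happen to
  # have an EC2 key pair (and a sandbox.pem) with that name.
  # File.join(keys_dir, "#{cluster_name}.pem")

  # What the launch step implies it should be: launch creates realm-qualified
  # key pairs (q1-control, qa-sandbox, ...), so the lookup needs the realm too.
  File.join(keys_dir, "#{realm_name}-#{cluster_name}.pem")
end

bootstrap_identity_file('qa', 'sandbox')
# => "knife/credentials/ec2_keys/qa-sandbox.pem"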

$ knife cluster bootstrap qa-sandbox-app-0
no realm-specific Gemfile found. using default Gemfile.
Inventorying servers in qa realm, sandbox cluster, app facet, servers 0
  sandbox:           Loading chef
  sandbox:           Loading ec2
  sandbox:           Reconciling DSL and provider information
  +------------------+-------+---------+----------+------------+-----+-------+------------+--------------+---------------+------------+-----------+
  | Name             | Chef? | State   | Flavor   | AZ         | Env | Realm | MachineID  | Public IP    | Private IP    | Created On | relevant? |
  +------------------+-------+---------+----------+------------+-----+-------+------------+--------------+---------------+------------+-----------+
  | qa-sandbox-app-0 | yes   | running | m1.large | us-east-1a | qa  | qa    | i-7a2fd85b | 54.80.213.11 | 10.214.21.194 | 2014-03-23 | true      |
  +------------------+-------+---------+----------+------------+-----+-------+------------+--------------+---------------+------------+-----------+
Preparing shared resources:
  sandbox:           Loading chef
  sandbox:           Loading ec2
  sandbox:           Reconciling DSL and provider information
Loaded information for 3 computer(s) in cluster sandbox
  sandbox:           ensuring security group permissions
  qa-sandbox:          ensuring access from qa-sandbox to qa-sandbox
  ssh:                 ensuring tcp access from 0.0.0.0/0 to 22..22

Running bootstrap on qa-sandbox-app-0...

Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm) Yes

  qa-sandbox-app-0:    Running bootstrap
Bootstrapping Chef on ec2-54-80-213-11.compute-1.amazonaws.com
Failed to authenticate ubuntu - trying password auth
Enter your password:

WARNING: Error running [#<Ironfan::Broker::Computer(server=#<Ironfan::Dsl::Server(name="0", components=c{ }, run_list_items=c{ role[systemwide], role[ssh], role[set_hostname], role[volumes], role[package_set], role[org_base], role[org_users], role[org_final], role[tuning], volumes::build_raid, role[app], role[qa-sandbox-cluster], role[qa-sandbox-app-facet] }, clouds=c{ ec2 }, volumes=c{ ephemeral0, ephemeral1, md0 }, security_groups=c{ }, environment=:qa, realm_name="qa", cluster_role=#<Ironfan::Dsl::Role>, facet_role=#<Ironfan::Dsl::Role>, cluster_names={:sandbox=>:sandbox}, cluster_name="sandbox", facet_name="app")>, resources=c{ client, node, machine, keypair, security_group__systemwide, security_group__ssh }, drives=c{ ephemeral0, ephemeral1, md0, root }, providers=c{ chef, iaas })>, {:ssh_user=>"ubuntu", :distro=>"ubuntu12.04-ironchef", :template_file=>false, :run_list=>["role[systemwide]", "volumes::build_raid", "role[ssh]", "role[set_hostname]", "role[volumes]", "role[org_base]", "role[org_users]", "role[app]", "role[package_set]", "role[org_final]", "role[tuning]", "role[qa-sandbox-cluster]", "role[qa-sandbox-app-facet]"], :first_boot_attributes=>{}, :host_key_verify=>true, :verbosity=>0, :color=>true, :editor=>nil, :format=>"summary", :bootstrap_runs_chef_client=>true, :cloud=>true, :dry_run=>false, :config_file=>"/Users/wilton/Documents/github/ironfan_homebase/.chef/knife.rb", :computer=>#<Ironfan::Broker::Computer(server=#<Ironfan::Dsl::Server(name="0", components=c{ }, run_list_items=c{ role[systemwide], role[ssh], role[set_hostname], role[volumes], role[package_set], role[org_base], role[org_users], role[org_final], role[tuning], volumes::build_raid, role[app], role[qa-sandbox-cluster], role[qa-sandbox-app-facet] }, clouds=c{ ec2 }, volumes=c{ ephemeral0, ephemeral1, md0 }, security_groups=c{ }, environment=:qa, realm_name="qa", cluster_role=#<Ironfan::Dsl::Role>, facet_role=#<Ironfan::Dsl::Role>, cluster_names={:sandbox=>:sandbox}, cluster_name="sandbox", facet_name="app")>, resources=c{ client, node, machine, keypair, security_group__systemwide, security_group__ssh }, drives=c{ ephemeral0, ephemeral1, md0, root }, providers=c{ chef, iaas })>, :server=>#<Ironfan::Dsl::Server(name="0", components=c{ }, run_list_items=c{ role[systemwide], role[ssh], role[set_hostname], role[volumes], role[package_set], role[org_base], role[org_users], role[org_final], role[tuning], volumes::build_raid, role[app], role[qa-sandbox-cluster], role[qa-sandbox-app-facet] }, clouds=c{ ec2 }, volumes=c{ ephemeral0, ephemeral1, md0 }, security_groups=c{ }, environment=:qa, realm_name="qa", cluster_role=#<Ironfan::Dsl::Role>, facet_role=#<Ironfan::Dsl::Role>, cluster_names={:sandbox=>:sandbox}, cluster_name="sandbox", facet_name="app")>, :attribute=>nil, :identity_file=>"/Users/wilton/Documents/github/ironfan_homebase/knife/credentials/ec2_keys/sandbox.pem", :use_sudo=>true, :chef_node_name=>"qa-sandbox-app-0", :client_key=>