Closed Cindia-blue closed 12 years ago
There's no chef node because the bootstrap didn't run...
Can you try running the following?
knife cluster bootstrap mysql_demo sqlsever 0
The error message for 'knife cluster mysql_demo sqlsever --bootstrap' is a bit confusing: you have the command in the wrong place. You should say knife cluster bootstrap mysql_demo-sqlsever-0
Also to note, you have spelt "sqlsever" without an 'r' -- is it possible you're having some conflict with 'sqlsever' and 'sqlserver' ?
If I input "knife cluster bootstrap mysql_demo sqlclsever 0", I get: ERROR: TypeError: can't convert nil into String
If I input "knife cluster bootstrap mysql_demo sqlsever mysql_demo-sqlsever-0" (mysql_demo-sqlsever-0 is the name of the MySQL server node), I still get: WARNING: Bad interval: mysql_demo-sqlclient-0 @knife_common: display Nothing to report WARNING: No nodes to bootstrap, exiting
Yes, I missed the 'r'. I changed the facet to sqlserver, ran "knife cluster launch mysql_demo sqlserver", and then input "knife cluster bootstrap mysql_demo sqlserver mysql_demo-sqlserver-0". I got: WARNING: Bad interval: mysql_demo-sqlserver-0 @knife_common: display Nothing to report WARNING: No nodes to bootstrap, exiting
You're still typing in too many things. It's
knife cluster COMMAND CLUSTER [FACET] [SERVER_INDEXES]
# all in mysql_demo cluster sqlserver facet
knife cluster launch mysql_demo sqlserver
knife cluster bootstrap mysql_demo sqlserver
# the first node in mysql_demo cluster sqlserver facet
knife cluster launch mysql_demo sqlserver 0
knife cluster bootstrap mysql_demo sqlserver 0
Here's what the robot thinks when you type the command you described:
knife cluster bootstrap mysql_demo sqlserver mysql_demo-sqlserver-0
knife cluster COMMAND CLUSTER FACET OOPS_WANTED_A_NUMBER
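As a rough illustration of why the last argument has to be a number, here is a minimal Ruby sketch of how a command line of that shape could be split up. It is not cluster_chef's actual parser: parse_cluster_args and parse_indexes are hypothetical names, and the range syntax ("0-2", "0,3") is an assumption made for the example.

```ruby
# Hypothetical illustration only; NOT cluster_chef's real argument parser.
# It mimics the shape: knife cluster COMMAND CLUSTER [FACET] [SERVER_INDEXES]

# Turn an index spec such as "0", "0-2" or "0,3-4" into a sorted list of integers.
# (The range and comma syntax is an assumption for this sketch.)
def parse_indexes(spec)
  spec.split(',').flat_map do |part|
    case part
    when /\A\d+\z/         then [part.to_i]
    when /\A(\d+)-(\d+)\z/ then ($1.to_i..$2.to_i).to_a
    else raise ArgumentError, "Bad interval: #{part}"
    end
  end.sort.uniq
end

# Split the arguments that follow the COMMAND word.
def parse_cluster_args(argv)
  cluster, facet, indexes = argv
  raise ArgumentError, 'CLUSTER is required' if cluster.nil?
  {
    :cluster => cluster,
    :facet   => facet,
    :indexes => indexes ? parse_indexes(indexes) : :all
  }
end

p parse_cluster_args(%w[mysql_demo sqlserver 0])
p parse_cluster_args(%w[mysql_demo sqlserver])

begin
  parse_cluster_args(%w[mysql_demo sqlserver mysql_demo-sqlserver-0])
rescue ArgumentError => e
  puts e.message
end
```

Unlike this sketch, the real tool only prints a "Bad interval" warning and then finds no nodes to work on, which matches the output quoted earlier in the thread.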
Tried again, below is the message
root@ubuntu:~# knife cluster bootstrap mysql_demo sqlserver
Inventorying servers in mysql_demo cluster, sqlserver facet, all servers
Hello World
+------------------------+----------+------------+-----------------+--------------+------------+-------+-----------+---------+--------------+----------+
| Name                   | Env      | AZ         | Created At      | Private IP   | InstanceID | Chef? | relevant? | State   | Public IP    | Flavor   |
+------------------------+----------+------------+-----------------+--------------+------------+-------+-----------+---------+--------------+----------+
| mysql_demo-sqlserver-0 | _default | us-east-1d | 20120118-093845 | 10.202.54.41 | i-8297ade0 | no    | true      | running | 23.20.20.207 | t1.micro |
+------------------------+----------+------------+-----------------+--------------+------------+-------+-----------+---------+--------------+----------+
Running bootstrap on mysql_demo-sqlserver-0...
Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm) Yes
ERROR: TypeError: can't convert nil into String
root@ubuntu:~#
+------------------------+----------+------------+-----------------+--------------+------------+-------+-----------+---------+--------------+----------+
| Name                   | Env      | AZ         | Created At      | Private IP   | InstanceID | Chef? | relevant? | State   | Public IP    | Flavor   |
+------------------------+----------+------------+-----------------+--------------+------------+-------+-----------+---------+--------------+----------+
| mysql_demo-sqlserver-0 | _default | us-east-1d | 20120118-093845 | 10.202.54.41 | i-8297ade0 | no    | true      | running | 23.20.20.207 | t1.micro |
+------------------------+----------+------------+-----------------+--------------+------------+-------+-----------+---------+--------------+----------+
I've noticed that the "can't convert nil into String" and similar issues come up a lot in cluster_chef, and they're difficult because they're saying "at some point I am expecting a
I re-formatted your output to point out what I think is the most important point. The "chef" column says "no" which means that there's no client registered. If you're using the code from the master branch, then this happens when the bootstrap doesn't complete IIRC. You should kill this ec2 instance, and re-try the cluster launch command again and it should get farther. In version 3 the client creation has been moved to earlier in the process.
In chef the client defines permissions, and the node defines attributes. In this case what's probably happening is that when chef tries to operate on a node, it notices that there is no node that is both part of the correct security group and has a client that is registered with the chef server, so it returns a "nil" value which then bombs out.
If you want more info with most of these, you can run
$ knife cluster -VV bootstrap mysql_demo sqlserver
and that should provide a backtrace.
Hope that helps.
-Peter
Thanks. My cluster_chef is 3.0.10, installed from homebase. I tried hadoop_demo with the command shown below and got the following output. The problem is with ssh, right? Any suggestions?
root@ubuntu:~/chef-repo# knife cluster -VV bootstrap hadoop_demo master 0
DEBUG: Using configuration from /root/.chef/knife.rb
Inventorying servers in hadoop_demo cluster, master facet, servers 0
INFO: Loading cluster /root/chef-repo/homebase/clusters/hadoop_demo.rb
DEBUG: Signing the request as root
DEBUG: Sending HTTP Request via GET to 172.16.234.140:4000/search/client
DEBUG: Signing the request as root
DEBUG: Sending HTTP Request via GET to 172.16.234.140:4000/search/node
DEBUG: Using fog to catalog all servers
DEBUG: Using fog to catalog all volumes
DEBUG: Volume paired: root on hadoop_demo-master-0 (vol-71e8d31c @ /dev/sda1)
+------------+----------+----------------------+------------+-----------------+--------------+---------------+-------+------------+--------------+-----------+---------+-------------+----------+-------------+
| Elastic IP | Env      | Name                 | AZ         | Created At      | Volumes      | Private IP    | Chef? | InstanceID | Image        | relevant? | State   | SSH Key     | Flavor   | Public IP   |
+------------+----------+----------------------+------------+-----------------+--------------+---------------+-------+------------+--------------+-----------+---------+-------------+----------+-------------+
|            | _default | hadoop_demo-master-0 | us-east-1d | 20120119-142106 | vol-71e8d31c | 10.205.13.217 | no    | i-30424f52 | ami-fd589594 | true      | running | hadoop_demo | t1.micro | 50.17.32.29 |
+------------+----------+----------------------+------------+-----------------+--------------+---------------+-------+------------+--------------+-----------+---------+-------------+----------+-------------+
Running bootstrap on hadoop_demo-master-0...
Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm) Yes
/usr/lib/ruby/gems/1.8/gems/cluster_chef-3.0.10/lib/cluster_chef/cloud.rb:82:in `join': can't convert nil into String (TypeError)
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-3.0.10/lib/cluster_chef/cloud.rb:82:in `ssh_identity_file'
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-knife-3.0.10/lib/chef/knife/knife_common.rb:120:in `bootstrapper'
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-knife-3.0.10/lib/chef/knife/knife_common.rb:130:in `run_bootstrap'
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-knife-3.0.10/lib/chef/knife/cluster_bootstrap.rb:63:in `perform_execution'
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-3.0.10/lib/cluster_chef/server_slice.rb:23:in `each'
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-3.0.10/lib/cluster_chef/server_slice.rb:23:in `each'
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-knife-3.0.10/lib/chef/knife/cluster_bootstrap.rb:62:in `perform_execution'
from /usr/lib/ruby/gems/1.8/gems/cluster_chef-knife-3.0.10/lib/chef/knife/generic_command.rb:56:in `run'
from /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife.rb:391:in `run_with_pretty_exceptions'
from /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife.rb:166:in `run'
from /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/application/knife.rb:128:in `run'
from /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/bin/knife:25
from /usr/bin/knife:19:in `load'
from /usr/bin/knife:19
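The first frame of that backtrace is a join call that was handed nil. Below is a minimal standalone sketch of the same failure mode, assuming the call behaves like File.join with an unset directory or key name. identity_file and checked_identity_file are hypothetical helpers written for this example, not methods copied from cluster_chef.

```ruby
# Standalone reproduction; helper names are hypothetical, not cluster_chef code.

# File.join raises TypeError when any component is nil. Ruby 1.8 words it
# "can't convert nil into String"; newer Rubies say "no implicit conversion
# of nil into String".
def identity_file(dir, keypair)
  File.join(dir, keypair)
end

# Same lookup, but it fails early with a message that names the missing setting.
def checked_identity_file(dir, keypair)
  raise ArgumentError, 'ssh identity dir is not set' if dir.nil?
  raise ArgumentError, 'keypair is not set' if keypair.nil?
  identity_file(dir, keypair)
end

begin
  identity_file(nil, 'hadoop_demo')
rescue TypeError => e
  puts "unchecked: #{e.class}: #{e.message}"
end

begin
  checked_identity_file(nil, 'hadoop_demo')
rescue ArgumentError => e
  puts "checked: #{e.message}"
end
```

The point of the guarded version is only that a nil setting is reported by name instead of surfacing as a bare TypeError several frames away from its cause.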
Yes, the problem now is with ssh'ing into the server - your client installation doesn't know which ssh key to use. I've actually never quite had this work as intended for me, and in the end I got a lot of help in https://github.com/infochimps/cluster_chef/issues/95 and have a patch that works so that I can specify the ssh key directory, the ssh keypair, and the cluster name as separate properties. You can try the patch listed there on your gem, and use this in your cluster definition:
cloud do
  ssh_identity_dir File.expand_path('~/.ssh/')
  backing data['default_backing_store']
  image_name data['default_release_flavor']
  flavor data['default_instance_flavor']
  availability_zones data['default_availability_zones']
  bootstrap_distro data['default_bootstrap_template'] # 'ubuntu11.04-cluster_chef_knewton'
  keypair data['keypair']
  data['default_security_group_list'].each do |g|
    security_group "#{g}"
  end
end
I'm getting my values from a data bag, but you can fill in those values by hand and get the correct result.
Until making the above changes I jumped through some hoops to get this to work. Some more documentation or an example of how this works at infochimps may make this clearer.
Also, I don't think this is related to your problem, but if you're using ruby 1.8 you may want to use rvm (see http://beginrescueend.com) to set up a 1.9 environment for yourself. See https://github.com/infochimps/cluster_chef/issues/80 for what I ran into - this is mostly on the target node, but I think that I ran into the same kind of issue on the launching node at some point.
Also, I found I get better outcomes with chef-0.10.6. There was some problem with 0.10.8 that I can't recall at the moment.
Thanks for these suggestions. I set the SSH attributes (keypair "knife" and the identity dir), then launched and bootstrapped again. That helped: "knife node list" now returns the master node, but the Chef column is still "no". I found that the client pem is indeed under client_keys.
Below is a log showing that the bootstrap fails on authorization. If I preset a password for user "ubuntu", the connection is established, but the Chef column is still "no" and the body of the log looks the same.
DEBUG: Using configuration from /root/.chef/knife.rb
Inventorying servers in hadoop_demo cluster, master facet, servers 0
INFO: Loading cluster /root/chef-repo/homebase/clusters/hadoop_demo.rb
DEBUG: Signing the request as root
DEBUG: Sending HTTP Request via GET to 172.16.234.140:4000/search/client
DEBUG: Signing the request as root
DEBUG: Sending HTTP Request via GET to 172.16.234.140:4000/search/node
DEBUG: Using fog to catalog all servers
DEBUG: Using fog to catalog all volumes
DEBUG: Volume paired: root on hadoop_demo-master-0 (vol-e57e7a88 @ /dev/sda1)
+------------+----------+----------------------+------------+-----------------+--------------+--------------+-------+------------+--------------+-----------+---------+---------+----------+---------------+
| Elastic IP | Env      | Name                 | AZ         | Created At      | Volumes      | Private IP   | Chef? | InstanceID | Image        | relevant? | State   | SSH Key | Flavor   | Public IP     |
+------------+----------+----------------------+------------+-----------------+--------------+--------------+-------+------------+--------------+-----------+---------+---------+----------+---------------+
|            | _default | hadoop_demo-master-0 | us-east-1d | 20120120-052840 | vol-e57e7a88 | 10.204.29.71 | no    | i-30222b52 | ami-fd589594 | true      | running | knife   | t1.micro | 107.21.173.99 |
+------------+----------+----------------------+------------+-----------------+--------------+--------------+-------+------------+--------------+-----------+---------+---------+----------+---------------+
Running bootstrap on hadoop_demo-master-0...
Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm)
Bootstrapping Chef on ec2-107-21-173-99.compute-1.amazonaws.com
DEBUG: Looking for bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
DEBUG: Found bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
DEBUG: Adding ec2-107-21-173-99.compute-1.amazonaws.com
DEBUG: establishing connection to ec2-107-21-173-99.compute-1.amazonaws.com:22
DEBUG: connection established
INFO: negotiating protocol version
DEBUG: remote is `SSH-2.0-OpenSSH_5.8p1 Debian-1ubuntu3'
DEBUG: local is `SSH-2.0-Ruby/Net::SSH_2.1.4 i686-linux'
DEBUG: read 840 bytes
DEBUG: received packet nr 0 type 20 len 836
INFO: got KEXINIT from server
INFO: sending KEXINIT
DEBUG: queueing packet nr 0 type 20 len 556
DEBUG: sent 560 bytes
INFO: negotiating algorithms
DEBUG: negotiated:
ubuntu'
DEBUG: queueing packet nr 4 type 5 len 28
DEBUG: sent 52 bytes
DEBUG: read 52 bytes
DEBUG: received packet nr 4 type 6 len 28
DEBUG: trying publickey
DEBUG: connecting to ssh-agent
DEBUG: sending agent request 1 len 42
DEBUG: received agent packet 2 len 5
DEBUG: sending agent request 11 len 0
DEBUG: received agent packet 12 len 5
DEBUG: trying publickey (2f:33:69:e7:5a:7f:fe:33:45:bd:99:11:af:5d:82:dd)
DEBUG: queueing packet nr 5 type 50 len 348
DEBUG: sent 372 bytes
DEBUG: read 52 bytes
DEBUG: received packet nr 5 type 51 len 28
DEBUG: allowed methods: publickey
ERROR: all authorization methods failed (tried publickey)
Failed to authenticate ubuntu - trying password auth
DEBUG: Looking for bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
DEBUG: Found bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
Enter your password:
DEBUG: Adding ec2-107-21-173-99.compute-1.amazonaws.com
DEBUG: establishing connection to ec2-107-21-173-99.compute-1.amazonaws.com:22
DEBUG: connection established
INFO: negotiating protocol version
DEBUG: remote is `SSH-2.0-OpenSSH_5.8p1 Debian-1ubuntu3'
DEBUG: local is `SSH-2.0-Ruby/Net::SSH_2.1.4 i686-linux'
DEBUG: read 840 bytes
DEBUG: received packet nr 0 type 20 len 836
INFO: got KEXINIT from server
INFO: sending KEXINIT
DEBUG: queueing packet nr 0 type 20 len 556
DEBUG: sent 560 bytes
INFO: negotiating algorithms
DEBUG: negotiated:
Finished! Current state:
+------------+----------------------+----------+------------+-----------------+--------------+--------------+--------------+------------+-------+---------+---------+---------------+----------+
| Elastic IP | Name                 | Env      | AZ         | Created At      | Volumes      | Private IP   | Image        | InstanceID | Chef? | State   | SSH Key | Public IP     | Flavor   |
+------------+----------------------+----------+------------+-----------------+--------------+--------------+--------------+------------+-------+---------+---------+---------------+----------+
|            | hadoop_demo-master-0 | _default | us-east-1d | 20120120-052840 | vol-e57e7a88 | 10.204.29.71 | ami-fd589594 | i-30222b52 | no    | running | knife   | 107.21.173.99 | t1.micro |
+------------+----------------------+----------+------------+-----------------+--------------+--------------+--------------+------------+-------+---------+---------+---------------+----------+
You have a keypair in ec2 that chef cannot discover. This is what my patch fixes - it lets you have a keypair in AWS called, eg. "my_keypair", set "my_keypair" in your cloud properties, and have ~/.ssh/id_my_keypair be the private key.
From the name of the keypair - "knife" I have to ask if you've run ssh-keygen to create the keypair "knife", such that ~/.ssh/knife and ~/.ssh/knife.pub exist, and have you uploaded the ~/.ssh/id_knife.pub in the ec2 part of the AWS console?
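To answer that kind of question quickly, a small Ruby sketch like the one below can list which candidate key files are actually present. The file-name patterns are assumptions pulled from this thread, plus the .pem suffix that AWS-generated keys usually download with. key_file_report is a hypothetical helper, not part of cluster_chef or knife.

```ruby
# Hypothetical helper: reports which candidate key files exist in a directory.
# The candidate names are assumptions based on this thread, not a cluster_chef API.
def key_file_report(ssh_dir, keypair)
  dir = File.expand_path(ssh_dir)
  candidates = [
    keypair,               # private key as written by: ssh-keygen -f ~/.ssh/NAME
    "#{keypair}.pub",      # matching public key
    "id_#{keypair}",       # naming scheme mentioned for the patch
    "id_#{keypair}.pub",
    "#{keypair}.pem"       # typical download name for a key created by AWS
  ]
  candidates.map { |name| [name, File.file?(File.join(dir, name))] }
end

# Fall back to the current directory if HOME is not set in this environment.
ssh_dir = File.join(ENV.fetch('HOME', Dir.pwd), '.ssh')
key_file_report(ssh_dir, 'knife').each do |name, present|
  puts format('%-16s %s', name, present ? 'found' : 'missing')
end
```

If only the .pem file shows up, the key was most likely generated on the AWS side and there is no local public key to upload, which changes what has to be configured.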
Yes, the "knife" key I used was created by AWS, so there is no public key under .ssh. On the patch side, I updated discovery.rb according to #95.
This time, I ran ssh-keygen locally and generated a keypair named "cluster" under the local folder ~/.ssh (it generated two files, cluster and cluster.pub), then imported the public key from my AWS console. I revised the SSH attributes to keypair "cluster" and dir "~/.ssh". Below is the latest bootstrap log. I found that the cluster public key is actually examined this time, but it still failed... Is there anything wrong with my use of ssh-keygen? (I input a keypair name and a password during the process.)
DEBUG: Using configuration from /root/.chef/knife.rb
Inventorying servers in hadoop_demo cluster, master facet, servers 0
INFO: Loading cluster /root/chef-repo/homebase/clusters/hadoop_demo.rb
DEBUG: Signing the request as root
DEBUG: Sending HTTP Request via GET to 172.16.234.142:4000/search/client
DEBUG: Signing the request as root
DEBUG: Sending HTTP Request via GET to 172.16.234.142:4000/search/node
DEBUG: Using fog to catalog all servers
DEBUG: Using fog to catalog all volumes
DEBUG: Volume paired: root on hadoop_demo-master-0 (vol-afdfcac2 @ /dev/sda1)
+------------+----------+----------------------+------------+-----------------+--------------+---------------+-------+------------+--------------+-----------+---------+---------+----------+--------------+
| Elastic IP | Env      | Name                 | AZ         | Created At      | Volumes      | Private IP    | Chef? | InstanceID | Image        | relevant? | State   | SSH Key | Flavor   | Public IP    |
+------------+----------+----------------------+------------+-----------------+--------------+---------------+-------+------------+--------------+-----------+---------+---------+----------+--------------+
|            | _default | hadoop_demo-master-0 | us-east-1d | 20120125-013443 | vol-afdfcac2 | 10.245.74.151 | no    | i-2b6d994e | ami-fd589594 | true      | running | cluster | t1.micro | 50.17.139.25 |
+------------+----------+----------------------+------------+-----------------+--------------+---------------+-------+------------+--------------+-----------+---------+---------+----------+--------------+
Running bootstrap on hadoop_demo-master-0...
Bootstrapping the node redoes its initial setup -- only do this on an aborted launch.
Are you absolutely certain that you want to perform this action? (Type 'Yes' to confirm)
Bootstrapping Chef on ec2-50-17-139-25.compute-1.amazonaws.com
DEBUG: Looking for bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
DEBUG: Found bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
DEBUG: Adding ec2-50-17-139-25.compute-1.amazonaws.com
DEBUG: establishing connection to ec2-50-17-139-25.compute-1.amazonaws.com:22
DEBUG: connection established
INFO: negotiating protocol version
DEBUG: remote is `SSH-2.0-OpenSSH_5.8p1 Debian-1ubuntu3'
DEBUG: local is `SSH-2.0-Ruby/Net::SSH_2.1.4 i686-linux'
DEBUG: read 840 bytes
DEBUG: received packet nr 0 type 20 len 836
INFO: got KEXINIT from server
INFO: sending KEXINIT
DEBUG: queueing packet nr 0 type 20 len 556
DEBUG: sent 560 bytes
INFO: negotiating algorithms
DEBUG: negotiated:
ubuntu'
DEBUG: queueing packet nr 4 type 5 len 28
DEBUG: sent 52 bytes
DEBUG: read 52 bytes
DEBUG: received packet nr 4 type 6 len 28
DEBUG: trying publickey
DEBUG: connecting to ssh-agent
DEBUG: sending agent request 1 len 42
DEBUG: received agent packet 2 len 5
DEBUG: sending agent request 11 len 0
DEBUG: received agent packet 12 len 5
DEBUG: trying hostbased
DEBUG: sending agent request 11 len 0
DEBUG: received agent packet 12 len 5
DEBUG: trying password
DEBUG: trying keyboard-interactive
DEBUG: trying keyboard-interactive
DEBUG: queueing packet nr 5 type 50 len 76
DEBUG: sent 100 bytes
DEBUG: read 52 bytes
DEBUG: received packet nr 5 type 51 len 28
DEBUG: allowed methods: publickey
DEBUG: keyboard-interactive failed
ERROR: all authorization methods failed (tried publickey, hostbased, password, keyboard-interactive)
Failed to authenticate ubuntu - trying password auth
DEBUG: Looking for bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
DEBUG: Found bootstrap template in /usr/lib/ruby/gems/1.8/gems/chef-0.10.8/lib/chef/knife/bootstrap
Enter your password:
DEBUG: Adding ec2-50-17-139-25.compute-1.amazonaws.com
DEBUG: establishing connection to ec2-50-17-139-25.compute-1.amazonaws.com:22
DEBUG: connection established
INFO: negotiating protocol version
DEBUG: remote is `SSH-2.0-OpenSSH_5.8p1 Debian-1ubuntu3'
DEBUG: local is `SSH-2.0-Ruby/Net::SSH_2.1.4 i686-linux'
DEBUG: read 840 bytes
DEBUG: received packet nr 0 type 20 len 836
INFO: got KEXINIT from server
INFO: sending KEXINIT
DEBUG: queueing packet nr 0 type 20 len 556
DEBUG: sent 560 bytes
INFO: negotiating algorithms
DEBUG: negotiated:
Finished! Current state:
+------------+----------------------+----------+------------+-----------------+--------------+---------------+--------------+------------+-------+---------+---------+--------------+----------+
| Elastic IP | Name                 | Env      | AZ         | Created At      | Volumes      | Private IP    | Image        | InstanceID | Chef? | State   | SSH Key | Public IP    | Flavor   |
+------------+----------------------+----------+------------+-----------------+--------------+---------------+--------------+------------+-------+---------+---------+--------------+----------+
|            | hadoop_demo-master-0 | _default | us-east-1d | 20120125-013443 | vol-afdfcac2 | 10.245.74.151 | ami-fd589594 | i-2b6d994e | no    | running | cluster | 50.17.139.25 | t1.micro |
+------------+----------------------+----------+------------+-----------------+--------------+---------------+--------------+------------+-------+---------+---------+--------------+----------+
ubuntu@ip-10-212-113-185:/etc/chef$ sudo chmod 777 *
ubuntu@ip-10-212-113-185:/etc/chef$ vi client.rb
ubuntu@ip-10-212-113-185:/etc/chef$ chef-client
Why did you do this? You should be running chef as root, so you should be doing two things:
1) Kill the currently running chef-client (ps -ef | grep chef-client to find the pid)
2) Run sudo chef-client --once
Otherwise it looks like you're pretty much in the realm of having a working chef implementation, and you just need to work with getting and tweaking the right cookbooks now.
If you are behind any kind of firewall or NAT you won't be able to contact the chef server from the ec2 instance. You may want to start with the free tier of the opscode hosted chef for your test.
I just installed ruby 1.9.2 and am wondering how to downgrade chef to 0.10.6... Should I uninstall the old gems and start over by installing the chef gem and chef server? It would be very nice if there were an elegant way to achieve this. Thanks
Alternatively, I created a new ubuntu server and built it from scratch: this time I installed ruby 1.9.3 and chef-0.10.6, then installed the chef server via solo, which runs normally and generates validation.pem. Could you please share a stable environment stack for cluster_chef installation and development? Thanks
Installing/uninstalling gems can be done via "gem install" and "gem uninstall" (which you already know, but that leads to) and you can also tell rubygems to stick packages at certain versions:
gem install chef --version "= 0.10.6" --no-ri --no-rdoc
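For what the "= 0.10.6" constraint means in practice, here is a minimal sketch using the Gem::Requirement and Gem::Version classes that ship with RubyGems. It only evaluates the version constraint; it does not install or load chef, and the two version strings are simply the ones mentioned in this thread.

```ruby
require 'rubygems'

# "= 0.10.6" is an exact-match requirement: only that one version satisfies it.
requirement = Gem::Requirement.new('= 0.10.6')

%w[0.10.6 0.10.8].each do |version|
  accepted = requirement.satisfied_by?(Gem::Version.new(version))
  puts "chef #{version}: #{accepted ? 'accepted' : 'rejected'}"
end
```

When more than one chef version is installed side by side, a script can ask for a specific one with a line such as gem 'chef', '= 0.10.6' before requiring it, so uninstalling the newer gem is not strictly necessary.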
Regarding the server, for now I'm also using hosted chef so that I don't have to deal with server issues. You may want to start with that, and then build your own server after validating that cluster_chef is working for you.
Thanks for these suggestions. I set up cluster_chef 3.0.12 on a server and the bootstrap went through. Thanks.
I tried to launch the mysql_demo cluster with the rb file below in the cluster dir:
When I run "knife cluster mysql_demo sqlsever --bootstrap", I get the message below:
WARNING: Bad interval: mysql_demo-sqlsever-0 Nothing to report WARNING: No nodes to bootstrap, exiting
When I show the cluster, I can see that an sqlsever instance has been created, but with "no" in the Chef column... It seems chef has not been enabled on the newly created node...
Is there anything I should do to resolve this, or should I provide more details for diagnosis?