infochimps-labs / ironfan

Chef orchestration layer -- your system diagram come to life. Provision EC2, OpenStack or Vagrant without changes to cookbooks or configuration
http://infochimps.com

Chef returns error when showing clusters if a machine is shutting-down #186

Open fractaloop opened 11 years ago

fractaloop commented 11 years ago

If an EC2 instance is in the process of shutting down when I run knife cluster show, the Opscode server returns an error for the query. The error goes away as soon as the instance transitions from shutting-down to terminated.

temujin9 commented 11 years ago

Can you give a dump of that error with -VV?

fractaloop commented 11 years ago
DEBUG: Using configuration from /Users/fractaloop/infochimps/infochimps-homebase/.chef/knife.rb
Inventorying servers in schloss cluster, all facets, all servers
INFO: Loading cluster /Users/fractaloop/infochimps/infochimps-homebase/clusters/schloss.rb
  schloss:          Loading chef
  schloss:          Loading ec2
  schloss:            - loading clients
DEBUG: Signing the request as fractaloop
DEBUG: Sending HTTP Request via GET to api.opscode.com:443/organizations/infochimps_v2/search/client
  schloss:            - loading nodes  schloss:           - loading machines

DEBUG: Signing the request as fractaloop
DEBUG: Sending HTTP Request via GET to api.opscode.com:443/organizations/infochimps_v2/search/node
  schloss:            - loading ebs_volumes  schloss:             - loading roles

DEBUG: Signing the request as fractaloop
DEBUG: Sending HTTP Request via GET to api.opscode.com:443/organizations/infochimps_v2/search/role
INFO: HTTP Request Returned 500 Internal Server Error: internal service error
ERROR: Server returned error for https://api.opscode.com/organizations/infochimps_v2/search/client?q=name:schloss-*%20OR%20clientname:schloss-*&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000, retrying 1/5 in 4s
  schloss:            - loading keypairs
  schloss:            - loading security_groups
<SNIPPED OUT DEBUG KEYPAIRS>
  schloss:            - loaded keypairs
<SNIPPED OUT DEBUG EBS VOLUMES>
  schloss:            - loaded ebs_volumes
<SNIPPED OUT DEBUG SECURITY GROUPS>
  schloss:            - loaded security_groups
DEBUG: Loaded <Node            schloss-store-0         role[nfs_client], role[systemwide], role[chef_client], role[ssh], role[set_hostname], role[volumes], role[org_base], role[org_users], role[zookeeper_client], role[kafka], role[package_set], role[minidash], role[schloss_cluster], role[schloss_store]>
DEBUG: Loaded <Node            schloss-master-0        role[nfs_client], role[systemwide], role[chef_client], role[ssh], role[set_hostname], role[volumes], role[org_base], role[org_users], role[zookeeper_server], role[zookeeper_client], role[storm_master], role[storm_ui], role[package_set], role[minidash], role[schloss_cluster], role[schloss_master]>
DEBUG: Signing the request as fractaloop
DEBUG: Sending HTTP Request via GET to api.opscode.com:443/organizations/infochimps_v2/search/node
  schloss:            - loaded roles
  schloss:            - loaded nodes
INFO: HTTP Request Returned 500 Internal Server Error: internal service error
ERROR: Server returned error for https://api.opscode.com/organizations/infochimps_v2/search/client?q=name:schloss-*%20OR%20clientname:schloss-*&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000, retrying 2/5 in 5s
INFO: HTTP Request Returned 500 Internal Server Error: internal service error
ERROR: Server returned error for https://api.opscode.com/organizations/infochimps_v2/search/client?q=name:schloss-*%20OR%20clientname:schloss-*&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000, retrying 3/5 in 14s
INFO: HTTP Request Returned 500 Internal Server Error: internal service error
ERROR: Server returned error for https://api.opscode.com/organizations/infochimps_v2/search/client?q=name:schloss-*%20OR%20clientname:schloss-*&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000, retrying 4/5 in 27s
DEBUG: Loaded <Client          schloss-store-0         /Users/fractaloop/infochimps/infochimps-homebase/knife/credentials/client_keys/client-schloss-store-0.pem>
DEBUG: Loaded <Client          schloss-config-0        /Users/fractaloop/infochimps/infochimps-homebase/knife/credentials/client_keys/client-schloss-config-0.pem>
DEBUG: Loaded <Client          schloss-master-0        /Users/fractaloop/infochimps/infochimps-homebase/knife/credentials/client_keys/client-schloss-master-0.pem>
  schloss:            - loaded clients
  schloss:          Reconciling DSL and provider information
<SNIPPED MEGAHASH>
  +------------------+-------+-------------+----------+------------+-----+------------+---------------+--------------+------------+--------------+------------------------------------------+---------+-----------+------------+
  | Name             | Chef? | State       | Flavor   | AZ         | Env | MachineID  | Public IP     | Private IP   | Created On | Image        | Volumes                                  | SSH Key | Startable | Launchable |
  +------------------+-------+-------------+----------+------------+-----+------------+---------------+--------------+------------+--------------+------------------------------------------+---------+-----------+------------+
  | schloss-master-0 | yes   | running     | m1.large | us-east-1d | dev | i-XXXXXXXX | XX.XX.XX.XX   | XX.XX.XX.XX  | 2012-10-03 | ami-4d18d624 | vol-XXXXXXXX, vol-XXXXXXXX, vol-XXXXXXXX | schloss | no        | no         |
  | schloss-store-0  | yes   | running     | m1.large | us-east-1d | dev | i-XXXXXXXX | XX.XX.XX.XX   | XX.XX.XX.XX  | 2012-10-03 | ami-4d18d624 | vol-XXXXXXXX                             | schloss | no        | no         |
  | schloss-worker-0 | no    | not running | m1.large | us-east-1d | dev |            |               |              |            |              |                                          |         | no        | yes        |
  +------------------+-------+-------------+----------+------------+-----+------------+---------------+--------------+------------+--------------+------------------------------------------+---------+-----------+------------+

mrflip commented 11 years ago

Those are errors I usually see when the chef server is slow; in the dump you showed, chef recovered after the progressive backoff. Are you sure this is actually correlated with the shutdown?

flip
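
For readers unfamiliar with the "progressive backoff" mentioned above: the dump shows the client waiting longer between retries (4s, 5s, 14s, 27s). A minimal Python sketch of that general pattern follows. It is not Chef's actual implementation; the function name, the base delay, the jitter formula, and the use of RuntimeError are all assumptions made for illustration.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call operation(); on failure wait progressively longer, then retry.

    Hypothetical helper: max_attempts, base_delay and the jitter formula are
    illustrative assumptions, not values taken from Chef.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RuntimeError as error:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential growth plus random jitter, so that many clients
            # do not all retry at the same instant.
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
            print(f"{error}, retrying {attempt}/{max_attempts} in {delay:.0f}s")
            sleep(delay)
```

With a pattern like this, a transient 500 that clears within a few attempts is invisible in the final output, which would match the dump: the table still renders after the fourth retry.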

fractaloop commented 11 years ago

I can repro it with every shutdown I've tried. It always resolves when the server is no longer shutting-down. It never occurs during any other knife command.

It will NOT happen if I terminate the instance via the AWS console and it's in the "shutting-down" state; it only happens after a knife cluster kill.

It WILL happen if I manually terminate the instance via the AWS console, wait for it to finish shutting down, and then knife cluster kill the server that has a Chef node and client.

Basically, it will always happen after knife deletes the node/client from Chef.

-Logan
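
One detail in the -VV dump may be relevant here: every retried request is the search on the client index, even where the DEBUG line just before it mentions search/role or search/node. A small Python sketch that pulls this out of the log is below. The two sample lines are copied from the dump in this thread; the helper name failed_requests and the regular expression are illustrative assumptions, not part of knife or ironfan.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Two ERROR lines copied verbatim from the -VV dump earlier in this thread.
LOG = """\
ERROR: Server returned error for https://api.opscode.com/organizations/infochimps_v2/search/client?q=name:schloss-*%20OR%20clientname:schloss-*&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000, retrying 1/5 in 4s
ERROR: Server returned error for https://api.opscode.com/organizations/infochimps_v2/search/client?q=name:schloss-*%20OR%20clientname:schloss-*&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000, retrying 2/5 in 5s
"""

# Assumed line shape, based only on the lines shown in this issue.
RETRY_LINE = re.compile(
    r"ERROR: Server returned error for (?P<url>\S+), "
    r"retrying (?P<attempt>\d+)/(?P<limit>\d+) in (?P<delay>\d+)s"
)


def failed_requests(log_text):
    """Return the search index, decoded query, attempt and delay per retry line."""
    results = []
    for match in RETRY_LINE.finditer(log_text):
        parts = urlsplit(match["url"])
        results.append({
            "index": parts.path.rsplit("/", 1)[-1],    # last path segment
            "query": parse_qs(parts.query)["q"][0],    # URL-decoded search query
            "attempt": int(match["attempt"]),
            "delay_s": int(match["delay"]),
        })
    return results


for row in failed_requests(LOG):
    print(row)
```

Running the same helper over a full dump would show whether the 500s are always confined to the client search, which could help confirm or rule out the link to the node/client deletion described above.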
