saltstack / salt

[salt-cloud][azure] Deployment of vm succeeds, but no response after that #14793

Closed 7oku closed 9 years ago

7oku commented 9 years ago

Hi there,

We can successfully spin up VMs on Azure (as seen in the web management portal), but salt-cloud does not seem to get any response and waits until it times out after 900 seconds.

Here is the output:

$ salt-cloud -p ubuntu_azure-test salt-deployed-machine -l debug
[DEBUG   ] Missing configuration file: /etc/salt/cloud
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Missing configuration file: /etc/salt/cloud.providers
[DEBUG   ] Including configuration from '/etc/salt/cloud.providers.d/azure.conf'
[DEBUG   ] Reading configuration from /etc/salt/cloud.providers.d/azure.conf
[DEBUG   ] Reading configuration from /etc/salt/cloud.profiles
[DEBUG   ] Configuration file path: /etc/salt/master
[INFO    ] salt-cloud starting
[DEBUG   ] There is no IBM SCE cloud provider configuration available. Not loading module.
[DEBUG   ] 'parallels.avail_sizes' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'parallels.avail_locations' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'proxmox.avail_sizes' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.destroy' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.avail_sizes' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.avail_images' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.avail_locations' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'rackspace.reboot' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] The 'azure' cloud driver is unable to be optimized.
[DEBUG   ] There is no IBM SCE cloud provider configuration available. Not loading module.
[DEBUG   ] 'parallels.avail_sizes' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'parallels.avail_locations' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'proxmox.avail_sizes' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.destroy' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.avail_sizes' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.avail_images' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'saltify.avail_locations' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] 'rackspace.reboot' has been marked as not supported. Removing from the list of supported cloud functions
[DEBUG   ] Failed to execute 'azure.list_nodes()' while querying for running nodes: invalid literal for int() with base 10: ''
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/cloud/__init__.py", line 2166, in run_parallel_map_providers_query
    cloud.clouds[data['fun']]()
  File "/usr/lib/python2.7/dist-packages/salt/cloud/clouds/msazure.py", line 240, in list_nodes
    nodes = list_nodes_full(conn, call)
  File "/usr/lib/python2.7/dist-packages/salt/cloud/clouds/msazure.py", line 272, in list_nodes_full
    if salt.utils.cloud.is_public_ip(ip_address):
  File "/usr/lib/python2.7/dist-packages/salt/utils/cloud.py", line 1646, in is_public_ip
    addr = ip_to_int(ip)
  File "/usr/lib/python2.7/dist-packages/salt/utils/cloud.py", line 1638, in ip_to_int
    ret = ret * 256 + int(octet)
ValueError: invalid literal for int() with base 10: ''
[DEBUG   ] Generating minion keys for 'salt-deployed-machine'
[DEBUG   ] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Sending event - data = {'profile': 'ubuntu_azure-test', 'event': 'starting create', '_stamp': '2014-08-06T17:06:01.909813', 'name': 'salt-deployed-machine', 'provider': 'azure-test:azure'}
[INFO    ] Creating Cloud VM salt-deployed-machine
[DEBUG   ] vm_kwargs: {'system_config': <azure.servicemanagement.LinuxConfigurationSet object at 0x7f036b528990>, 'deployment_slot': 'production', 'role_size': 'Small', 'deployment_name': 'salt-deployed-machine', 'service_name': 'salt-deployed-machine', 'role_name': 'salt-deployed-machine', 'network_config': <azure.servicemanagement.ConfigurationSet object at 0x7f036934d990>, 'os_virtual_hard_disk': <azure.servicemanagement.OSVirtualHardDisk object at 0x7f03692d70d0>, 'label': 'salt-deployed-machine'}
[DEBUG   ] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Sending event - data = {'_stamp': '2014-08-06T17:06:01.961702', 'service_kwargs': {'service_name': 'salt-deployed-machine', 'label': 'salt-deployed-machine', 'location': 'West Europe', 'description': 'salt-deployed-machine'}, 'event': 'requesting instance', 'vm_kwargs': {'deployment_slot': 'production', 'role_size': 'Small', 'deployment_name': 'salt-deployed-machine', 'service_name': 'salt-deployed-machine', 'label': 'salt-deployed-machine', 'role_name': 'salt-deployed-machine'}}
[DEBUG   ] vm_kwargs: {'system_config': <azure.servicemanagement.LinuxConfigurationSet object at 0x7f036b528990>, 'deployment_slot': 'production', 'role_size': 'Small', 'deployment_name': 'salt-deployed-machine', 'service_name': 'salt-deployed-machine', 'role_name': 'salt-deployed-machine', 'network_config': <azure.servicemanagement.ConfigurationSet object at 0x7f036934d990>, 'os_virtual_hard_disk': <azure.servicemanagement.OSVirtualHardDisk object at 0x7f03692d70d0>, 'label': 'salt-deployed-machine'}
[DEBUG   ] Attempting function <function wait_for_hostname at 0x7f03692d6050>
[DEBUG   ] Caught exception in wait_for_fun: local variable 'data' referenced before assignment
[DEBUG   ] Retrying function <function wait_for_hostname at 0x7f03692d6050> on  (try 1)
[DEBUG   ] Caught exception in wait_for_fun: local variable 'data' referenced before assignment
[DEBUG   ] Retrying function <function wait_for_hostname at 0x7f03692d6050> on  (try 2)
[DEBUG   ] Caught exception in wait_for_fun: local variable 'data' referenced before assignment
[DEBUG   ] Retrying function <function wait_for_hostname at 0x7f03692d6050> on  (try 3)
[...]

The machine is up and running:

$ azure vm list | grep salt-deployed-machine
data:    salt-deployed-machine  ReadyRole           West Europe  salt-deployed-machine.cloudapp.net  <ip>  

We tried with both the stable and the current develop branch:

# salt --version
salt 2014.7.0-n/a-91bebba (Helium)
# salt-cloud --version
salt-cloud 2014.7.0-n/a-91bebba (Helium)

Also, please note the trace "Failed to execute 'azure.list_nodes()' while querying for running nodes: invalid literal for int() with base 10: ''". I am not sure whether it is related to the problem or just cosmetic.

BTW: What exactly should happen at this point? Is salt-cloud waiting for Azure to return the hostname so that Salt can install the minion over SSH, or is it waiting for the minion to respond with the hostname?

Thanks! 7oku

basepi commented 9 years ago

Thanks for the report, we'll investigate these issues.

@techhat ping

7oku commented 9 years ago

@basepi @techhat Well, it seems I could fix it.

The first trace is important: it shows that the function responsible for listing all nodes fails if any node has no IP address (which is the case if you have suspended machines in your Azure account). It tries to convert an empty string to an integer, which raises the ValueError. As a result, the function never returns a valid "URL" field, which is what wait_for_hostname() checks for.
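
A minimal sketch of that failure, for illustration only. The `ip_to_int` body below is reconstructed from the single line visible in the traceback (`ret = ret * 256 + int(octet)`); the loop around it and the `safe_ip_to_int` helper are assumptions, not the actual Salt source.

```python
def ip_to_int(ip):
    """Convert a dotted-quad string to an integer, one octet at a time.

    Reconstructed for illustration; only the accumulation line is taken
    from the traceback, the surrounding loop is an assumption.
    """
    ret = 0
    for octet in ip.split('.'):
        # ''.split('.') yields [''], so int('') raises ValueError here
        ret = ret * 256 + int(octet)
    return ret


def safe_ip_to_int(ip):
    """Hypothetical guard: skip the conversion when no address is set."""
    if not ip:
        return None
    return ip_to_int(ip)


if __name__ == '__main__':
    print(ip_to_int('10.0.0.1'))   # a node with an address converts fine
    print(safe_ip_to_int(''))      # a suspended node without one is skipped
    try:
        ip_to_int('')              # the unguarded call fails as in the trace
    except ValueError as exc:
        print('ValueError:', exc)
```

A single node with an empty address is enough to make the whole listing fail, which is why unrelated suspended machines affect a freshly created VM.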

Applying this diff to /usr/lib/python2.7/dist-packages/salt/cloud/clouds/msazure.py, so that the IP address is only classified when one is set, fixed it for me:

--- msazure.py_old  2014-08-07 14:20:21.222540724 +0200
+++ msazure.py_new  2014-08-07 14:21:29.174543308 +0200
@@ -268,11 +268,12 @@
             ret[deployment]['public_ips'] = []
             role_instances = deploy_dict['role_instance_list']
             for role_instance in role_instances:
-                ip_address = role_instances[role_instance]['ip_address']
-                if salt.utils.cloud.is_public_ip(ip_address):
-                    ret[deployment]['public_ips'].append(ip_address)
-                else:
-                    ret[deployment]['private_ips'].append(ip_address)
+                ip_address = role_instances[role_instance]['ip_address']
+                if ip_address:
+                    if salt.utils.cloud.is_public_ip(ip_address):
+                        ret[deployment]['public_ips'].append(ip_address)
+                    else:
+                        ret[deployment]['private_ips'].append(ip_address)
                 ret[deployment]['size'] = role_instances[role_instance]['instance_size']
             roles = deploy_dict['role_list']
             for role in roles:

basepi commented 9 years ago

Awesome! Would you mind submitting a pull request?

7oku commented 9 years ago

I'll send you a pull request tomorrow, once I have added a more robust check that verifies not only that the variable holds a value but also that it is a valid IP address.
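
One way such a stricter check could look, as a rough sketch rather than the code that ended up in the pull request. It assumes Python 3's standard `ipaddress` module (Python 2.7, as used in the traceback, would need a backport), and `classify_ip` is a hypothetical helper name, not part of Salt. Its public/private split follows `ipaddress` and may differ from `salt.utils.cloud.is_public_ip`.

```python
import ipaddress


def classify_ip(value):
    """Return 'public', 'private', or None if value is not a valid IPv4 address.

    Hypothetical helper for illustration; not part of Salt.
    """
    # Only accept strings: IPv4Address would also accept plain integers.
    if not isinstance(value, str):
        return None
    try:
        addr = ipaddress.IPv4Address(value)
    except ValueError:
        # AddressValueError is a subclass of ValueError; this covers
        # empty strings and malformed addresses alike.
        return None
    return 'private' if addr.is_private else 'public'


if __name__ == '__main__':
    for candidate in ('', '10.0.0.4', '8.8.8.8', '1.2.3'):
        print(repr(candidate), classify_ip(candidate))
```

Returning None for anything that is not a well-formed address lets the caller skip the node's IP entirely instead of raising while the node list is being built.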

basepi commented 9 years ago

Great!