saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Azure broken on 2014.7 branch #20385

Closed by rallytime 9 years ago

rallytime commented 9 years ago

Azure was working very well with 2014.7.0, but after checking out the HEAD of 2014.7, it no longer works. The cloud service gets created, but somewhere after that no VM is created. The run hangs at the "Attempting function <function wait_for_hostname ...>" line and eventually times out.

Here's the relevant information from running in debug mode:

[DEBUG   ] Generating minion keys for 'nt-azure-test'
[DEBUG   ] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Sending event - data = {'profile': 'azure-ubuntu-14', 'event': 'starting create', '_stamp': '2015-02-04T18:08:03.260914', 'name': 'nt-azure-test', 'provider': 'azure-config:azure'}
[INFO    ] Creating Cloud VM nt-azure-test
[DEBUG   ] vm_kwargs: {'system_config': <azure.servicemanagement.LinuxConfigurationSet object at 0x7fc7cc7fdc10>, 'deployment_slot': 'production', 'role_size': 'Medium', 'deployment_name': 'nt-azure-test', 'service_name': 'nt-azure-test', 'role_name': 'nt-azure-test', 'network_config': <azure.servicemanagement.ConfigurationSet object at 0x7fc7cc7bccd0>, 'os_virtual_hard_disk': <azure.servicemanagement.OSVirtualHardDisk object at 0x7fc7cc7de8d0>, 'label': 'nt-azure-test'}
[DEBUG   ] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Sending event - data = {'_stamp': '2015-02-04T18:08:08.314778', 'service_kwargs': {'service_name': 'nt-azure-test', 'label': 'nt-azure-test', 'location': 'West US', 'description': 'nt-azure-test'}, 'event': 'requesting instance', 'vm_kwargs': {'deployment_slot': 'production', 'role_size': 'Medium', 'deployment_name': 'nt-azure-test', 'service_name': 'nt-azure-test', 'label': 'nt-azure-test', 'role_name': 'nt-azure-test'}}
[DEBUG   ] vm_kwargs: {'system_config': <azure.servicemanagement.LinuxConfigurationSet object at 0x7fc7cc7fdc10>, 'deployment_slot': 'production', 'role_size': 'Medium', 'deployment_name': 'nt-azure-test', 'service_name': 'nt-azure-test', 'role_name': 'nt-azure-test', 'network_config': <azure.servicemanagement.ConfigurationSet object at 0x7fc7cc7bccd0>, 'os_virtual_hard_disk': <azure.servicemanagement.OSVirtualHardDisk object at 0x7fc7cc7de8d0>, 'label': 'nt-azure-test'}
[DEBUG   ] Attempting function <function wait_for_hostname at 0x7fc7cc7cb140>
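
The last log line suggests a helper that keeps retrying a callable until it returns something or a deadline passes. Here is a minimal sketch of that general pattern, not Salt's actual code: `poll_until`, its parameters, the fake `wait_for_hostname`, and the hostname it returns are all hypothetical names chosen for illustration.

```python
import time


def poll_until(func, timeout=5.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call func() until it returns a truthy value or the timeout elapses.

    Returns the truthy value, or None if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        result = func()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


def make_fake_wait_for_hostname(ready_after):
    """Build a stand-in callable that only succeeds on the Nth attempt."""
    calls = {"n": 0}

    def wait_for_hostname():
        calls["n"] += 1
        if calls["n"] >= ready_after:
            return "nt-azure-test.example"  # hypothetical hostname
        return None

    return wait_for_hostname
```

If the polled callable never returns a truthy value (for example because the VM was never created), a loop like this only ends when the deadline passes, which would look like a long hang followed by a timeout.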

Here's my versions report:

# salt-cloud --versions-report
            Salt: 2014.7.1-269-g1a0f5e7
          Python: 2.7.6 (default, Mar 22 2014, 22:59:56)
          Jinja2: 2.7.3
        M2Crypto: 0.21.1
  msgpack-python: 0.4.4
    msgpack-pure: Not Installed
        pycrypto: 2.6.1
         libnacl: Not Installed
          PyYAML: 3.11
           ioflo: Not Installed
           PyZMQ: 14.0.1
            RAET: Not Installed
             ZMQ: 4.0.4
            Mako: 0.9.1
 Apache Libcloud: 0.15.1

ping @techhat

rallytime commented 9 years ago

Alright, after doing some git bisecting, I quickly ran into the stacktrace below. I didn't see it at the HEAD of 2014.7, but it showed up early in the bisect:

[ERROR   ] There was a profile error: an integer is required
Traceback (most recent call last):
  File "/root/SaltStack/salt/salt/cloud/cli.py", line 231, in run
    self.config.get('names')
  File "/root/SaltStack/salt/salt/cloud/__init__.py", line 1323, in run_profile
    ret[name] = self.create(vm_)
  File "/root/SaltStack/salt/salt/cloud/__init__.py", line 1193, in create
    output = self.clouds[func](vm_)
  File "/root/SaltStack/salt/salt/cloud/clouds/msazure.py", line 694, in create
    deployed = salt.utils.cloud.deploy_script(**deploy_kwargs)
  File "/root/SaltStack/salt/salt/utils/cloud.py", line 957, in deploy_script
    if wait_for_port(host=host, port=port, gateway=gateway):
  File "/root/SaltStack/salt/salt/utils/cloud.py", line 532, in wait_for_port
    sock.connect((test_ssh_host, test_ssh_port))
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
TypeError: an integer is required

I tracked this problem down to a backport that I did here: #19710. There were conflicts in msazure.py. I'm not sure if resolving the stacktrace will fix the original timeout that I was seeing, but it's definitely a good place to start.
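
The `TypeError` at the bottom of the traceback is what `socket.connect` raises when the port in the address tuple is not an integer. Below is a minimal sketch of that failure and a defensive coercion. It assumes the port reached `wait_for_port` as a string, which this thread does not confirm. `normalize_port` and `string_port_raises_type_error` are hypothetical helpers, not part of Salt.

```python
import socket


def normalize_port(port, default=22):
    """Coerce a port given as int, numeric string, or None into a valid int."""
    if port is None:
        return default
    value = int(port)  # raises ValueError for non-numeric strings
    if not 0 < value < 65536:
        raise ValueError("port out of range: %r" % (port,))
    return value


def string_port_raises_type_error(host="127.0.0.1", port="22"):
    """Return True if socket.connect rejects a non-integer port with TypeError."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The address tuple is validated before any connection attempt,
        # so a string port fails here without touching the network.
        sock.connect((host, port))
    except TypeError:
        return True
    finally:
        sock.close()
    return False
```

Coercing once, where the value is read from configuration, keeps every later `connect` call working with a plain integer.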

rallytime commented 9 years ago

FIXED!

cro commented 9 years ago

Woohoo! Nicely done!!
