vmware / pyvmomi

VMware vSphere API Python Bindings
Apache License 2.0

Failure with unintended side effects while reconfiguring a Distributed vSwitch #507

Status: Open. tim-ireland opened this issue 7 years ago.

tim-ireland commented 7 years ago

Environment:

- vCenter 6.5
- pyVmomi 6.5
- vconnector 0.4.6
- Python 2.7.13

I have a script which connects to vCenter and updates the discovery protocol settings on all dvSwitches.

Here is a snippet that illustrates what the code does. It finds the dvSwitches that need updating, creates a new LinkDiscoveryProtocolConfig with its protocol set to ProtocolType.lldp, and then reconfigures the dvSwitch with the new configuration.

```python
from pyVmomi import vim

import vsphere_utils  # the author's helper module; wait_for_task is shown below


# Methods of the author's connector class (class definition omitted).
def enable_lldp_advertise(self, dvswitch):
    """Enable LLDP advertising settings on a Distributed vSwitch.

    Arguments:
        dvswitch (DistributedVirtualSwitch): A distributed vSwitch.

    Returns:
        bool: True if the update is successful. False otherwise.
    """
    protocol_config = vim.host.LinkDiscoveryProtocolConfig()
    protocol_config.protocol = vim.host.LinkDiscoveryProtocolConfig.ProtocolType.lldp
    return self._enable_link_discovery_advertise(dvswitch, protocol_config)


def _enable_link_discovery_advertise(self, dvswitch, protocol_config):
    """Enable Link Discovery Protocol advertising settings on a Distributed vSwitch.

    Arguments:
        dvswitch (DistributedVirtualSwitch): A distributed vSwitch.
        protocol_config (LinkDiscoveryProtocolConfig): Configuration specifying the selected
            Link Discovery Protocol to use for this switch.

    Returns:
        bool: True if the update is successful. False otherwise.
    """
    if self.is_advertise_enabled(dvswitch):
        # advertise is enabled, don't change the setting
        protocol_config.operation = dvswitch.config.linkDiscoveryProtocolConfig.operation
    elif self.is_listen_enabled(dvswitch):
        # listen is enabled, add advertise by setting to both
        protocol_config.operation = vim.host.LinkDiscoveryProtocolConfig.OperationType.both
    else:
        # nothing is enabled, set to advertise
        protocol_config.operation = vim.host.LinkDiscoveryProtocolConfig.OperationType.advertise

    # The spec carries only the discovery protocol change; configVersion is
    # required when reconfiguring an existing switch.
    config_spec = vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec(
        configVersion=dvswitch.config.configVersion,
        linkDiscoveryProtocolConfig=protocol_config)
    task = dvswitch.ReconfigureDvs_Task(spec=config_spec)

    return vsphere_utils.wait_for_task(task=task,
                                       timeout=30,
                                       logger=self._logger)
```
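The three-way branch in `_enable_link_discovery_advertise` can be pulled out into a pure function, which makes the intended transitions easy to check without a vCenter. This is a minimal sketch that assumes `is_advertise_enabled()` is true for the `advertise` and `both` operations and `is_listen_enabled()` is true for `listen` (those helpers are not shown in the issue). `choose_operation` is a hypothetical name, and plain strings stand in for the `OperationType` values `none`, `listen`, `advertise`, and `both`.

```python
# Hypothetical helper: plain strings stand in for the values of
# vim.host.LinkDiscoveryProtocolConfig.OperationType.
ADVERTISE_ON = ("advertise", "both")


def choose_operation(current):
    """Return the operation that enables advertising without losing listening."""
    if current in ADVERTISE_ON:
        # Advertising is already enabled: keep the current setting.
        return current
    if current == "listen":
        # Listening only: add advertising by switching to "both".
        return "both"
    # "none" or unset: advertise only.
    return "advertise"
```

In the method above this would roughly become `protocol_config.operation = choose_operation(dvswitch.config.linkDiscoveryProtocolConfig.operation)`, assuming pyVmomi enum values compare equal to their string names.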

vsphere_utils.wait_for_task is implemented as follows:

```python
import time

from pyVmomi import vim


def wait_for_task(task, timeout, logger):
    """Wait for a Task in vSphere to run to completion.

    Arguments:
        task (Task): The vSphere Task.
        timeout (int): Maximum amount of time in seconds to wait for the task to complete
        logger (logger): For logging updates

    Returns:
        bool: True if the Task was successful. False otherwise.
    """
    pending = (vim.TaskInfo.State.running, vim.TaskInfo.State.queued)

    start = time.time()
    # Read task.info once per iteration; each access fetches the property from vCenter.
    info = task.info
    while info.state in pending:
        if time.time() - start > timeout:
            logger.warning('%s second timeout exceeded while waiting for Task to complete', timeout)
            break
        logger.info('Task running...')
        time.sleep(1)
        info = task.info

    if info.state == vim.TaskInfo.State.error:
        logger.critical(info.error.msg)

    return info.state == vim.TaskInfo.State.success
```
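As written, `wait_for_task` returns `False` both when the task fails and when the 30-second timeout expires. Those cases mean different things: polling only reads `task.info`, so giving up does not cancel the task, which may still complete or fail in vCenter afterwards. A minimal sketch with a hypothetical `wait_for_state` helper (with `get_state`, `clock`, and `sleep` injected so it can be exercised without a server) that reports the two cases separately:

```python
import time


def wait_for_state(get_state, timeout, poll_interval=1.0,
                   clock=time.time, sleep=time.sleep):
    """Poll get_state() until it leaves 'queued'/'running' or the timeout expires.

    Returns (state, timed_out). timed_out=True means the task was still pending
    when we stopped polling; it has NOT been cancelled on the server.
    """
    deadline = clock() + timeout
    state = get_state()
    while state in ("queued", "running"):
        if clock() >= deadline:
            return state, True
        sleep(poll_interval)
        state = get_state()
    return state, False
```

Against pyVmomi this might be called as `state, timed_out = wait_for_state(lambda: task.info.state, timeout=30)`, assuming the `TaskInfo.State` values compare equal to the strings `queued`, `running`, `success`, and `error`.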

I am investigating an issue in which this code resulted in a set of physical NICs being disassociated from the active interfaces of the Dv_uplinks on a dvSwitch. I don't see how that could happen given the data model, but I am curious about the failure modes of the ReconfigureDvs_Task method. During execution, the code received the following exception:

```
ERROR:vsphere.EnableDiscoveryProtocolOnDistributedVSwitch:Traceback (most recent call last):
  File "/opt/vsphere/actions/../lib/vsphere_connector.py", line 60, in connect
    self.connect_raise_exceptions(username, password, host, verify_ssl)
  File "/opt/vsphere/actions/../lib/vsphere_connector.py", line 98, in connect_raise_exceptions
    self.vsphere_connection.connect()
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/vconnector/core.py", line 159, in connect
    sslContext=self.ssl_context,
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/pyVim/connect.py", line 663, in SmartConnect
    sslContext)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/pyVim/connect.py", line 552, in __FindSupportedVersion
    sslContext)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/pyVim/connect.py", line 472, in __GetServiceVersionDescription
    tree = __GetElementTreeFromUrl(url, sslContext)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/pyVim/connect.py", line 438, in __GetElementTreeFromUrl
    sock = requests.get(url, verify=False)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/opt/virtualenvs/vsphere/lib/python2.7/site-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='vcsa65.plexxi.com', port=443): Max retries exceeded with url: //sdk/vimServiceVersions.xml (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f1b041a2ad0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
ERROR:root:Cannot connect to vcsa65.plexxi.com: HTTPSConnectionPool(host='vcsa65.plexxi.com', port=443): Max retries exceeded with url: //sdk/vimServiceVersions.xml (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f1b040b08d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
```

My theory is that the DNS configuration changed on the host while this ReconfigureDvs_Task was in progress, causing HTTPSConnectionPool to fail to make additional connections. My primary question is: what happens in vSphere if this occurs? Is there a transactional rollback mechanism that undoes any changes ReconfigureDvs_Task initiated, or could this leave the system in a misconfigured state?

I found a similar issue HERE involving the requests/urllib3 HTTPSConnectionPool and file descriptors, but the error in this case, [Errno -2] Name or service not known, points to a DNS resolution failure.
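The traceback shows the failure inside `SmartConnect`, i.e. while connecting, and `[Errno -2] Name or service not known` is a name-resolution error. If the DNS problem is transient, a bounded retry around the connection step is one option. A minimal sketch (Python 3) with a hypothetical `call_with_retries` helper; it catches `OSError` by default, which in Python 3 also covers `requests.ConnectionError`, since requests' exceptions derive from `IOError`. Retrying the connection does not undo or replay a `ReconfigureDvs_Task` that was already submitted.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, backoff=2.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call func() and return its result, retrying on the listed exceptions.

    Waits `delay` seconds before the first retry and multiplies the wait by
    `backoff` each time; re-raises the last exception if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay)
            delay *= backoff
```

For a direct pyVim call this could look like `si = call_with_retries(lambda: SmartConnect(host=host, user=user, pwd=pwd))`; with vconnector the wrapped callable would be its `connect()` instead.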

tim-ireland commented 7 years ago

Hi @tianhao64, could you tell me whether there is a transaction/rollback mechanism that would mitigate a loss of connection in the middle of a pyVmomi configuration update (in this case, one applying a vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec)? Or is it actually possible to corrupt a configuration if the connection is lost mid-update?

Thanks!

tianhao64 commented 7 years ago

@tim-ireland There is no transaction/rollback mechanism AFAIK. We released task.py in the pyVim package. Do you want to give that file a try?
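With no rollback, one defensive pattern is to make the update idempotent: after any interruption, reconnect, re-read the switch configuration, and reapply only if the observed settings still differ from the desired ones. A minimal sketch with a hypothetical `ensure_config` helper; `read`, `apply`, and `is_desired` are caller-supplied callables (for a dvSwitch, `read` might return `dvswitch.config.linkDiscoveryProtocolConfig` and `apply` might build a fresh ConfigSpec with the current `configVersion` and call `ReconfigureDvs_Task`). It only checks the settings it compares, so it would not detect side effects elsewhere, such as the uplink changes described in this issue.

```python
def ensure_config(read, apply, is_desired, attempts=3):
    """Bring a configuration to the desired state, re-reading before each change.

    read() returns the current configuration, is_desired(config) says whether
    it is already correct, and apply(config) submits a change. Returns True as
    soon as a fresh read() is correct, or False if it is still wrong after
    `attempts` calls to apply().
    """
    for _ in range(attempts):
        current = read()
        if is_desired(current):
            return True  # already correct: nothing to apply
        apply(current)
    # Final check after the last apply().
    return is_desired(read())
```

Because every pass starts from a fresh read, running it again after a dropped connection does not reapply a change that already landed.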

teror4uks commented 7 years ago

Maybe I'm wrong, but the problem must be in urllib3 when you try to connect using an untrusted certificate on your ESXi host.

Try opening this URL in your browser: https://vcsa65.plexxi.com//sdk/vimServiceVersions.xml
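The two theories in this thread (DNS resolution vs. an untrusted certificate) fail at different stages, so they can be told apart by checking each stage in turn. A minimal stdlib-only sketch (Python 3; `diagnose_endpoint` is a hypothetical name) that resolves the name, opens a TCP connection, and then performs a TLS handshake with certificate and hostname verification. It does not fetch `vimServiceVersions.xml` itself; an error such as `[Errno -2] Name or service not known` would be reported by the first stage.

```python
import socket
import ssl


def diagnose_endpoint(host, port=443, timeout=5.0):
    """Return the first stage that fails: 'dns', 'connect', 'tls', or 'ok'."""
    # Stage 1: name resolution (socket.gaierror, e.g. errno -2 on Linux).
    try:
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"

    # Stage 2: plain TCP connection (refused, unreachable, or timed out).
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect"

    # Stage 3: TLS handshake that verifies the certificate chain and hostname,
    # which fails for self-signed or otherwise untrusted certificates.
    context = ssl.create_default_context()
    try:
        with context.wrap_socket(sock, server_hostname=host):
            pass
    except ssl.SSLError:  # includes certificate verification failures
        return "tls"
    except OSError:
        return "connect"
    finally:
        sock.close()
    return "ok"
```

For this thread the call would be `diagnose_endpoint("vcsa65.plexxi.com")`, run from the same machine as the script.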