snobear / ezmomi

cli tool for common VMware vSphere tasks
MIT License

cloning "Uncustomizable Guest" gives a nasty exception and leaves NICs disconnected #33

Open bsherman opened 9 years ago

bsherman commented 9 years ago

Sometimes we need to clone templates that don't support guest customization. This results in:

Cloning debian to new host foobar with 1024MB RAM...
Traceback (most recent call last):
  File "./bin/ezmomi", line 4, in <module>
    cli.cli()
  File "/home/bsherman/code/git/ezmomi/ezmomi/cli.py", line 31, in cli
    ez.clone()
  File "/home/bsherman/code/git/ezmomi/ezmomi/ezmomi.py", line 298, in clone
    result = self.WaitForTasks(tasks)
  File "/home/bsherman/code/git/ezmomi/ezmomi/ezmomi.py", line 430, in WaitForTasks
    raise task.info.error
pyVmomi.VmomiSupport.UncustomizableGuest: (vim.fault.UncustomizableGuest) {
   dynamicType = <unset>,
   dynamicProperty = (vmodl.DynamicProperty) [],
   msg = "Customization of the guest operating system 'other26xLinux64Guest' is not supported in this configuration. Microsoft Vista (TM) and Linux guests with Logical Volume Manager are supported only for recent ESX host and VMware Tools versions. Refer to vCenter documentation for supported configurations.",
   faultCause = <unset>,
   faultMessage = (vmodl.LocalizableMessage) [],
   uncustomizableGuestOS = 'other26xLinux64Guest'
}

This alone is ugly, but even worse, the newly created NIC devices are left disconnected and the VM is left powered off.

A common workaround for unsupported guests is to use vmrun capabilities to run a customization script on the machine, but that fails when the NICs are not connected. Of course, the script won't run while the VM is powered off, either.

I propose to check whether the VirtualMachineGuestOsIdentifier supports customization and, if so, only try supported operations. I'm assuming we'd still nuke existing NIC devices and add the number of devices requested by --ips. Memory and CPU would of course still be configured. The biggest problem I'm seeing is the disconnected NICs, but I should point out that VMware is certainly failing to configure networking on them, too.

An alternative is to catch the exception, make sure the NICs are connected on the target VM, and then power it on, but that still leaves a failed clone operation in vCenter, which is pretty ugly.

bsherman commented 9 years ago

@snobear, I'd like some input on this since I don't want to take things in a direction you don't agree with.

I've implemented a simple exception handler locally for when customization fails. It then reconfigures the VM to ensure the NICs are enabled and powers on the VM.
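A minimal sketch of the kind of handler described here, not the actual local implementation. `UncustomizableGuest`, `FakeVM`, and `clone_with_fallback` are stand-ins invented for illustration; with pyVmomi the fault would be `vim.fault.UncustomizableGuest` (as in the traceback above), and connecting the NICs and powering on would each be a real vSphere task rather than an attribute change.

```python
class UncustomizableGuest(Exception):
    """Stand-in for vim.fault.UncustomizableGuest (hypothetical, for illustration)."""


class FakeVM:
    """Hypothetical in-memory VM, used only to make the sketch runnable."""

    def __init__(self, nic_count):
        self.nics_connected = [False] * nic_count
        self.powered_on = False


def clone_with_fallback(wait_for_clone, vm):
    """Wait for the clone task; if customization fails, recover the VM.

    Returns True if customization succeeded, False if the fallback ran.
    """
    try:
        wait_for_clone()
        return True
    except UncustomizableGuest as fault:
        print("warning: guest customization not supported: %s" % fault)
        # Fallback described in the comment: enable the NICs, then power on.
        vm.nics_connected = [True] * len(vm.nics_connected)
        vm.powered_on = True
        return False


def failing_clone():
    # Simulates WaitForTasks raising the fault for an unsupported guest.
    raise UncustomizableGuest("other26xLinux64Guest")


vm = FakeVM(nic_count=2)
customized = clone_with_fallback(failing_clone, vm)
```

Only the one fault type is caught, so any other clone failure still propagates as it does today.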

The more I think about it, the less I think this should be default behavior. The reason: if you aren't PLANNING to have customization fail for unsupported OSes, then you could very well have VMs booting up and using IPs which you didn't plan for.

With this line of reasoning, I'd probably enable this behavior behind a '--unsupportedGuest' CLI param. Or maybe "--customGuest". Naming is hard here, since we are talking about "Unsupported Guest Customization". All big words. :)

What I'd really like to do is take this a step further, and if unsupportedGuest support is enabled, implement vmrun type functionality to run a custom guest customization script on the VM. It would need some extra configuration:

Hmm... really, maybe this would be "customClone" instead of the default "clone" option, and could use a script for network configuration instead of even attempting the standard customization stuff.

Obviously, the driving factor here is I want to run some non-customizable OSes, eg, Debian and FreeBSD. I figure I could provide at least a sample customization script that does simple network config that more-or-less matches what ezmomi already does by default. But all this clearly will require extra config since the user will need to at least place said config script onto their template.

There's some thoughts, I'd love some feedback. :-)

snobear commented 9 years ago

My instinct would be to go with two new optional params, one for a fallback script and another for an optional script to run for guests that are customizable. I like your more specific naming thoughts on that though. Hmm let me sleep on it. Awesome comments and suggestions all around. Thanks for prodding me on it, we'll get it movin!

snobear commented 9 years ago

As a side note, let's name parameters with dash-delimited words as opposed to camelCase. The camel case works well on the subcommands like powerOn, so that's all good. Dash-delimited seems standard on many tools I use. The java world probably does the opposite though :).

Alright, so the proposal is that if the "uncustomizable guest" exception is caught, ezmomi would fire off a script as specified by --unsupported-guest-script, for example. That script must already reside on the VM template; I'm assuming it's not possible to copy a file to a guest VM.

Alternatively, what if we just had a generic --run-script parameter that fires a script. It would be up to the user to supply it if needed. Does that make sense with your particular workflow? This approach makes the assumption that a user knows he or she will be cloning an unsupported guest OS. Or they want to run a script on a supported guest, or just run it regardless. The question is how to handle the uncustomizable guest exception, since you'd want to let the user know that this guest can't/could not be customized and that they could use the --run-script param.

What are your thoughts on the alternative approach? We can then think about the other parameters needed like values and credentials.
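For reference, a small sketch of how a dash-delimited `--run-script` flag could look, assuming the CLI is built on Python's standard `argparse`. The `prog` name, the help text, and the `nargs="+"` handling of `--ips` are assumptions made for illustration and may not match ezmomi's real argument definitions.

```python
import argparse


def build_parser():
    """Hypothetical parser showing dash-delimited option names."""
    parser = argparse.ArgumentParser(prog="ezmomi-sketch")
    parser.add_argument("--ips", nargs="+", default=[],
                        help="static IPs for the clone (assumed signature)")
    parser.add_argument("--run-script", default=None,
                        help="script to run on the guest after cloning")
    return parser


args = build_parser().parse_args([
    "--ips", "192.168.40.10", "192.168.80.12",
    "--run-script", "/opt/vmware-scripts/debian_customizer.sh",
])

# argparse exposes the dash-delimited flag as an underscore attribute.
print(args.run_script)
```

Because the option defaults to `None`, code further down can tell "no script supplied" apart from any real path.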

snobear commented 9 years ago

If you don't supply --run-script and the UG exception is caught ezmomi could either:

  1. fail hard and exit, printing a message about needing to supply --run-script to be able to clone
  2. continue the cloning process and just print a warning that the OS is unsupported, so networking can't be configured, and suggest using --run-script.
  3. prompt the user if they want to continue cloning even though guest is unsupported. similar warning as 2. using the --silent option would bypass the prompt and continue (and probably still print a notice/warning for logging purposes)

I like 2 or 3. I used to just clone VMs and handle the networking setup manually, so there may be other users who do the same and wouldn't care to run a script.

bsherman commented 9 years ago

Lots of good ideas here. I'm inclined to have a single --run-script for simplicity's sake, plus use your suggestion 2 for when --run-script is not provided.

I think this gives the user clean messaging about the unsupported guest with suggested next steps, but it does not change expected behavior.

Let me summarize the workflow I'll plan to implement if we agree:

  1. Standard clone behavior using only existing CLI arguments
    1. clone would occur as expected today
    2. instead of failing with the nasty UncustomizableGuest exception, we will:
      1. warn the user and suggest use of --run-script to customize unsupported OSes
      2. explain that VM will be powered on but networking will be disconnected
        1. powering on is what a user would expect if customization had succeeded
        2. while a user may expect a connected network, leaving it disconnected is a safety measure preventing clones from all having same IP, which a user would NOT expect
      3. continue with normal subsequent behavior
  2. New clone behavior with --run-script
    1. when UncustomizableGuest exception occurs, we will:
      1. inform the user that they are cloning unsupported OS and we assume the script will customize the guest
      2. print connecting network
      3. print powerOn messages
    2. when clone completes, regardless of supported/unsupported OS
      1. run the provided script with all defined arguments and credentials
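The workflow above can be sketched as a small planning function. This is only an illustration of the branching, under the assumption that each step can be represented as a string; `plan_clone` and the step labels are hypothetical names, not ezmomi code.

```python
def plan_clone(customization_failed, run_script=None):
    """Return the ordered steps for one clone, following the list above."""
    steps = ["clone"]
    if customization_failed:
        if run_script is None:
            # Branch 1: existing CLI arguments only.
            steps += [
                "warn: unsupported guest, suggest --run-script",
                "warn: VM will power on with networking disconnected",
                "power on",
            ]
        else:
            # Branch 2: the script is assumed to customize the guest.
            steps += [
                "info: unsupported guest, script is assumed to customize it",
                "connect network",
                "power on",
            ]
    if run_script is not None:
        # Runs whether or not the guest OS supports customization.
        steps.append("run script: %s" % run_script)
    return steps


for step in plan_clone(True, "/opt/vmware-scripts/debian_customizer.sh"):
    print(step)
```

The network is only connected in the `--run-script` branch, which matches the safety argument about clones otherwise coming up with the same IP.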

I feel pretty good about this workflow.

What's left to be defined is how we configure params, credentials, etc.

I'm inclined to have a section in the yaml config file which defines an array of scripts, their names, paths and the associated params for each. Then, either --run-script could take the script name as an argument, or, maybe the script names simply match the template names, so it's an automatic mapping.

At the moment, I'm assuming we stage scripts on the ezmomi host, which would require copying the file from local host to guest, but I think we can pull that off. Worst case scenario, we have to pre-stage the script on the guest, which means we'd still maybe name the scripts or define a template name as having a script with certain params.

snobear commented 9 years ago

Yes, I agree on that summary, way to distill it all down into a nice list. I feel good about that workflow as well.

I like the idea of a section in the yaml file with an array of scripts, their params, and settings. It sounds like how the .travis.yml file works for Travis, so I like the familiarity of that approach. Let me think about what we will actually be able to pass to --run-script. I'm thinking just pass it a section/array name only and everything must be defined in the yaml file. In that case, maybe it makes more sense to call it --run-tasks or something...

snobear commented 9 years ago

Here's a first pass at the yaml config for the run script. The ezmomi call would look like --run-script /opt/vmware-scripts/debian_customizer.sh.

scripts:
  /opt/vmware-scripts/debian_customizer.sh:     # path to script on guest
    source: /home/me/scripts/debian_customizer.sh
    params:
      mykey: myvalue
      foo: 22

  /opt/stuff/bsd_configure.py:    # example of a script that already resides on the VM template
    params:
      foo: 28

source is only needed if the script doesn't reside on the VM template already and needs to be copied. It must reside on the ezmomi host. I'm not familiar with copying files to a guest. I see there are a few options... what's a good method?

I'm thinking you'll still define static IPs with the --ips param instead of defining it here in the config file though, right? Then just pass that along to the guest script, looking up any additional networking configs in config.yml and passing them along as needed.

One question is how params should be specified to the guest script. I'm assuming the script language could be anything, e.g. bash, python, ruby, perl, etc., so it'd have to be standard, generic, and flexible. I'm thinking positional arguments, and you can adjust the position they are passed in by changing the order of params as they show up for the script under scripts in config.yml. Then again, how do you pass a dynamic list of IP addresses when it could be any number of IPs? Sounds like we would actually need keyword arguments and not positional. Hmm. I just want to avoid having users adhere to a certain script style. I know I've got a handful of bash scripts that just use positional args... not even sure if I've ever written a bash script with keyword params. I usually just reach for python or ruby if I need that prettiness :).

Let me know your thoughts on this.

bsherman commented 9 years ago

YAML looks pretty good to me. The one missing thing is guest credentials. For vmrun functionality to execute/copy things on the guest, credentials need to be provided that are valid on the guest VM itself.

Also, I agree positional arguments are the right thing. It's not awesome, but the simplest path would be to force the user to know whether they are handling multiple IPs or not. We'll always pass the IP arguments first, so if we are only using one IP in --ips then we pass one argument, for two IPs we pass two arguments, etc. Those would be automatic, and the script would have to know what to do with them; any other arguments are optional or custom to the script. And, since they are positional, maybe just index them in YAML to keep it clean?

maybe something like:

scripts:
  /opt/vmware-scripts/debian_customizer.sh:     # path to script on guest
    source: /home/me/scripts/debian_customizer.sh
    guest:
      username: root
      password: pickabetterpassword
    optional_args:
      0: myvalue
      1: 22

  /opt/stuff/bsd_configure.py:    # example of a script that already resides on the VM template
    guest:
      username: root
      password: pickabetterpassword
    optional_args:

I'm also open to keyword params. If we want to go that way, I'll make sure we have a decent example, because those aren't as common in the bash world.
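To make the proposed layout concrete, here is a rough sketch of how one script entry could be looked up and normalized. The config is written as a Python literal in the shape a YAML loader such as PyYAML would be expected to return for the example above (integer keys for the indexed args, `None` for the empty `optional_args:`), so no YAML library is needed to run it. `resolve_script` and the returned dict keys are hypothetical.

```python
# The proposed YAML as it would look once loaded into Python
# (assumed loader output, written literally to stay self-contained).
CONFIG = {
    "scripts": {
        "/opt/vmware-scripts/debian_customizer.sh": {
            "source": "/home/me/scripts/debian_customizer.sh",
            "guest": {"username": "root", "password": "pickabetterpassword"},
            "optional_args": {0: "myvalue", 1: 22},
        },
        "/opt/stuff/bsd_configure.py": {
            "guest": {"username": "root", "password": "pickabetterpassword"},
            "optional_args": None,  # empty YAML value
        },
    }
}


def resolve_script(config, guest_path):
    """Look up one script entry and normalize it (hypothetical helper)."""
    entry = config["scripts"][guest_path]
    guest = entry.get("guest")
    if not guest or "username" not in guest or "password" not in guest:
        raise ValueError("guest credentials are required for %s" % guest_path)
    indexed = entry.get("optional_args") or {}
    # Order the index-keyed mapping by position; stringify for a command line.
    args = [str(indexed[i]) for i in sorted(indexed)]
    return {
        "guest_path": guest_path,
        "source": entry.get("source"),  # None: script already on the template
        "guest": guest,
        "args": args,
    }


resolved = resolve_script(CONFIG, "/opt/vmware-scripts/debian_customizer.sh")
print(resolved["args"])
```

Failing early on missing guest credentials keeps the error close to the config rather than surfacing later, when the guest operation is attempted.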

snobear commented 9 years ago

That looks good. I think we're pretty close, so go ahead and start implementing and we'll get our hands on it to see how it plays out.

For optional_args, we can probably drop the explicit index number and make it a yaml array with the dashes. Maybe it renders as an array without those dashes though, I'm not sure. I figure those optional args will always be passed to the script FIRST, then the flexible number of IP arguments can be tacked on to the end of the parameters.

    optional_args:
      - myvalue
      - 22

so you'd have a script being called like:

/opt/vmware-scripts/debian_customizer.sh myvalue 22 192.168.40.10 192.168.80.12

I think it'd be easy enough to make a bash script flexible enough to "stuff all parameters after the second one into its own array".
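A quick sketch of that argument ordering, with the script-side split shown in Python rather than bash for illustration. `build_command` and `split_arguments` are hypothetical helpers, and the sketch assumes the guest script knows how many fixed arguments it expects.

```python
def build_command(script_path, optional_args, ips):
    """Optional args first, then the variable-length list of IPs."""
    return [script_path] + [str(a) for a in optional_args] + list(ips)


def split_arguments(argv, fixed_count):
    """Script-side view: the first fixed_count args are fixed, the rest are IPs."""
    args = argv[1:]  # drop the script path, as with sys.argv[0]
    return args[:fixed_count], args[fixed_count:]


command = build_command(
    "/opt/vmware-scripts/debian_customizer.sh",
    ["myvalue", 22],
    ["192.168.40.10", "192.168.80.12"],
)
fixed, ips = split_arguments(command, fixed_count=2)
print(" ".join(command))
```

Putting the fixed arguments first means the script never has to guess where the IP list ends, whatever the number of addresses passed with --ips.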