nutanix / nutanix.ansible

Official Nutanix Ansible collections
GNU General Public License v3.0
67 stars 36 forks source link

[Bug] Creation of VMs in module nutanix.ncp.ntnx_vms is not indempotent #284

Closed axielldan closed 2 years ago

axielldan commented 2 years ago

Describe the bug If I write a simple playbook to create a vm and run it twice it creates two VMs, ansible is supposed to run indempotently, if I run the same module twice with the same parameters there should be no changes.

If I pre-generate the vm_uuid and add it to the module's parameters it fails to create the VM altogether.

I'm assuming then that you're only using vm_uuid as the indempotent variable.

To Reproduce Steps to reproduce the behavior:

  1. ansible-playbook -vvv -i inventory/nutanix.yml create_ntx_vms.yml
  2. sample playbook used `---
    • name: Create Nutanix VM nutanix.ncp.ntnx_vms: nutanix_host: "{{ vault_nutanix_host }}" nutanix_username: "{{ vault_nutanix_username }}" nutanix_password: "{{ vault_nutanix_password }}" validate_certs: False state: "{{ vm.state | default('present') }}" name: "{{ vm.name }}" vm_uuid: "{{ vm.uuid | default(omit) }}" timezone: "{{ vm.timezone | default('UTC') }}" cluster: name: "{{ vm.cluster | default('DEV_CLUSTER') }}" vcpus: "{{ vm.vcpus | default('2') }}" cores_per_vcpu: "{{ vm.cores_per_vcpu | default('1') }}" memory_gb: "{{ vm.memory_gb | default('2') }}" disks: "{{ vm.disks | default(default_os_disk) }}" guest_customization: type: "cloud_init" script_path: "./files/cloud_init.yml" is_overridable: True networks: "{{ vm.networks | default(default_network) }}" loop: "{{ vms }}" loop_control: loop_var: vm async: 3600 poll: 0 register: ntx_vms_results ` Stack trace msg: 'Failed fetching URL: https://10.0.0.10:9440/api/nutanix/v3/vms/720461aa-0ad6-4f4a-811c-929a475f331f'

Expected behavior For example I pass in a list a of two VMs. On the 1st run I would expect it to create the vms with "ok=2 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0" On the 2nd run I would expect it to create nothing with "ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0"

Additional context I can create a pre-check to see if a vm already exists by the same name before creating it, using the nutanix.ncp.ntnx_vms_info moduel but this is not how indempotency is supposed to work in Ansible.

I can only assume that the indempotent part of the ntnx_vm module is the vm_uuid and that it is generated on vm creation, so it can later be used when changing the state of a VM, but that kind sucks. It means I have to either set my codebase to retain state (build a vm, query vm uuid, merge vm uuid back into my variables, commit to main) or write my playbooks to check state before performing a task.

premkarat commented 2 years ago

Nutanix platform allows VMs to have duplicate names. Now the the only way to ensure idempotency is through vm_uuid hence i guess we have no other choice but to use vm_uuid for idempotent operations.

axielldan commented 2 years ago

But vm_uuid is not something the user can specify?

premkarat commented 2 years ago

Please help me understand. May i know why the user cannot specify the vm_uuid?

axielldan commented 2 years ago

If I attempt to create a VM supplying a pre-generated vm_uuid by passing in the following vm definition:

  - name: demo.example
    vcpus: 2
    cores_per_vcpu: 2
    memory_gb: 8
    uuid: 6310340c-4465-4561-94c8-518463d48f8d
    networks:
      - is_connected: True
        subnet:
          name: VLAN120
        private_ip: "10.20.120.99"

I get this error:

TASK [Create Nutanix VM] ********************************************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => changed=false
  error: 'HTTP Error 404: NOT FOUND'
  msg: 'Failed fetching URL: https://10.0.0.10:9440/api/nutanix/v3/vms/6310340c-4465-4561-94c8-518463d48f8d'
  response: null
  status_code: 404
axielldan commented 2 years ago

So to my mind, the module should work that if a call to the api endpoint which specifies the vm_uuid retrns a 404 not found, that means the VM doesn't exist yet and the nutanix.ncp.ntnx_vms module should therefore build it.

Of course I understand that this may not work as potentially the uuid is auto-generated by the hypervisor management engine and so it cannot be a user specified option.

You could then get around the issue by having a "vm name as unique identifier" boolean on the module where default is false which would then search the cluster for a VM with a matching name and exit as unchanged if it finds one.

premkarat commented 2 years ago

vm_uuid is used for day2 operation. Here is the way to use this.

  1. Create a vm using ansible playbook
  2. Gather the vm_uuid from response of 1)
  3. Update the playbook used in 1) with the vm_uuid from 2)
  4. Manage day2 operations on the VM with the same updated playbook

May i understand the difficulty with this workflow and why you don't prefer it?

premkarat commented 2 years ago

So to my mind, the module should work that if a call to the api endpoint which specifies the vm_uuid retrns a 404 not found, that means the VM doesn't exist yet and the nutanix.ncp.ntnx_vms module should therefore build it.

Of course I understand that this may not work as potentially the uuid is auto-generated by the hypervisor management engine and so it cannot be a user specified option.

You could then get around the issue by having a "vm name as unique identifier" boolean on the module where default is false which would then search the cluster for a VM with a matching name and exit as unchanged if it finds one.

There is a side effect. What if there are 2 VMs with same name. How will you determine the correct one?

axielldan commented 2 years ago

Ansible modules are supposed to be "Indempotent" the definition of Indempotency is "An operation that produces the same results no matter how many times it is performed.".

The problem with the nutanix.ncp.ntnx_vms module in it's current form is that if I define a VM in ansible vars or inventory. Run the ntnx_vms module using those same vars twice. I get two VMs. This is not Indempotent.

Your workflow above pushes the responsibility for providing Indempotency onto the operator, and more specifically requires them to retain state and to provide Idempotency on my VM creations I'm required to have a multi-step process.

Ansible is supposed to be a declarative language. I define the state I want and the modules check the current state and if it does not exist they set it. If I have to include the logic required to achieve that state in the playbook then that has transitioned to an imperative language and I might as well just use python.

The Ansible Developer Guide states that "Modules should typically encompass much of the logic for interacting with a resource. A lightweight wrapper around an API that does not contain much logic would likely cause users to offload too much logic into a playbook"

So yes, I can create some workarounds in playbooks to deal with idempotency at the playbook level. The simplest of which is to make a descision that all VM names are unique within my organisation and use the ntnx_vms_info module to check if the VM exists already:

- name: Check if VM already exists
  nutanix.ncp.ntnx_vms_info:
    nutanix_host: "{{ vault_nutanix_host }}"
    nutanix_username: "{{ vault_nutanix_username }}"
    nutanix_password: "{{ vault_nutanix_password }}"
    validate_certs: False
    kind: vm
    filter:
      vm_name: "{{ vm.name }}"
  register: vm_name_check

- name: Create Nutanix VM
  nutanix.ncp.ntnx_vms:
    nutanix_host: "{{ vault_nutanix_host }}"
    nutanix_username: "{{ vault_nutanix_username }}"
    nutanix_password: "{{ vault_nutanix_password }}"
    validate_certs: False
    state: "{{ vm.state | default('present') }}"
    name: "{{ vm.name }}"
    vm_uuid: "{{ vm.uuid | default(omit) }}"
    timezone: "{{ vm.timezone | default('UTC') }}"
    cluster:
      name: "{{ vm.cluster | default('HPVCLU1') }}"
    vcpus: "{{ vm.vcpus | default('2') }}"
    cores_per_vcpu: "{{ vm.cores_per_vcpu | default('1') }}"
    memory_gb: "{{ vm.memory_gb | default('2') }}"
    disks: "{{ vm.disks | default(my_default_os_disk) }}"
    guest_customization:
      type: "cloud_init"
      script_path: "./files/cloud_init.yml"
      is_overridable: True
    networks: "{{ vm.networks | default(my_default_network) }}"
  register: vm_creation_result
  when: vm_name_check.response.metadata.total_matches == 0

That would at least allow me to build VMs indempotently without requiring some method for state retention.

Or if the ntnx_vms module supported it by supplying the unique identifier for a VM. But both of these methods are pushing the responsibility for indempotency onto the operator. Which is not best practise.

Each VM needs to have a unique identifier right? So why can I not pick it? If the nutanix hypervisor doesn't support this then that limitation should be documented an an alternative provided to the operator at the module level (for example a flag to say that vm name will be used by ansible as the unique identifier, or a metadata field that the operator can supply and the module uses to perform a check for vm existence), not with runtime logic in playbooks or state retention.

There's also the issue that pushing the logic from the module into the playbook means that a single state change (build a vm in this case) is not a single task if I wanted to use async to perform the creation of VMs in a parallel loop I would not be able to do so using either your suggested workaround or mine and that would mean I would have to first run the loop to check for the existence of the VMs, save the state for each VM into a runtime fact and then use the output of that to produce a list that is the difference of the two lists and then only create the vms that exist in the last list.

So yes there is a bunch of bad options for working around the issue that the operator cannot supply a unique identifier for the VMs at the time of creation. But they are all work arounds, against Ansible Developer best practise guidelines and are more complex that simply allowing the Operator to specify a unique identifier themselves.

axielldan commented 2 years ago

I just re-read my last message and I apologise. It's a bit of a convoluted rant and comes across pretty confrontational which was not my intent. I'll try to provide a better synopsis if you want?

premkarat commented 2 years ago

Thats ok. I understand the frustration.

The problem is in this. "The simplest of which is to make a descision that all VM names are unique within my organisation and use the ntnx_vms_info module to check if the VM exists already".

Just because your organisation makes a decision to make unique VM names, it need not apply for others. When unique names cannot be ensured, then idempotency for update operation becomes impossible to achieve just with VM names. We cannot enforce uniqueness construct in Ansible when the underlying platform doesn't support it. You may want to log an enhancement request on the platform side to enable unique VM names. We can definitely revisit it then.

axielldan commented 2 years ago

That is the simplest decision for my organisation. Of course it will not work for others. But Ansible needs a unique id to able to function properly. The docs recommend this be the "name" field, but that's an ansible name it needn't be the vm name.

This same function could be achieved with a metadata field.

Looking at the options when creating a VM it seems the only option for metadata tags to nutanix VMs is the "Categories" variable.

categories:
    AppType:
        - Apache_Spark

I assumed you used this in your inventory for group membership, but you don't. So another option would be to always injects an "ansible_id" category here that is the unique id. Then your module could have two seperate fields "name" and "vm_name"

categories:
    ansible_id:
        - inventory_hostname
axielldan commented 2 years ago

"Just because your organisation makes a decision to make unique VM names, it need not apply for others" - sure, but can you at least give me the option? This is not an Or choice it's an And choice.

axielldan commented 2 years ago

You already appear to do something similar with your karbon clusters: https://github.com/nutanix/nutanix.ansible/blob/main/plugins/modules/ntnx_karbon_clusters.py

options:
    name:
        type: str
        description: Unique name of the k8s cluster.
        required: true
    cluster:
        type: dict
        description: k8s cluster  details
        suboptions:
            name:
                type: str
                description: Cluster name
            uuid:
                type: str
                description: Cluster UUID
thewelshvader commented 1 year ago

Is this issue off the table now...?

I had opened a similar question here last year.

While it's possible to workaround it with a python/ansible script to fetch a list of all VMs and compare it with the new VM name, I think it's reasonable to put this into the ansible functionality. An boolean option like this would make sense to me: _allow_duplicatenames: (true/false)(default=true)

So an example task would be: