theforeman / foreman-ansible-modules

Ansible modules for interacting with the Foreman API and various plugin APIs such as Katello
GNU General Public License v3.0
148 stars 166 forks source link

Unable to provision a host (with Libvirt/KVM) based on a qcow2 image without explicitly setting the image_id parameter. #1160

Closed MLotton closed 3 years ago

MLotton commented 3 years ago
SUMMARY

I wanted to provision a VM with a Libvirt/KVM compute ressource based on an image provision method with an existing qcow2 image. I encountered an issue where the host was created but the image provisioning wasn't correctly done by using the module host. The result is not the same by using the hammer CLI or the foreman/satellite web UI (with identical input parameters).

Looking further and comparing with what was done by the Satellite Web UI and Hammer CLI, I found that the host module (python code) doesn't send the image_id to the API call (Note: I didn't investigate a lot, so it is a guess). I tried again by adding the parameter image_id directly in the module parameters (inside compute_attributes) and then it worked well.

I guess that the Ansible module(s) / python code must perform an additional step to resolve the image_id based on the image name and send it to the API call (as it will be more user friendly if the user only has to put the image name as an input parameter).

ISSUE TYPE
ANSIBLE VERSION
ansible 2.9.13
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/max/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/max/deploy/venvs_python/ansible_epdb_python_2/lib/python2.7/site-packages/ansible
  executable location = /home/max/deploy/venvs_python/ansible_epdb_python_2/bin/ansible
  python version = 2.7.5 (default, Aug 13 2020, 02:51:10) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
COLLECTION VERSION
redhat.satellite:2.0.1

Repo link: https://github.com/RedHatSatellite/satellite-ansible-collection This was not tested with the community foreman collection version.

KATELLO/FOREMAN VERSION
tfm-rubygem-katello-3.16.0.13-1.el7sat.noarch
foreman-2.1.2.21-1.el7sat.noarch
STEPS TO REPRODUCE
---
- hosts: localhost
  gather_facts: False
  tasks:

    - include_vars: input_variables/vault_secrets.yml

    - name: "Create Satellite host"
      theforeman.foreman.host:
        name: "example-host.example.com"
        architecture: "x86_64"
        build: True
        comment: "Test collection"
        domain: "example.com"
        interfaces_attributes:
          - subnet: "Example subnet provisioning"
            identifier: eth0
            type: interface
            domain: "example.com"
            managed: True
            primary: True
            provision: True
        location: "example_location"
        operatingsystem: "CentOS 7 Latest"
        organization: "example_organization"
        root_pass: "example_password"
        server_url: "https://satellite-host"
        username: "example_user"
        password: "{{ satellite_user_password }}"
        validate_certs: False
        state: present
        provision_method: image
        image: "my-vm-qcow2-image"
        compute_resource: "Example Libvirt ressource"
        compute_profile: "example-compute-profile"
EXPECTED RESULTS

The host is created and well provisioned based on a qcow2 image.

The boot device order is based only on the first disk. Note: Satellite will also create a virtual CDROM device for user data / cloud-init (like this example, below you will find the definition of this device for libvirt/kvm in a XML format):

<disk type="file" device="cdrom">
  <driver name="qemu" type="raw"/>
  <source file="/var/lib/libvirt/images/example-host.example.com-cloud-init.iso"/>
  <backingStore/>
  <target dev="hdc" bus="ide"/>
  <readonly/>
  <alias name="ide0-1-0"/>
  <address type="drive" controller="0" bus="1" target="0" unit="0"/>
</disk>
ACTUAL RESULTS

The host is created but the VM is "blank" (no OS installed, etc).

The boot device order is first "network" and then "disk". There is no virtual CDROM device created.

WORKAROUND
---
- hosts: localhost
  gather_facts: False
  tasks:

    - include_vars: input_variables/vault_secrets.yml

    - name: "Create Satellite host"
      theforeman.foreman.host:
        name: "example-host.example.com"
        architecture: "x86_64"
        build: True
        comment: "Test collection"
        domain: "example.com"
        interfaces_attributes:
          - subnet: "Example subnet provisioning"
            type: interface
            domain: "example.com"
            managed: True
            primary: True
            provision: True
        location: "example_location"
        operatingsystem: "CentOS 7 Latest"
        organization: "example_organization"
        root_pass: "example_password"
        server_url: "https://satellite-host"
        username: "example_user"
        password: "{{ satellite_user_password }}"
        validate_certs: False
        state: present
        provision_method: image
        image: "my-vm-qcow2-image"
        compute_resource: "Example Libvirt ressource"
        compute_profile: "example-compute-profile"
        compute_attributes:
          image_id: "/var/lib/libvirt/images/my-vm-qcow2-image.qcow2"

By adding:

compute_attributes:
  image_id: "/var/lib/libvirt/images/my-vm-qcow2-image.qcow2"

It works as expected.

evgeni commented 3 years ago

To compare, can you post the hammer call you used?

MLotton commented 3 years ago

Yes, here is an example with a Hammer CLI command.

Host creation with hammer CLI
hammer host create \
--name example-host.example.com \
--architecture x86_64 \
--managed true \
--organization bouygues_telecom \
--location example_location \
--domain "example.com" \
--interface "managed=true,primary=true,provision=true,compute_network=Example subnet provisioning" \
--operatingsystem "CentOS 7 Latest" \
--partition-table "Kickstart default" \
--root-password "example_password" \
--provision-method image \
--image "my-vm-qcow2-image" \
--compute-resource "Example Libvirt ressource" \
--compute-profile "example-compute-profile"
API results - examples

To add more details, here is what I can find from /var/log/foreman/production.log

With ansible modules:

Started POST "/api/hosts" for xx.xx.xx.xx at 2021-03-05 13:52:23 +0100
Processing by Api::V2::HostsController#create as JSON
Parameters: {
"organization_id"=>1, 
"host"=>
{
"operatingsystem_id"=>20, 
"root_pass"=>"[FILTERED]", 
"managed"=>true, 
"name"=>"example-host.example.com", 
"interfaces_attributes"=>[{"managed"=>true, "subnet_id"=>35, "type"=>"interface", "primar
y"=>true, "virtual"=>false, "identifier"=>"eth0", "provision"=>true, "domain_id"=>1}], 
"provision_method"=>"image", 
"comment"=>"Test collection", 
"compute_profile_id"=>7, 
"organization_id"=>1, 
"image_id"=>3, 
"architecture_id"=>1, 
"build"=>true, 
"location_id"=>2, 
"domain_id"=>1, "
compute_resource_id"=>3
}, 
"location_id"=>2, 
"apiv"=>"v2"
}

image_id is well defined, so the issue is not linked to that.

With Hammer CLI:

Started POST "/api/hosts" for xx.xx.xx.xx at 2021-03-05 13:57:19 +0100
Processing by Api::V2::HostsController#create as JSON
Parameters: {
"location_id"=>2, 
"organization_id"=>1, 
"host"=>
{
"name"=>"example-host.example.com", 
"location_id"=>2, 
"organization_id"=>1, 
"architecture_id"=>1, 
"domain_id"=>1, 
"operatingsystem_id"=>20, 
"ptable_id"=>116, 
"compute_resource_id"=>3, 
"image_id"=>3, 
"provision_method"=>"image", 
"managed"=>true, 
"compute_profile_id"=>7, 
"compute_attributes"=>
{
"volumes_attributes"=>{}, 
"image_id"=>"/var/lib/libvirt/images/my-vm-qcow2-image.qcow2"
}, 
"content_facet_attributes"=>{}, 
"subscription_facet_attributes"=>{}, 
"build"=>true, 
"overwrite"=>true, 
"interfaces_attributes"=>[{"managed"=>"true", "primary"=>"true", "provision"=>"true", "compute_attributes"=>{"network"=>"Example subnet provisioning"}}], 
"root_pass"=>"[FILTERED]"
}, 
"apiv"=>"v2"
}

Both have one identical image_id parameter at the root of host. But there is another parameter also called image_id and defined in compute_attributes for hammer CLI. I think this is the the cause of the difference in behavior because of the workaround I detailed in this issue. It might require further investigations. From the integrated Satellite API documentation nothing seems to be mentioned...

evgeni commented 3 years ago

@tbrisker @lzap do you know if Foreman itself should do the translation of the Hosts's "image" parameter to a CR image? It feels wrong to have all API clients (hammer, FAM, etc) have to carry the same "find image and set another parameter" code around.

lzap commented 3 years ago

By translation do you mean i18n translation? I think you mean more "look up"?

Foreman keeps images in its database, this table has image_id (SQL INT), uuid and name fields (varchar) which are used by individual compute resources. Usually, both UI and API should only need to provide image_id which Foreman than uses to look other attributes up (uuid, name).

So I think this is expected, when compute resource attribute is passed (uuid or name) Foreman will merge that and pass it into the VM creation request so it actually works and this is probably a bug, but the correct way is to really find image_id first and then providing that in the request.

In general, compute attributes are implemented in a weird way and @ezr-ondrej actually started an effort to improve so poking him to take this into consideration. Maybe we should create some allow list of compute resource attributes which are allowed to be passed in from UI / CLI.

This also raises a concern about security, Rails have a mechanism to protect parameters from being sent into ActiveRecord, we do however appear to merge all attributes and pass them into fog. At least how I understand this, it looks like an attacker can actually pass image path directly escaping organization boundary and performing a DOS by overwriting an existing image owned by a different organization.

evgeni commented 3 years ago

Ah, sorry. "translate" was maybe a bit wrong selection of words on my side.

If you look at the API requests above, Ansible sends:

"image_id"=>3, 

While hammer sends

"image_id"=>3, 
"compute_attributes"=>
{
"image_id"=>"/var/lib/libvirt/images/my-vm-qcow2-image.qcow2"
}, 

And it seems without the second entry, Libvirt just doesn't know what to do.

My naïve assumption was that when I pass image_id: 3 to the API, it will forward the right data to the CR, but it seems that's not the case.

lzap commented 3 years ago

I would also expect that our API will lookup the image "UUID / path" or whatever we call as image_id. That smells like a bug worth fixing.

ezr-ondrej commented 3 years ago

Image selection is particulary weird as there is a select it in Host form where I select the method, but that input is actually send as host[compute_attributes][image_id] and in code it's being rendered in the Virtual Machine tab and then moved by JS. So for user it looks fine, but it's created a mess. Hammer has solved this to mimic this by looking the image_id attribute up and passing it to API as compute_attributes: { image_id: ... } only this will make sure the image gets picked up during orchestration. The image_id on host gets set only after the provisioning has been successful.

This is IMHO very wrong and I might prioritize it in the ongoing compute resources cleanup, but I can't promise any delivery upon that, as I don't fully understand the process for all the compute resources.

This being said, we should probably have fix here sooner. By doing eigher 1) do what hammer does and just lookup the image id and pass it to the compute_attributes. 2) create temporary workaround in API to handle the image_id and do the lookup on Foreman side.

wbclark commented 3 years ago

Following the discussion, ideally this would be fixed on the Foreman side but that is not imminently expected, so we could provide a workaround here similar to what hammer is doing. Therefore I'm labeling this as a bug, although it could also be 'depends on external project'

evgeni commented 3 years ago

Thanks! I've opened https://projects.theforeman.org/issues/32501 to track the API side of this, and #1215 has a workaround for now.