To compare, can you post the hammer call you used?
Yes, here is an example with a Hammer CLI command.
hammer host create \
--name example-host.example.com \
--architecture x86_64 \
--managed true \
--organization bouygues_telecom \
--location example_location \
--domain "example.com" \
--interface "managed=true,primary=true,provision=true,compute_network=Example subnet provisioning" \
--operatingsystem "CentOS 7 Latest" \
--partition-table "Kickstart default" \
--root-password "example_password" \
--provision-method image \
--image "my-vm-qcow2-image" \
--compute-resource "Example Libvirt ressource" \
--compute-profile "example-compute-profile"
To add more details, here is what I can find in /var/log/foreman/production.log:
With ansible modules:
Started POST "/api/hosts" for xx.xx.xx.xx at 2021-03-05 13:52:23 +0100
Processing by Api::V2::HostsController#create as JSON
Parameters: {
"organization_id"=>1,
"host"=>
{
"operatingsystem_id"=>20,
"root_pass"=>"[FILTERED]",
"managed"=>true,
"name"=>"example-host.example.com",
"interfaces_attributes"=>[{"managed"=>true, "subnet_id"=>35, "type"=>"interface", "primar
y"=>true, "virtual"=>false, "identifier"=>"eth0", "provision"=>true, "domain_id"=>1}],
"provision_method"=>"image",
"comment"=>"Test collection",
"compute_profile_id"=>7,
"organization_id"=>1,
"image_id"=>3,
"architecture_id"=>1,
"build"=>true,
"location_id"=>2,
"domain_id"=>1, "
compute_resource_id"=>3
},
"location_id"=>2,
"apiv"=>"v2"
}
image_id is correctly set, so the issue is not caused by a missing image_id.
With Hammer CLI:
Started POST "/api/hosts" for xx.xx.xx.xx at 2021-03-05 13:57:19 +0100
Processing by Api::V2::HostsController#create as JSON
Parameters: {
"location_id"=>2,
"organization_id"=>1,
"host"=>
{
"name"=>"example-host.example.com",
"location_id"=>2,
"organization_id"=>1,
"architecture_id"=>1,
"domain_id"=>1,
"operatingsystem_id"=>20,
"ptable_id"=>116,
"compute_resource_id"=>3,
"image_id"=>3,
"provision_method"=>"image",
"managed"=>true,
"compute_profile_id"=>7,
"compute_attributes"=>
{
"volumes_attributes"=>{},
"image_id"=>"/var/lib/libvirt/images/my-vm-qcow2-image.qcow2"
},
"content_facet_attributes"=>{},
"subscription_facet_attributes"=>{},
"build"=>true,
"overwrite"=>true,
"interfaces_attributes"=>[{"managed"=>"true", "primary"=>"true", "provision"=>"true", "compute_attributes"=>{"network"=>"Example subnet provisioning"}}],
"root_pass"=>"[FILTERED]"
},
"apiv"=>"v2"
}
Both have the same image_id parameter at the root of host. But for Hammer CLI there is a second parameter, also called image_id, defined inside compute_attributes.
I think this is the cause of the difference in behavior, given the workaround I detailed in this issue. It might require further investigation. Nothing about this seems to be mentioned in the integrated Satellite API documentation...
@tbrisker @lzap do you know if Foreman itself should do the translation of the Host's "image" parameter to a CR image? It feels wrong to have all API clients (hammer, FAM, etc) have to carry the same "find image and set another parameter" code around.
By translation do you mean i18n translation? I think you rather mean "look up"?
Foreman keeps images in its database; this table has image_id (SQL INT), uuid and name fields (varchar), which are used by the individual compute resources. Usually, both UI and API should only need to provide image_id, which Foreman then uses to look the other attributes up (uuid, name).
So I think this is expected: when a compute resource attribute (uuid or name) is passed, Foreman merges it and passes it into the VM creation request, so it actually works. That is probably a bug, though; the correct way is to find the image_id first and then provide that in the request.
In general, compute attributes are implemented in a weird way and @ezr-ondrej actually started an effort to improve this, so I'm poking him to take this into consideration. Maybe we should create an allow list of compute resource attributes that may be passed in from the UI / CLI.
This also raises a concern about security: Rails has a mechanism to protect parameters from being sent into ActiveRecord, but we appear to merge all attributes and pass them into fog. At least as I understand it, an attacker could pass an image path directly, escaping the organization boundary and performing a DoS by overwriting an existing image owned by a different organization.
Ah, sorry. "Translate" was perhaps a poor choice of words on my part.
If you look at the API requests above, Ansible sends:
"image_id"=>3,
While hammer sends
"image_id"=>3,
"compute_attributes"=>
{
"image_id"=>"/var/lib/libvirt/images/my-vm-qcow2-image.qcow2"
},
And it seems without the second entry, Libvirt just doesn't know what to do.
My naïve assumption was that when I pass image_id: 3 to the API, it will forward the right data to the CR, but it seems that's not the case.
I would also expect our API to look up the image "UUID / path" or whatever we store as image_id. That smells like a bug worth fixing.
Image selection is particularly weird: there is a select in the Host form where I choose the method, but that input is actually sent as host[compute_attributes][image_id], and in the code it is rendered in the Virtual Machine tab and then moved by JS. So it looks fine to the user, but it creates a mess.
Hammer mimics this by looking up the image_id attribute and passing it to the API as compute_attributes: { image_id: ... }; only this makes sure the image gets picked up during orchestration. The image_id on the host gets set only after provisioning has succeeded.
This is IMHO very wrong and I might prioritize it in the ongoing compute resources cleanup, but I can't promise any delivery on that, as I don't fully understand the process for all the compute resources.
This being said, we should probably have a fix here sooner, by doing either:
1) what hammer does: look up the image id and pass it in compute_attributes (see the sketch below), or
2) a temporary workaround in the API to handle the image_id and do the lookup on the Foreman side.
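For illustration, roughly what that client-side lookup amounts to, expressed as playbook tasks (a sketch only: the server URL and credentials are placeholders, and doing the lookup with ansible.builtin.uri from a playbook is just one way to show the idea, not something the collection does today):

```yaml
# Sketch: resolve the image name to the compute-resource image "uuid"
# (for libvirt this is the qcow2 path, as seen in the Hammer request above)
# and pass it on as compute_attributes.image_id, mimicking what Hammer sends.
- name: List the images of the compute resource (id 3, as in the logs above)
  ansible.builtin.uri:
    url: https://satellite.example.com/api/compute_resources/3/images
    user: admin
    password: changeme
    force_basic_auth: true
    return_content: true
    validate_certs: false
  register: cr_images

- name: Create the host, passing the resolved image path to the compute resource
  redhat.satellite.host:
    server_url: https://satellite.example.com
    username: admin
    password: changeme
    validate_certs: false
    name: example-host.example.com
    organization: bouygues_telecom
    location: example_location
    provision_method: image
    image: my-vm-qcow2-image
    compute_resource: Example Libvirt ressource
    # ...plus the remaining host parameters from the reproducer below...
    compute_attributes:
      image_id: "{{ (cr_images.json.results | selectattr('name', 'equalto', 'my-vm-qcow2-image') | first).uuid }}"
    state: present
```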
Following the discussion, ideally this would be fixed on the Foreman side, but that is not imminently expected, so we could provide a workaround here similar to what hammer is doing. Therefore I'm labeling this as a bug, although it could also be 'depends on external project'.
Thanks! I've opened https://projects.theforeman.org/issues/32501 to track the API side of this, and #1215 has a workaround for now.
SUMMARY
I wanted to provision a VM on a Libvirt/KVM compute resource using the image provisioning method with an existing qcow2 image. I ran into an issue where the host was created but the image provisioning was not done correctly when using the host module. The result is not the same as with the Hammer CLI or the Foreman/Satellite web UI (with identical input parameters). Looking further and comparing with what the Satellite web UI and Hammer CLI do, I found that the host module (Python code) does not send the image_id in the API call (note: I did not investigate much, so this is a guess). I tried again, adding the image_id parameter directly in the module parameters (inside compute_attributes), and then it worked well. I guess the Ansible module(s) / Python code must perform an additional step to resolve the image_id based on the image name and send it in the API call, as it is more user friendly if the user only has to provide the image name as an input parameter.
ISSUE TYPE
ANSIBLE VERSION
COLLECTION VERSION
Repo link: https://github.com/RedHatSatellite/satellite-ansible-collection (this was not tested with the community foreman collection version).
KATELLO/FOREMAN VERSION
STEPS TO REPRODUCE
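Something along these lines (a sketch put together from the Hammer call and the logged API request above; connection details and the interface/subnet values are placeholders):

```yaml
- name: Provision a VM from an existing qcow2 image on Libvirt
  redhat.satellite.host:
    server_url: https://satellite.example.com
    username: admin
    password: changeme
    validate_certs: false
    name: example-host.example.com
    organization: bouygues_telecom
    location: example_location
    domain: example.com
    architecture: x86_64
    operatingsystem: CentOS 7 Latest
    managed: true
    build: true
    provision_method: image
    image: my-vm-qcow2-image
    compute_resource: Example Libvirt ressource
    compute_profile: example-compute-profile
    root_pass: example_password
    interfaces_attributes:
      - type: interface
        primary: true
        provision: true
        managed: true
        domain: example.com
        subnet: Example subnet provisioning  # placeholder; the logged request used subnet_id 35
    state: present
```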
EXPECTED RESULTS
The host is created and correctly provisioned from the qcow2 image.
The boot device order is based only on the first disk. Note: Satellite will also create a virtual CDROM device for user data / cloud-init (below is the definition of this device for libvirt/kvm in XML format):
ACTUAL RESULTS
The host is created but the VM is "blank" (no OS installed, etc).
The boot device order is first "network" and then "disk". There is no virtual CDROM device created.
WORKAROUND
By adding:
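That is, passing the image path for the compute resource explicitly in the host task's parameters; a minimal sketch, with the path taken from the Hammer request logged above:

```yaml
compute_attributes:
  image_id: /var/lib/libvirt/images/my-vm-qcow2-image.qcow2
```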
It works as expected.