theforeman / foreman-ansible-modules

Ansible modules for interacting with the Foreman API and various plugin APIs such as Katello
GNU General Public License v3.0

Server error due to stale (?) cache when creating multiple compute_profiles with VMware #1677

Open parmstro opened 1 year ago

parmstro commented 1 year ago
SUMMARY

Creating multiple compute profiles against a freshly created VMware compute resource fails partway through with:
FAILED! => {"changed": false, "error": {"message": "undefined method `resource_pools' for nil:NilClass"}, "msg": "Failed to show resource: HTTPError: 500 Server Error: Internal Server Error for url: https://sat.example.ca/api/compute_resources/1"}

ISSUE TYPE
Bug Report
ANSIBLE VERSION
ansible --version
ansible [core 2.14.2]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ansiblerunner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /home/ansiblerunner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.2 (main, May 24 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python3.11)
  jinja version = 3.1.2
  libyaml = True
COLLECTION VERSION
Collection                  Version
--------------------------- -------
amazon.aws                  6.2.0  
ansible.controller          4.4.0  
ansible.netcommon           5.1.2  
ansible.posix               1.5.4  
ansible.utils               2.10.3 
azure.azcollection          1.16.0 
community.aws               6.1.0  
community.crypto            2.14.1 
community.general           7.1.0  
community.vmware            3.7.0  
containers.podman           1.10.2 
infra.ah_configuration      1.1.1  
redhat.redhat_csp_download  1.2.2  
redhat.rhel_idm             1.11.0 
redhat.rhel_system_roles    1.21.2 
redhat.satellite            3.12.0 
redhat.satellite_operations 1.3.0  

# /usr/share/ansible/collections/ansible_collections
Collection               Version
------------------------ -------
redhat.rhel_system_roles 1.21.1 
KATELLO/FOREMAN VERSION
foreman-3.5.1.19-1.el8sat.noarch
STEPS TO REPRODUCE
# var file
# compute_profiles
compute_profiles_mandatory:
  - name: "SOE_Small"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          cpus: 1
          corespersocket: 1
          memory_mb: 4096
          cluster: "NUCLab"
          # resource_pool: "Resources"
          path: "/Datacenters/example.ca/vm"
          guest_id: "rhel8_64Guest"
          hardware_version: "Default"
          memoryHotAddEnabled: true
          cpuHotAddEnabled: true
          add_cdrom: false
          boot_order:
            - "network"
            - "disk"
          scsi_controllers:
            - type: ParaVirtualSCSIController
              key: 1000
          volumes_attributes:
            0:
              thin: true
              name: "Hard disk"
              mode: "persistent"
              controller_key: 1000
              datastore: "NASAEX_VMS"
              size_gb: 65
          interfaces_attributes:
            0:
              type: "VirtualVmxnet3"
              network: "VM Network"

  - name: "SOE_Medium"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          cpus: 1
          corespersocket: 1
          memory_mb: 8192
          cluster: "NUCLab"
          # resource_pool: "Resources"
          path: "/Datacenters/example.ca/vm"
          guest_id: "rhel8_64Guest"
          hardware_version: "Default"
          memoryHotAddEnabled: true
          cpuHotAddEnabled: true
          add_cdrom: false
          boot_order:
            - "network"
            - "disk"
          scsi_controllers:
            - type: ParaVirtualSCSIController
              key: 1000
          volumes_attributes:
            0:
              thin: true
              name: "Hard disk"
              mode: "persistent"
              controller_key: 1000
              datastore: "NASAEX_VMS"
              size_gb: 100
          interfaces_attributes:
            0:
              type: "VirtualVmxnet3"
              network: "VM Network"

  - name: "SOE_Large"
    compute_attributes:
      - compute_resource: "VMware_Lab"
        vm_attrs:
          cpus: 2
          corespersocket: 1
          memory_mb: 16364
          cluster: "NUCLab"
          # resource_pool: "Resources"
          path: "/Datacenters/example.ca/vm"
          guest_id: "rhel8_64Guest"
          hardware_version: "Default"
          memoryHotAddEnabled: true
          cpuHotAddEnabled: true
          add_cdrom: false
          boot_order:
            - "network"
            - "disk"
          scsi_controllers:
            - type: ParaVirtualSCSIController
              key: 1000
          volumes_attributes:
            0:
              thin: true
              name: "Hard disk"
              mode: "persistent"
              controller_key: 1000
              datastore: "NASAEX_VMS"
              size_gb: 100
          interfaces_attributes:
            0:
              type: "VirtualVmxnet3"
              network: "VM Network"

# playbook
---
- name: "Test Task"
  hosts: sat.example.ca
  become: true
  gather_facts: true
  vars_files:
    - "whatever_you_name_the_var_file_above.yml"
    - "your_vault_file.yml"

  tasks:

  - name: "Test the specified task"
    ansible.builtin.include_tasks: roles/satellite_post/tasks/{{ test_task_name }}.yml

# task file - create_mandatory_compute_profiles.yml
---
- name: "Configure the mandatory compute profiles"
  ansible.builtin.include_tasks: ensure_compute_profile.yml
  loop: "{{ compute_profiles_mandatory }}"
  loop_control:
    loop_var: cpr
  when: "compute_profiles_mandatory is defined"

# task file - ensure_compute_profile.yml
---
- name: "Ensure the compute profile state - {{cpr.name}}"
  redhat.satellite.compute_profile:
    username: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    server_url: "{{ satellite_url }}"
    validate_certs: "{{ satellite_validate_certs }}"
    name: "{{ cpr.name }}"
    updated_name: "{{ cpr.updated_name | default(omit) }}"
    state: "{{ cpr.state | default(omit) }}"
    compute_attributes: "{{ cpr.compute_attributes | default(omit) }}"
EXPECTED RESULTS

No errors, all profiles created successfully.

ACTUAL RESULTS
2023-10-15 07:00:04,535 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Small', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 1, 'corespersocket': 1, 'memory_mb': 4096, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 65}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:04,564 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Medium', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 1, 'corespersocket': 1, 'memory_mb': 8192, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 100}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:04,591 p=362451 u=ansiblerunner n=ansible | included: /home/ansiblerunner/development/ansible/labbuilder2/sat/roles/satellite_post/tasks/ensure_compute_profile.yml for sat.example.ca => (item={'name': 'SOE_Large', 'compute_attributes': [{'compute_resource': 'VMware_Lab', 'vm_attrs': {'cpus': 4, 'corespersocket': 1, 'memory_mb': 8192, 'cluster': 'NUCLab', 'resource_pool': 'Resources', 'path': '/Datacenters/example.ca/vm', 'guest_id': 'rhel8_64Guest', 'hardware_version': 'Default', 'memoryHotAddEnabled': True, 'cpuHotAddEnabled': True, 'add_cdrom': False, 'boot_order': ['network', 'disk'], 'scsi_controllers': [{'type': 'ParaVirtualSCSIController', 'key': 1000}], 'volumes_attributes': {0: {'thin': True, 'name': 'Hard disk', 'mode': 'persistent', 'controller_key': 1000, 'datastore': 'NASAEX_VMS', 'size_gb': 100}}, 'interfaces_attributes': {0: {'type': 'VirtualVmxnet3', 'network': 'VM Network'}}}}]})
2023-10-15 07:00:06,608 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Small] *************************************************************************************************************
2023-10-15 07:00:06,608 p=362451 u=ansiblerunner n=ansible | changed: [sat.example.ca]
2023-10-15 07:01:06,983 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Wait on API background refresh] ***************************************************************************************************************************
2023-10-15 07:01:06,983 p=362451 u=ansiblerunner n=ansible | ok: [sat.example.ca]
2023-10-15 07:01:08,215 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Medium] ************************************************************************************************************
2023-10-15 07:01:08,216 p=362451 u=ansiblerunner n=ansible | changed: [sat.example.ca]
2023-10-15 07:02:08,586 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Wait on API background refresh] ***************************************************************************************************************************
2023-10-15 07:02:08,586 p=362451 u=ansiblerunner n=ansible | ok: [sat.example.ca]
2023-10-15 07:02:09,585 p=362451 u=ansiblerunner n=ansible | TASK [satellite_post : Ensure the compute profile state - SOE_Large] *************************************************************************************************************
2023-10-15 07:02:09,585 p=362451 u=ansiblerunner n=ansible | fatal: [sat.example.ca]: FAILED! => {"changed": false, "error": {"message": "undefined method `resource_pools' for nil:NilClass"}, "msg": "Failed to show resource: HTTPError: 500 Server Error: Internal Server Error for url: https://sat.example.ca/api/compute_resources/1"}

NOTE: If I call the refresh_cache API before creating the profiles, everything works. However, querying the Satellite for the information needed to make that refresh call can also hit the same error. This is a real bummer when you are an hour and a half into a build and things bomb out. I am currently testing with the compute resource created with the cache off.

It would be nice to have a cache refresh embedded in the background, since this is an automation task, not a UI interaction: no one is waiting and watching.
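
For reference, a minimal sketch of what "created with the cache off" looks like with this collection. The vCenter connection variables and the datacenter name are placeholders, and caching_enabled under provider_params is assumed to be honored for the VMware provider:

# sketch: create the compute resource with caching disabled
- name: "Create the VMware compute resource with caching disabled"
  redhat.satellite.compute_resource:
    username: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    server_url: "{{ satellite_url }}"
    validate_certs: "{{ satellite_validate_certs }}"
    name: "VMware_Lab"
    provider: "vmware"
    provider_params:
      url: "{{ vcenter_url }}"            # placeholder variable
      user: "{{ vcenter_username }}"      # placeholder variable
      password: "{{ vcenter_password }}"  # placeholder variable
      datacenter: "example.ca"
      caching_enabled: false
    state: present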

evgeni commented 1 year ago

What is the cache refresh API?! :)

parmstro commented 1 year ago
- name: "Get the compute resource id"
  redhat.satellite.compute_resource:
    username: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    server_url: "{{ satellite_url }}"
    validate_certs: "{{ satellite_validate_certs }}"
    name: "{{ cpr.compute_attributes[0].compute_resource }}"
    state: "present"
  register: result

- ansible.builtin.set_fact:
    cr_id: "{{ result.entity.compute_resources[0].id }}"

- name: "Force refresh of Compute Resource API cache"
  ansible.builtin.uri:
    url: "{{ satellite_url }}/api/compute_resources/{{ cr_id }}-{{ cpr.compute_attributes[0].compute_resource }}/refresh_cache"
    method: "PUT"
    body_format: "json"
    user: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    force_basic_auth: true
    validate_certs: "{{ satellite_validate_certs }}"
  register: refresh_result

- name: "Show the refresh result"
  ansible.builtin.debug:
    var: refresh_result.json.message
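
Since the lookup or the refresh itself can also 500 while the cache is stale, one option is to retry the call. A sketch of the same PUT hardened with retries/delay/until, assuming the endpoint accepts the plain numeric ID in the URL:

- name: "Force refresh of Compute Resource API cache (retried)"
  ansible.builtin.uri:
    url: "{{ satellite_url }}/api/compute_resources/{{ cr_id }}/refresh_cache"
    method: "PUT"
    body_format: "json"
    user: "{{ satellite_admin_username }}"
    password: "{{ satellite_admin_password }}"
    force_basic_auth: true
    validate_certs: "{{ satellite_validate_certs }}"
    status_code: [200, 500]  # treat a transient 500 as retryable instead of failing outright
  register: refresh_result
  retries: 5
  delay: 10
  until: refresh_result.status == 200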
evgeni commented 1 year ago

Wait, you create a fresh CR and after that the cache is invalid? That sounds like a Foreman bug, not something we should (have to) work around in FAM.

Interestingly, https://github.com/theforeman/foreman/blob/develop/app/models/concerns/compute_resource_caching.rb only triggers a refresh automatically after_update, not after_create (or after_save, which would cover both). The original refresh was added in https://projects.theforeman.org/issues/19506 / https://github.com/theforeman/foreman/pull/4524

Wonder what @ares thinks about this.

ares commented 1 year ago

That patch was created only to solve the issue of updating the CR. I'm not sure how the cache could be invalid right after the CR creation, but if it helps, I think replacing after_update with after_save is a good move. There was no deeper logic behind not doing it after CR creation; it just felt unnecessary.

evgeni commented 1 year ago

Yeah, I too am curious how this ended up with a "bad" cache, but here we are.

@parmstro if the issue is sufficiently reproducible in your env, could you try patching it to use after_save instead of after_update and see if it makes anything better?

parmstro commented 1 year ago

Yes. I will patch it to use after_save and set caching_enabled to true for my next test run. Please note: with only caching_enabled, and no code calling refresh_cache, the builder creates the CR, gets through a couple of CPs, and then emits the error. This is very reproducible. With caching_enabled and the refresh_cache code in place (the code queries the API for the compute resource ID so we can use it in the call), it errors right after the call that creates the CR.
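
For the record, a sketch of how that one-line change could be applied on the Satellite host for the test. The install path is assumed to be the usual RPM location, and Foreman services need a restart afterwards (e.g. foreman-maintain service restart):

- name: "Patch compute_resource_caching.rb to also refresh the cache on create"
  become: true
  ansible.builtin.replace:
    path: "/usr/share/foreman/app/models/concerns/compute_resource_caching.rb"  # assumed install path
    regexp: '^(\s*)after_update\b'
    replace: '\1after_save'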

evgeni commented 1 year ago

Could you by any chance provide access to a reproducer system?

parmstro commented 1 year ago

The systems are built and torn down constantly. I would have to spin one up for you, but that can be done. Let me see if I can work on it. I am in the middle of a test right now with caching_enabled: false. The environment build is past the compute_resource creation and is actively using it to build systems. I am switching it back for the next run and making the edit that you requested in comment 5. Currently building tang hosts, so the environment should be finished in about 90 minutes or so.