redhat-cop / infra.leapp

Collection of Ansible roles for automating RHEL in-place upgrades using Leapp.
MIT License
44 stars 34 forks source link

Post upgrade does not provide a method to "lock" a node to a version of RHEL #134

Closed scott-vick closed 4 months ago

scott-vick commented 11 months ago

When running the post-upgrade for RHEL 7 -> 8, the leapp-post-upgrade role does not have a method to lock the version to a specific release. If I want to upgrade from RHEL 7 -> 8.6 the post-upgrade process upgrades to RHEL 8.8 which is undesirable.

The leapp upgrade allows for a target version for the upgrade to be specified (e.g. "--target 8.6") but it does not allow for the target to remain at the specified release.

swapdisk commented 10 months ago

Hi Scott, this seems like a reasonable feature to add. Are you planning to work on this and bring us a PR? Thanks in advance! :slightly_smiling_face:

heatmiser commented 7 months ago

@scott-vick This functionality would need to operate slightly differently depending on which package manager repository method is being utilized. Essentially to utilize the specific target feature (e.g. "--target 8.6"), you are going to need to have a means for the target node to have repositories with the desired target version enabled (e.g. "8.6").

Have you tried pinning the release version for RHSM or RHUI?

We can implement version pinning via Satellite...referring to these example variables for the upgrade role: satellite_activation_key_pre_leapp: RHEL7_Dev satellite_activation_key_leapp: RHEL7IPU_Dev satellite_activation_key_post_leapp: RHEL8_Dev ...the satellite_activation_key_leapp activation key will be associated with a content view that contains the current RHEL release version repositories (e.g. "7.9"), as well as the desired upgrade target RHEL version repositories (e.g. "8.6"). And then to keep a system pinned to that target RHEL version, the satellite_activation_key_post_leapp variable would be set to an activation key/content view that has repositories defined for the final target version, in this case we would define the same "8.6" repositories.

Denney-tech commented 5 months ago

I also would like to stop the leapp.upgrade role from unsetting the target release and then following up with another update to latest release. I'm trying to perform a leapp from 7.9 to 8.8 to 9.2, but it 7.9 to 8.8 to 8.9 to 9.2 is invalid because 8.9's only valid leapp target is 9.3 (as of this writing). As is stands, I can leapp to 8.8 just fine, but then unexpectedly end up on 8.9 afterwards.

@heatmiser pinning the target version with RHSM doesn't work for a couple of reasons:

  1. It prevents prerequisite packages from installing for the current release.
  2. It gets unset here, which is the root of the problem: https://github.com/redhat-cop/infra.leapp/blob/main/roles/upgrade/tasks/leapp-post-upgrade.yml#L63

The post-leapp-release-upgrades should be optional, not forced. I have hosts that need to be pinned to the release I'm targeting, so this behavior is both unexpected and undesired.

Denney-tech commented 5 months ago

I have submitted a draft PR #159 for consideration.

heatmiser commented 5 months ago

@Denney-tech What kind of repo backing stores are utilized in your use case? leapp_upgrade_unset_release could be set to false...rhui backed systems are not currently covered, but support could be added. Currently, many have standardized automation workflows on leapp_preupg_opts and leapp_upgrade_opts to be where target versions are specified. Please provide details on your package repo mode in use and let's see what we can work out for all.

Denney-tech commented 5 months ago

I am using RH Satellite with a generic Content View with all the basic RHEL repositories that are needed by all hosts (for RHEL 7, 8, and 9, including repositories pinned to minor releases). Then that is added to Composite Views for hosts that need additional content. Some hosts are pinned to LTS releases of RHEL, and must be pinned to a future LTS release after a leapp upgrade.

As an example, I have a RHEL 7.9 SAP SPS05 host that I need to upgrade to 9.2. This must upgrade to 8.8 first, and then 9.2. If it upgrades to 8.9, then 9.2 becomes an invalid target. At that point, 9.3 becomes the only valid target and is unsupported by SAP, not to mention that 8.9 is also unsupported by SAP. My workaround is to comment out the task that unsets the release altogether in my locally installed collection (which I'm fine with for the moment, since I also changed the references of fedora.linux_system_roles to redhat.rhel_system_roles already).

As for my PR, it wouldn't affect existing workflows based on leapp_preupg_opts or leapp_upgrade_opts, but it would break if someone used the new variable together with "--target x.y" in the *_opts paths.

example playbook:

---
- name: Leapp analysis and upgrade
  hosts: all
  strategy: free
  gather_facts: true
#  become: true
  force_handlers: true
  remote_user: root

  vars:
    upgrade_target: "{{ '8.8' if ansible_distribution_major_version == '7' else '9.2' }}"
    leapp_preupg_opts: "--target {{ upgrade_target }} --no-insights-register"
    leapp_upgrade_opts: "--target {{ upgrade_target }} --no-insights-register"
    leapp_answerfile: |
      [remove_pam_pkcs11_module_check]
      confirm = True
      [authselect_check]
      confirm = True
    os_path: $PATH
  pre_tasks:
    - name: Abort on RHEL 9 hosts
      ansible.builtin.meta: end_host
      when: ansible_distribution_major_version == "9"
      tags:
        - always

    - name: Remove unsupported drivers
      community.general.modprobe:
        name: "{{ item }}"
        state: absent
      loop:
        - floppy
        - pata_acpi
      tags:
        - always
      when: ansible_distribution_major_version == "7"

    - name: Unmount NFS Home share
      ansible.posix.mount:
        path: /home
        state: unmounted
      tags:
        - always
    - name: Disable NFS Mount in fstab
      ansible.builtin.replace:
        path: /etc/fstab
        regexp: '^nfsserver'
        replace: '#nfsserver'
      tags:
        - always
      notify: daemon-reload

# need to be updated to 7.9
    - name: Unset Release Target
      community.general.rhsm_release:
#        release: "{{ upgrade_target }}"
        release: null
      when: ansible_distribution_major_version == "7"
      tags:
        - always

  tasks:
    - name: Generate preupgrade analysis report
      ansible.builtin.include_role:
        name: infra.leapp.analysis
        apply:
          tags:
            - analysis
      tags:
        - analysis

    - name: Flush Analysis Handlers
      ansible.builtin.meta: flush_handlers
      tags:
        - analysis

    - name: Perform OS upgrade
      ansible.builtin.include_role:
        name: infra.leapp.upgrade
        apply:
          tags:
            - upgrade
      tags:
        - upgrade

    - name: Flush Upgrade Handlers
      ansible.builtin.meta: flush_handlers
      tags:
        - upgrade

    - name: Enable NFS Mount in fstab
      ansible.builtin.replace:
        path: /etc/fstab
        regexp: '^#nfsserver'
        replace: 'nfsserver'
      tags: always
      notify: daemon-reload

  post_tasks:
    - name: Remount fstab
      ansible.builtin.command:
        cmd: mount -a
      tags:
        - always
      changed_when: true

  handlers:
    - name: daemon-reload
      ansible.builtin.systemd:
        daemon_reload: true
      tags:
        - always
heatmiser commented 5 months ago

@Denney-tech Utilizing a generic Content View with all the basic RHEL repositories that are needed by all hosts (for RHEL 7, 8, and 9, including repositories pinned to minor releases) ...aka a "kitchen sink" CV within Satellite, while a viable option in the past, has become more of an anti-pattern today. In the past, we did not have access to Ansible and Config-as-Code with Satellite, so maintaining a smaller set of Content Views that contained larger sets of OS and application code that could be applied to all possible systems was desirable. However, today, with Ansible and Config-as-Code available, we can define CVs and/or CCVs, with associated AKs that hone in on more discrete sets of repositories and the administration effort to do so is practically non-existent.

The upgrade scenario you outlined can be achieved with the current collection code base as it stands, if you utilize a more discrete set of CVs and CCVs...I know this because I've implemented this myself and can confirm it works as advertised. The question then becomes, do we implement a change to adjust to the kitchen sink CV model? Or continue to use the method in which we know works with utilizing discrete CVs/CCVs?

djdanielsson commented 5 months ago

After thinking about it some I think there is a simple solution to this, we need to add a when statement after the upgrade to skip running a dnf update. I think that is causing the issues that are being reported.

heatmiser commented 5 months ago

we need to add a when statement after the upgrade to skip running a dnf update Can you elaborate a bit more? Again, as things currently stand, if you utilize CVs/CCVs that are specific to set releases of RHEL, all of this is moot. And you can end up landing on a version of RHEL that is beyond the supported leapp target release versions...or not, it all depends on what you set for satellite_activation_key_post_leapp and the version of RHEL that the associated CV/CCV contains.

Denney-tech commented 5 months ago

I think options to skip both would be nice.

heatmiser commented 5 months ago

You skip the upgrade by setting satellite_activation_key_leapp and satellite_activation_key_post_leapp to the same CV/CCV...which is selected via the associated AKs.

Denney-tech commented 5 months ago

@heatmiser Please be considerate that there are Red Hat customers that have made different design decisions with regards to their Satellite infrastructure, where your proposed workaround is not easily met. Especially when accommodating them can be addressed with a trivial amount of code. I would rather grab hot iron with my bare fists, but it wouldn't melt from my clutch, Mr. Sun.

Seriously, can we move on? I would happily discuss Satellite design with you elsewhere, but this is just counterproductive.

heatmiser commented 5 months ago

@Denney-tech No disrespect intended, simply trying to suggest improvements to existing Satellite configuration decisions. My intentions are to help customers advance each and every day, in the hopes that overall administrative burden is lessened. It was my mistake for assuming that defining a CV/CCV and associated AK for specific RHEL releases was trivial. Apologies, if you feel that my comments and suggestions to be counterproductive.

It appears that @djdanielsson has proposed a simple modification that should work for all scenarios.

Denney-tech commented 5 months ago

Thank you, and I apologize as well. Really didn't want to get off on the wrong foot here.

The problem for me is not whether or not I can perform a major Leapp upgrade with my CV/AK selection with this collection, and it's not about whether or not I can restrict minor release upgrades with those same CV/AK's. It's about the fact that I may not necessarily want to unset the release in RHSM or perform a (potentially unecessary or redundant) minor release upgrade after the actual Leapp upgrade is complete. Even if my CV's would prevent an errant minor upgrade, nothing I do on Satellite is going to prevent the release from being unset. That is the root of #134. Doing minor release upgrades after the major upgrade is certainly a welcome feature and unsetting (or changing) the release target is a necessary step, but these should be optional post upgrade steps.

Denney-tech commented 5 months ago

@heatmiser @djdanielsson I think maybe a new PR is in order? I think there's 3 related QoL changes to address here.

This is what I'm thinking:

post_upgrade_unset_release: true # default post_upgrade_update: true # default post_upgrade_release: "" # default

The two booleans address toggling current behavior, and keeps the current behavior default for existing workflows. It also allows for @scott-vick's original request to leave the Leapp target lock in place.

The post_upgrade_release would be for an optional additional task that meets @jeffmcutter's request in #159. This would allow the RHSM release target to be set to something other than the leapp target after the upgrade, but before the update. I think this would still be in line with this issue, and actually defines what lock is set rather than just leaving it to Leapp.

Thoughts?

djdanielsson commented 5 months ago

I think we leave them their own PRs and can debate each parts value and the impact to the current code

Denney-tech commented 5 months ago

Ready for review

jeffmcutter commented 4 months ago

https://github.com/redhat-cop/infra.leapp/pull/159 was merged.