linux-system-roles / ha_cluster

Provide automation for Cluster - High Availability management
https://linux-system-roles.github.io/ha_cluster/
MIT License
17 stars 22 forks source link

Extend platform/version execution to allow either pcs or crmsh #122

Closed sean-freeman closed 1 year ago

sean-freeman commented 1 year ago

Extend platform/version execution to allow either pcs or crmsh

By design, ha_cluster allows for additional platform/version. The first Ansible Task executed within the Ansible Role, is set_vars.yml to import different variables for OS major.minor versions.

An extension of this principle to also include/execute a different set of Ansible Tasks for different high-level command interfaces to Linux Pacemaker, would be valuable to future compatibility.

Suggestion for execution logic fork

  1. Append new var _ha_cluster_pacemaker_shell to all existing /vars/<<os_version>>.yml files. For example:

/vars/RHEL9.yml

_ha_cluster_pacemaker_shell: pcs


  1. Create new directory for Ansible Tasks of a specific shell for Linux Pacemaker. Move the existing specific files of pcs shell into /tasks/pcs directory, and create stub code for crmsh shell in future. For example:

Directory re-structure....

/tasks/pcs/cluster-start-and-reload.yml
etc.

/tasks/crmsh/cluster-start-and-reload.yml
etc.


  1. Move common Ansible Tasks that execute the underlying Pacemaker binaries cibadmin and crm_mon into a separate tasks subdirectory. For example:

Directory re-structure....

/tasks/common/create-and-push-cib.yml
/tasks/common/cluster-start-and-reload.yml


  1. Append include_tasks. For example

/tasks/main.yml

---
- name: Set platform/version specific variables
  include_tasks: set_vars.yml

...
...

- name: Start the cluster and reload corosync.conf
  include_tasks: {{ _ha_cluster_pacemaker_shell }}/cluster-start-and-reload.yml
...
...

Side benefits


Comparison of pcs and crmsh

Looking in the repository, there are a limited number of pcs shell commands in use. See equivilant below for crmsh:

# PCS Shell                   # CRMSH

pcs cluster                   crm node
pcs constraint <type>         crm configure <type>
pcs property                  crm configure property
pcs qdevice                   crm cluster init qdevice
pcs quorum                    corosync-quorumtool / corosync-qnetd-tool
pcs resource                  crm ra
pcs status                    crm status
pcs stonith                   crm ra

Refs:

richm commented 1 year ago

What about the public API variables with pcs or pcsd in the name? https://github.com/linux-system-roles/ha_cluster/blob/main/defaults/main.yml - do we need crmsh analogues for those? Or, perhaps introduce a more generic/general name for those concepts, similar to what we have done in other system roles that support multiple providers (e.g. timesync, network). Of course, we would have to keep the old names for backward compatibility.

Testing is another issue - many of the tests use the pcs command for verification - we would need another "logic fork" for the test code as well. I'll note that the network role does something similar since it supports both the initscripts and NetworkManager providers, in both the main role code and in the tests - IMO the vast majority of this work would be in the tests.

sean-freeman commented 1 year ago

Certainly agree that as many var names as possible should use generic concept terms, unless it is a specific var required for pcs or crmsh. Thankfully it is only these two 'providers' that orchestrate the underlying Linux Pacemaker binaries. Therefore it can be pretty static when compared to your other examples (time, network etc) that have many other alternatives.

The approach you outline would work. However, I would be concerned this retains the flat design of the /tasks/ directory but appends siginificant when: clauses within those Ansible Task files to run X when pcs and Y when crmsh. In addition, the variables list would bloat considerably?

I personally believe that creating the subdirectories with the Ansible Task files that are specific to pcs or crmsh would be visually clearer to the end-user and maintainers:

Appending your example, the altered directory structure would be like:

/vars/RedHat_8.yml
/vars/SLES_15.yml
...
/tasks/pcs/...
/tasks/crmsh/...
...
/tests/pcs/...
/tests/crmsh/...

Ultimately I am not particularly concerned with the approach taken to extend. I am primarily concerned with long-term development ease.

richm commented 1 year ago

Certainly agree that as many var names as possible should use generic concept terms, unless it is a specific var required for pcs or crmsh. Thankfully it is only these two 'providers' that orchestrate the underlying Linux Pacemaker binaries. Therefore it can be pretty static when compared to your other examples (time, network etc) that have many other alternatives.

The approach you outline would work. However, I would be concerned this retains the flat design of the /tasks/ directory but appends siginificant when: clauses within those Ansible Task files to run X when pcs and Y when crmsh.

Not sure where/how I implied that we should use a flat structure with a bunch of when clauses, because we should not. I think having a subdirectory for provider should suffice e.g. /tasks/pcs/ and /tasks/crmsh/

In addition, the variables list would bloat considerably?

maybe, but it is a tradeoff - will users be confused that they are setting pcs permissions e.g.

ha_cluster_pcs_permission_list:
  - type: group
    name: haclient
    allow_list:
      - grant
      - read
      - write

when they are using the crmsh provider? Maybe it isn't applicable - I don't know enough about it.

I personally believe that creating the subdirectories with the Ansible Task files that are specific to pcs or crmsh would be visually clearer to the end-user and maintainers:

* For devs, if there are any (minor) deviations in preparation/execution activities for either shell then this will be easy to contain without causing regression impacts

* For users, reading the main.yml file you would see the `when: crmsh` to perform `include_tasks: /tasks/crmsh/file.yml` etc etc.

Appending your example, the altered directory structure would be like:

/vars/RedHat_8.yml
/vars/SLES_15.yml
...
/tasks/pcs/...
/tasks/crmsh/...
...
/tests/pcs/...
/tests/crmsh/...

Ultimately I am not particularly concerned with the approach taken to extend. I am primarily concerned with long-term development ease.

sean-freeman commented 1 year ago

OK then we are on the same page, and I agree on the trade-off 👍