sap-linuxlab / community.sap_install

Automation for SAP - Collection of Ansible Roles for various SAP software installation
Apache License 2.0

sap_ha_pacemaker_cluster - tasks/construct_vars_stonith.yml - fencing does not work #363

Closed: waorb closed this issue 1 year ago

waorb commented 1 year ago

After running the role, the defined STONITH device looks like this:

pcs stonith config res_fence_ibm_powervs
 Resource: res_fence_ibm_powervs (class=stonith type=fence_ibm_powervs)
  Attributes: api-type=private crn=crn:v1:bluemix:public:power-iaas:lon06:a/719e669b325b4d847c6f45401fef8bc1:0e6322f9-1b4c-4489-a5db-a9ddd73c7107:: instance=0e6482f9-1b4c-4489-a5db-a9ddd83c8107 pcmk_host_map="cl-h01-1:542127c0-fc27-4495-a62b-af81dc7bc2e6;cl-h01-2:b1068096-ade4-47e3-89fb-5a0de7091bc0" pcmk_monitor_timeout=600 pcmk_reboot_retries=4 pcmk_reboot_timeout=600 pcmk_status_timeout=60 power_timeout=240 proxy=http://10.30.10.4:3128 region=eu-gb token=**************
  Operations: monitor interval=60s (res_fence_ibm_powervs-monitor-interval-60s)

Note the quotes around the pcmk_host_map value. With the value quoted like this, fencing operations do not work.

I used the pcs stonith update command to change the STONITH device:

pcs stonith update res_fence_ibm_powervs pcmk_host_map="wo-cl-h01-1:542127c0-fc27-4495-a62b-af81dc7bc2e6;wo-cl-h01-2:b1068096-ade4-47e3-89fb-5a0de7091bc"

Now the device looks like:

pcs stonith config res_fence_ibm_powervs
 Resource: res_fence_ibm_powervs (class=stonith type=fence_ibm_powervs)
  Attributes: api-type=private crn=crn:v1:bluemix:public:power-iaas:lon06:a/719e669b325b4d847c6f45401fef8bc1:0e6322f9-1b4c-4489-a5db-a9ddd73c7107:: instance=0e6482f9-1b4c-4489-a5db-a9ddd83c8107 pcmk_host_map=cl-h01-1:542127c0-fc27-4495-a62b-af81dc7bc2e6;cl-h01-2:b1068096-ade4-47e3-89fb-5a0de7091bc0 pcmk_monitor_timeout=600 pcmk_reboot_retries=4 pcmk_reboot_timeout=600 pcmk_status_timeout=60 power_timeout=240 proxy=http://10.30.10.4:3128 region=eu-gb token=**************
  Operations: monitor interval=60s (res_fence_ibm_powervs-monitor-interval-60s)

In this output, the pcmk_host_map value is no longer quoted. With the updated fencing device, fencing works just fine.

The question is how the role ends up with the quotes in the definition. On the command line, the shell removes the double quotes before pcs stonith create (or update) sees them; to keep them, you would have to either escape them or wrap the whole argument in single quotes.
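To see what actually reaches pcs in each case, here is a small POSIX shell sketch. `show_args` is a hypothetical helper (not part of pcs or the role) that stands in for `pcs stonith update` and prints each argument it receives between `<` and `>`, so literal quote characters stay visible. The node names come from this issue; the instance IDs are shortened placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for "pcs stonith update ...": prints every argument
# it receives between <...>, so literal quote characters stay visible.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Plain double quotes are removed by the shell before the command runs,
# so the value arrives without quote characters (the form that works).
show_args pcmk_host_map="cl-h01-1:id-1;cl-h01-2:id-2"
# -> <pcmk_host_map=cl-h01-1:id-1;cl-h01-2:id-2>

# Single quotes around the whole argument keep the inner double quotes.
show_args 'pcmk_host_map="cl-h01-1:id-1;cl-h01-2:id-2"'
# -> <pcmk_host_map="cl-h01-1:id-1;cl-h01-2:id-2">

# Escaped double quotes inside a double-quoted string are kept as well.
show_args "pcmk_host_map=\"cl-h01-1:id-1;cl-h01-2:id-2\""
# -> <pcmk_host_map="cl-h01-1:id-1;cl-h01-2:id-2">
```

Only the first form delivers a value without surrounding quote characters; the other two reproduce the quoted value seen in the first configuration dump.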

In construct_vars_stonith.yml, I can see this code:

            {% set map = attrs.extend([
              {
                'name': 'pcmk_host_map',
                'value': '"' + __sap_ha_pacemaker_cluster_pcmk_host_map + '"'
              }]) -%}

So the string is wrapped in additional quotes, but I'm not sure whether this code is the cause of the problem described above.
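The effect of that template line can be sketched in the shell under one assumption we have not verified: that the role hands the finished value to pcs as a single ready-made argument, with no further shell parsing. In that case nothing strips the extra quotes, and they end up inside the stored attribute. `show_args` is a hypothetical stand-in for pcs, and `host_map` plays the role of `__sap_ha_pacemaker_cluster_pcmk_host_map` with placeholder IDs.

```shell
#!/bin/sh
# Hypothetical stand-in for pcs: prints each argument between <...>.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Placeholder for __sap_ha_pacemaker_cluster_pcmk_host_map (IDs shortened).
host_map='cl-h01-1:id-1;cl-h01-2:id-2'

# Mirrors the template: '"' + __sap_ha_pacemaker_cluster_pcmk_host_map + '"'
quoted_value="\"$host_map\""

# Passed as one already-built argument, the literal quotes survive.
show_args "pcmk_host_map=$quoted_value"
# -> <pcmk_host_map="cl-h01-1:id-1;cl-h01-2:id-2">

# Without the added quotes, the value arrives clean.
show_args "pcmk_host_map=$host_map"
# -> <pcmk_host_map=cl-h01-1:id-1;cl-h01-2:id-2>
```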

Thanks, Walter

waorb commented 1 year ago

I've changed the code in construct_vars_stonith.yml as shown below and ran the role again. With that change, the superfluous quotes around pcmk_host_map in the res_fence_ibm_powervs resource definition are gone and fencing works fine.

 diff construct_vars_stonith.yml construct_vars_stonith.yml.orig
49c49
<                   'value': __sap_ha_pacemaker_cluster_pcmk_host_map
---
>                   'value': '"' + __sap_ha_pacemaker_cluster_pcmk_host_map + '"'

However, I don't know whether that is really the fix, because there must have been a reason why someone added those quotes to the __sap_ha_pacemaker_cluster_pcmk_host_map string in construct_vars_stonith.yml.

ja9fuchs commented 1 year ago

Hi Walter, many thanks for testing and sharing, this is indeed an interesting observation.

The quotes are required when setting up the resource manually, otherwise the shell interprets the semicolon as a command separator. With the Ansible role, however, the quotes can end up as extra literal characters in the value, due to how the parameters are passed on to and used by the underlying ha_cluster Linux System Role.
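A minimal sketch of the semicolon problem, run through `sh -c` so the failing part stays contained: without quotes, the shell ends the first command at `;` and tries to run the rest of the map as a second command. `printf` stands in for `pcs stonith create`; node names and IDs are placeholders.

```shell
#!/bin/sh
# Unquoted: ';' ends the first command, so only the first host entry is
# passed on, and "cl-h01-2:id-2" is run as a (nonexistent) second command.
unquoted=$(sh -c 'printf "<%s>\n" pcmk_host_map=cl-h01-1:id-1;cl-h01-2:id-2' 2>/dev/null) || true
printf '%s\n' "$unquoted"
# -> <pcmk_host_map=cl-h01-1:id-1>

# Quoted: the whole map, including ';', reaches the command as one argument.
quoted=$(sh -c 'printf "<%s>\n" "pcmk_host_map=cl-h01-1:id-1;cl-h01-2:id-2"')
printf '%s\n' "$quoted"
# -> <pcmk_host_map=cl-h01-1:id-1;cl-h01-2:id-2>
```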

We will review the parameter parsing and correct the constructed string.

waorb commented 1 year ago

Hello Janine, that's right: the quotes are needed on that command line, otherwise the pcs stonith create command would fail. Right now, I'm constructing the variable myself, because the test environment consists of Power Virtual Server instances in IBM Cloud and there's no support script for them in the platform directory yet. I'm copying the code that builds the variable below for your reference:

    - name: Configure RHEL HA Add-On cluster
      when: hsr_configure_ha_addon
      block:
        - name: Create variable with pcmk host map
          ansible.builtin.set_fact:
            __sap_ha_pacemaker_cluster_pcmk_host_map: >-
              {{ nodes | join(';') }}
          vars:
            nodes:
              - "{{ hsr_node_primary }}:{{ hostvars[hsr_node_primary].pi_instance_id }}"
              - "{{ hsr_node_secondary }}:{{ hostvars[hsr_node_secondary].pi_instance_id }}"

        - name: Call role to configure RHEL HA Add-On cluster
          ansible.builtin.include_role:
            name: redhat.sap_install.sap_ha_pacemaker_cluster
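For a manual setup outside Ansible, the same `node:instance_id` map can be built in the shell. This is only a sketch: `build_host_map` is a hypothetical helper that does what `join(';')` does in the task above, and the IDs are the instance IDs shown in the configuration earlier in this issue.

```shell
#!/bin/sh
# Hypothetical helper: joins "node:instance_id" pairs with ';',
# like {{ nodes | join(';') }} in the Ansible task.
# The function body runs in a subshell, so changing IFS does not leak out.
build_host_map() (
  IFS=';'
  printf '%s\n' "$*"
)

host_map=$(build_host_map \
  "cl-h01-1:542127c0-fc27-4495-a62b-af81dc7bc2e6" \
  "cl-h01-2:b1068096-ade4-47e3-89fb-5a0de7091bc0")
printf '%s\n' "$host_map"
# -> cl-h01-1:542127c0-fc27-4495-a62b-af81dc7bc2e6;cl-h01-2:b1068096-ade4-47e3-89fb-5a0de7091bc0

# Quote the expansion so the ';' stays inside one argument, e.g.:
#   pcs stonith update res_fence_ibm_powervs "pcmk_host_map=$host_map"
```

Because the expansion is double-quoted, the shell removes those quotes and pcs receives the map without any quote characters, which is the form that worked above.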

Thanks, Walter

ja9fuchs commented 1 year ago

Confirmed to be fixed.