sap-linuxlab / community.sap_install

Automation for SAP - Collection of Ansible Roles for various SAP software installation
Apache License 2.0

sap_ha_pacemaker_cluster: Error when trying to create resource definition (NetWeaver common filesystem) #661

Open ZouhirYachou opened 8 months ago

ZouhirYachou commented 8 months ago

After installing SAP ASCS and ERS with the swpm role, I am trying to run the sap_ha_pacemaker_cluster role with the following variables for the NAS configuration:

sap_ha_pacemaker_cluster_storage_nfs_server: "{{ nfs_server }}"
sap_ha_pacemaker_cluster_nwas_abap_sid: "{{ sid | upper }}"
sap_ha_pacemaker_cluster_nwas_abap_ascs_instance_nr: "{{ ascs_instance_number }}"
sap_ha_pacemaker_cluster_nwas_abap_ers_instance_nr: "{{ ers_instance_number }}"
sap_ha_pacemaker_cluster_host_type:
  - nwas_abap_ascs_ers
sap_ha_pacemaker_cluster_nwas_shared_filesystems_cluster_managed: true
sap_ha_pacemaker_cluster_storage_definition:
- mountpoint: "/usr/sap/{{ sap_ha_pacemaker_cluster_nwas_abap_sid }}/SYS"
  name: usr_sap_sys
  nfs_path: /vol1
  nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server}}:/"
- mountpoint: /usr/sap/trans
  name: usr_sap_trans
  nfs_path: /vol2
  nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server}}:/"
- mountpoint: "/sapmnt/{{ sid | upper }}"
  name: sapmnt
  nfs_path: /vol3
  nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server}}:/"
- mountpoint: "/usr/sap/{{ sap_ha_pacemaker_cluster_nwas_abap_sid }}/ASCS{{ sap_ha_pacemaker_cluster_nwas_abap_ascs_instance_nr }}"
  name: usr_sap_ascs
  nfs_path: /vol4
  nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server}}:/"
- mountpoint: "/usr/sap/{{ sap_ha_pacemaker_cluster_nwas_abap_sid }}/ERS{{ sap_ha_pacemaker_cluster_nwas_abap_ers_instance_nr }}"
  name: usr_sap_ers
  nfs_path: /vol5
  nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server}}:/"

The NFS mounts are already present on the hosts when I execute the sap_ha_pacemaker_cluster role.

However, I get an error on this task: https://github.com/sap-linuxlab/community.sap_install/blob/1.4.0/roles/sap_ha_pacemaker_cluster/tasks/construct_vars_nwas_common.yml#L20

The error occurs when the loop reaches /usr/sap/SID/ASCS01 and /usr/sap/SID/ERS02 mount points:

TASK [community.sap_install.sap_ha_pacemaker_cluster : SAP HA Prepare Pacemaker - Include variable construction for SAP NetWeaver common] ***
included: /runner/requirements_collections/ansible_collections/community/sap_install/roles/sap_ha_pacemaker_cluster/tasks/construct_vars_nwas_common.yml for ers_node, ascs_node

TASK [community.sap_install.sap_ha_pacemaker_cluster : SAP HA Prepare Pacemaker - Define resource defaults for NetWeaver clusters] ***
ok: [ers_node]
ok: [ascs_node]

TASK [community.sap_install.sap_ha_pacemaker_cluster : SAP HA Prepare Pacemaker - Add NetWeaver common filesystem resources to resource definition] ***
ok: [ers_node] => (item=usr_sap_sys)
ok: [ascs_node] => (item=usr_sap_sys)
ok: [ers_node] => (item=usr_sap_trans)
ok: [ascs_node] => (item=usr_sap_trans)
ok: [ers_node] => (item=sapmnt)
fatal: [ers_node]: FAILED! =>
{
    "msg": "
    The conditional check '__resource_filesystem.id not in (__sap_ha_pacemaker_cluster_resource_primitives | map(attribute='id'))' failed.
    The error was: error while evaluating conditional (__resource_filesystem.id not in (__sap_ha_pacemaker_cluster_resource_primitives | map(attribute='id'))):
    {'id': \\"{%- if '/sapmnt' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sapmnt_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/trans' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_transports_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/' + sap_ha_pacemaker_cluster_nwas_abap_sid + '/SYS' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sys_filesystem_resource_name %}\\\\n
    {% endif %}\\\\n
    {{ idname }}\\",
    'agent': 'ocf:heartbeat:Filesystem',
    'instance_attrs': [{'attrs': [
        {'name': 'device', 'value': '{{ __nfs_server }}/{{ __nfs_path }}'},
        {'name': 'directory', 'value': '{{ __mountpoint }}'},
        {'name': 'fstype', 'value': '{{ __fstype }}'},
        {'name': 'options', 'value': '{{ __mount_opts }}'},
        {'name': 'force_unmount', 'value': '{{ sap_ha_pacemaker_cluster_resource_filesystem_force_unmount }}'}]}],
    'operations': [
        {'action': 'start', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 60}]},
        {'action': 'stop', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 120}]},
        {'action': 'monitor', 'attrs': [{'name': 'interval', 'value': 200}, {'name': 'timeout', 'value': 40}]}]}:
    'idname' is undefined. 'idname' is undefined.
    {'id': \\"{%- if '/sapmnt' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sapmnt_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/trans' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_transports_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/' + sap_ha_pacemaker_cluster_nwas_abap_sid + '/SYS' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sys_filesystem_resource_name %}\\\\n
    {% endif %}\\\\n
    {{ idname }}\\",
    'agent': 'ocf:heartbeat:Filesystem',
    'instance_attrs': [{'attrs': [
        {'name': 'device', 'value': '{{ __nfs_server }}/{{ __nfs_path }}'},
        {'name': 'directory', 'value': '{{ __mountpoint }}'},
        {'name': 'fstype', 'value': '{{ __fstype }}'},
        {'name': 'options', 'value': '{{ __mount_opts }}'},
        {'name': 'force_unmount', 'value': '{{ sap_ha_pacemaker_cluster_resource_filesystem_force_unmount }}'}]}],
    'operations': [
        {'action': 'start', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 60}]},
        {'action': 'stop', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 120}]},
        {'action': 'monitor', 'attrs': [{'name': 'interval', 'value': 200}, {'name': 'timeout', 'value': 40}]}]}:
    'idname' is undefined. 'idname' is undefined\\n\\n
    The error appears to be in '/runner/requirements_collections/ansible_collections/community/sap_install/roles/sap_ha_pacemaker_cluster/tasks/construct_vars_nwas_common.yml':
    line 20, column 3, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\n
    The offending line appears to be:\\n\\n\\n- name: \\"SAP HA Prepare Pacemaker - Add NetWeaver common filesystem resources to resource definition\\"\\n  ^ here\\n"
}

ok: [ascs_node] => (item=sapmnt)

fatal: [ascs_node]: FAILED! =>
{
    "msg": "
    The conditional check '__resource_filesystem.id not in (__sap_ha_pacemaker_cluster_resource_primitives | map(attribute='id'))' failed.
    The error was: error while evaluating conditional (__resource_filesystem.id not in (__sap_ha_pacemaker_cluster_resource_primitives | map(attribute='id'))):
    {'id': \\"{%- if '/sapmnt' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sapmnt_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/trans' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_transports_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/' + sap_ha_pacemaker_cluster_nwas_abap_sid + '/SYS' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sys_filesystem_resource_name %}\\\\n
    {% endif %}\\\\n
    {{ idname }}\\",
    'agent': 'ocf:heartbeat:Filesystem',
    'instance_attrs': [{'attrs': [
        {'name': 'device', 'value': '{{ __nfs_server }}/{{ __nfs_path }}'},
        {'name': 'directory', 'value': '{{ __mountpoint }}'},
        {'name': 'fstype', 'value': '{{ __fstype }}'},
        {'name': 'options', 'value': '{{ __mount_opts }}'},
        {'name': 'force_unmount', 'value': '{{ sap_ha_pacemaker_cluster_resource_filesystem_force_unmount }}'}]}],
    'operations': [
        {'action': 'start', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 60}]},
        {'action': 'stop', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 120}]},
        {'action': 'monitor', 'attrs': [{'name': 'interval', 'value': 200}, {'name': 'timeout', 'value': 40}]}]}:
    'idname' is undefined. 'idname' is undefined.
    {'id': \\"{%- if '/sapmnt' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sapmnt_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/trans' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_transports_filesystem_resource_name %}\\\\n
    {% elif '/usr/sap/' + sap_ha_pacemaker_cluster_nwas_abap_sid + '/SYS' in __mountpoint -%}\\\\n
      {% set idname = sap_ha_pacemaker_cluster_nwas_sys_filesystem_resource_name %}\\\\n
    {% endif %}\\\\n
    {{ idname }}\\",
    'agent': 'ocf:heartbeat:Filesystem',
    'instance_attrs': [{'attrs': [
        {'name': 'device', 'value': '{{ __nfs_server }}/{{ __nfs_path }}'},
        {'name': 'directory', 'value': '{{ __mountpoint }}'},
        {'name': 'fstype', 'value': '{{ __fstype }}'},
        {'name': 'options', 'value': '{{ __mount_opts }}'},
        {'name': 'force_unmount', 'value': '{{ sap_ha_pacemaker_cluster_resource_filesystem_force_unmount }}'}]}],
    'operations': [
        {'action': 'start', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 60}]},
        {'action': 'stop', 'attrs': [{'name': 'interval', 'value': 0}, {'name': 'timeout', 'value': 120}]},
        {'action': 'monitor', 'attrs': [{'name': 'interval', 'value': 200}, {'name': 'timeout', 'value': 40}]}]}:
    'idname' is undefined. 'idname' is undefined\\n\\n
    The error appears to be in '/runner/requirements_collections/ansible_collections/community/sap_install/roles/sap_ha_pacemaker_cluster/tasks/construct_vars_nwas_common.yml':
    line 20, column 3, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\n
    The offending line appears to be:\\n\\n\\n- name: \\"SAP HA Prepare Pacemaker - Add NetWeaver common filesystem resources to resource definition\\"\\n  ^ here\\n"
}

PLAY RECAP *********************************************************************
ascs_node                  : ok=109  changed=15   unreachable=0    failed=1    skipped=58   rescued=0    ignored=0   
ers_node                  : ok=114  changed=16   unreachable=0    failed=1    skipped=57   rescued=0    ignored=0   

When I remove the following two mount points from the variable sap_ha_pacemaker_cluster_storage_definition:

- mountpoint: "/usr/sap/{{ sap_ha_pacemaker_cluster_nwas_abap_sid }}/ASCS{{ sap_ha_pacemaker_cluster_nwas_abap_ascs_instance_nr }}"
  name: usr_sap_ascs
  nfs_path: /vol4
  nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server}}:/"
- mountpoint: "/usr/sap/{{ sap_ha_pacemaker_cluster_nwas_abap_sid }}/ERS{{ sap_ha_pacemaker_cluster_nwas_abap_ers_instance_nr }}"
  name: usr_sap_ers
  nfs_path: /vol5
  nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server}}:/"

The role then finishes without errors, but the Pacemaker status shows failures for the ASCS and ERS filesystem resources whose mount points I removed:

# pcs status --full
Cluster name: my-cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ascs_node (2) (version 2.1.2-4.el8_6.7-ada5c3b36e2) - partition with quorum
  * Last updated: Tue Feb 20 16:20:03 2024
  * Last change:  Tue Feb 20 16:20:00 2024 by hacluster via crmd on ascs_node
  * 2 nodes configured
  * 12 resource instances configured

Node List:
  * Online: [ ascs_node (2) ers_node (1) ]

Full List of Resources:
  * Resource Group: ABC_ASCS01_group:
    * Filesystem_NWAS_ABAP_ASCS_ABC_01  (ocf::heartbeat:Filesystem):     Stopped
    * SAPInstance_NWAS_ABAP_ASCS_ABC_01 (ocf::heartbeat:SAPInstance):    Stopped
    * vip_ABC_01_ascs   (ocf::heartbeat:IPaddr2):        Stopped
  * Resource Group: ABC_ERS02_group:
    * Filesystem_NWAS_ABAP_ERS_ABC_02   (ocf::heartbeat:Filesystem):     Stopped
    * SAPInstance_NWAS_ABAP_ERS_ABC_02  (ocf::heartbeat:SAPInstance):    Stopped
    * vip_ABC_02_ers    (ocf::heartbeat:IPaddr2):        Stopped
  * Clone Set: Filesystem_NWAS_SYS_ABC-clone [Filesystem_NWAS_SYS_ABC]:
    * Filesystem_NWAS_SYS_ABC   (ocf::heartbeat:Filesystem):     Started ascs_node
    * Filesystem_NWAS_SYS_ABC   (ocf::heartbeat:Filesystem):     Started ers_node
  * Clone Set: Filesystem_NWAS_TRANS_ABC-clone [Filesystem_NWAS_TRANS_ABC]:
    * Filesystem_NWAS_TRANS_ABC (ocf::heartbeat:Filesystem):     Started ascs_node
    * Filesystem_NWAS_TRANS_ABC (ocf::heartbeat:Filesystem):     Started ers_node
  * Clone Set: Filesystem_NWAS_SAPMNT_ABC-clone [Filesystem_NWAS_SAPMNT_ABC]:
    * Filesystem_NWAS_SAPMNT_ABC        (ocf::heartbeat:Filesystem):     Started ascs_node
    * Filesystem_NWAS_SAPMNT_ABC        (ocf::heartbeat:Filesystem):     Started ers_node

Migration Summary:
  * Node: ascs_node (2):
    * Filesystem_NWAS_ABAP_ASCS_ABC_01: migration-threshold=3 fail-count=1000000 last-failure='Tue Feb 20 16:20:01 2024'
    * Filesystem_NWAS_ABAP_ERS_ABC_02: migration-threshold=3 fail-count=1000000 last-failure='Tue Feb 20 16:20:01 2024'
  * Node: ers_node (1):
    * Filesystem_NWAS_ABAP_ASCS_ABC_01: migration-threshold=3 fail-count=1000000 last-failure='Tue Feb 20 16:20:01 2024'
    * Filesystem_NWAS_ABAP_ERS_ABC_02: migration-threshold=3 fail-count=1000000 last-failure='Tue Feb 20 16:20:02 2024'

Failed Resource Actions:
  * Filesystem_NWAS_ABAP_ASCS_ABC_01_start_0 on ascs_node 'not installed' (5): call=63, status='complete', exitreason='Couldn't find device [//ABC/ASCS01]. Expected /dev/??? to exist', last-rc-change='Tue Feb 20 16:20:00 2024', queued=0ms, exec=226ms
  * Filesystem_NWAS_ABAP_ERS_ABC_02_start_0 on ascs_node 'not installed' (5): call=65, status='complete', exitreason='Couldn't find device [//ABC/ERS02]. Expected /dev/??? to exist', last-rc-change='Tue Feb 20 16:20:01 2024', queued=0ms, exec=226ms
  * Filesystem_NWAS_ABAP_ASCS_ABC_01_start_0 on ers_node 'not installed' (5): call=63, status='complete', exitreason='Couldn't find device [//ABC/ASCS01]. Expected /dev/??? to exist', last-rc-change='Tue Feb 20 16:20:01 2024', queued=0ms, exec=234ms
  * Filesystem_NWAS_ABAP_ERS_ABC_02_start_0 on ers_node 'not installed' (5): call=65, status='complete', exitreason='Couldn't find device [//ABC/ERS02]. Expected /dev/??? to exist', last-rc-change='Tue Feb 20 16:20:02 2024', queued=0ms, exec=211ms

Tickets:

PCSD Status:
  ascs_node: Online
  ers_node: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

marcelmamula commented 8 months ago

Hello @ZouhirYachou, your input specifies unsupported mount points, for example mountpoint: "/usr/sap/SID/ERSXX".

The task you mention checks whether the mount point is one of /sapmnt, /usr/sap/trans or /usr/sap/SID/SYS and sets the variable idname accordingly. Your ASCS and ERS mount points match none of these, so idname is never set, which is what your error message reports: 'idname' is undefined.

{%- if '/sapmnt' in __mountpoint -%}
{% elif '/usr/sap/trans' in __mountpoint -%}
{% elif '/usr/sap/' + sap_ha_pacemaker_cluster_nwas_abap_sid + '/SYS' in __mountpoint -%}
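
To make the effect of these three branches concrete, here is a minimal Python sketch that mirrors the same substring checks. It is not the role's code: the function name `pick_idname` and the resource names in `RESOURCE_NAMES` are assumptions for illustration (the names are modelled on the clone sets in the pcs output above).

```python
from typing import Optional

# Hypothetical resource names, modelled on the clone set names in the pcs output.
RESOURCE_NAMES = {
    "sapmnt": "Filesystem_NWAS_SAPMNT_ABC",
    "trans": "Filesystem_NWAS_TRANS_ABC",
    "sys": "Filesystem_NWAS_SYS_ABC",
}


def pick_idname(mountpoint: str, sid: str) -> Optional[str]:
    """Mirror the if/elif chain; return None where the template sets nothing."""
    if "/sapmnt" in mountpoint:
        return RESOURCE_NAMES["sapmnt"]
    elif "/usr/sap/trans" in mountpoint:
        return RESOURCE_NAMES["trans"]
    elif "/usr/sap/" + sid + "/SYS" in mountpoint:
        return RESOURCE_NAMES["sys"]
    # No branch matched: in the template, idname is never assigned here.
    return None


if __name__ == "__main__":
    for mp in ("/sapmnt/ABC", "/usr/sap/trans", "/usr/sap/ABC/SYS",
               "/usr/sap/ABC/ASCS01", "/usr/sap/ABC/ERS02"):
        print(mp, "->", pick_idname(mp, "ABC"))
```

The ASCS and ERS paths fall through every branch, which corresponds to the point in the loop where the reported run fails.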

I set up an ASCS/ERS cluster just moments ago using this filesystem definition:

      storage_definition:
        - name: usr_sap
          mountpoint: /usr/sap
          nfs_path: /usr/sap
          nfs_server: "{{ sap_vm_provision_nfs_mount_point | default('') }}"
          nfs_filesystem_type: "{{ sap_vm_provision_nfs_mount_point_type | default('') }}" 
          nfs_mount_options: "{{ sap_vm_provision_nfs_mount_point_opts | default('') }}"  

@sean-freeman @berndfinger

ja9fuchs commented 8 months ago

Hi @ZouhirYachou, what @marcelmamula wrote is correct. The role is currently limited in how flexibly the ASCS/ERS filesystems can be set up. This is certainly something we will look to improve in the future.

sean-freeman commented 7 months ago

@ZouhirYachou to be clear on the ask and subsequent error:

- Use separate NFS for /usr/sap/trans [as standard]
- Instead of shared NFS for /usr/sap and all subdirectories
  - Use separate NFS for each of /usr/sap/<SID>/SYS, /usr/sap/<SID>/ASCS00, /usr/sap/<SID>/ERS02
- Use separate NFS for /sapmnt

For reference, confirmed working code:


sap_ha_pacemaker_cluster_storage_nfs_filesytem_type: nfs
sap_ha_pacemaker_cluster_storage_nfs_mount_options: 'defaults'
sap_ha_pacemaker_cluster_storage_nfs_server: "fs-00000000000000000.efs.eu-west-2.amazonaws.com:/"

special_sap_ha_pacemaker_cluster_storage_nfs_server_separate_sap_transport_dir: "fs-11111111111111111.efs.eu-west-2.amazonaws.com:/"

sap_ha_pacemaker_cluster_storage_definition:

  # Must have directories available /<SID>/SYS, /<SID>/ASCS<NN>, /<SID>/ERS<NN>
  - name: usr_sap
    mountpoint: /usr/sap
    nfs_path: /usr/sap
    nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server }}"

  - name: sapmnt
    mountpoint: /sapmnt
    nfs_path: /sapmnt
    nfs_server: "{{ sap_ha_pacemaker_cluster_storage_nfs_server }}"

  - name: usr_sap_trans
    mountpoint: /usr/sap/trans
    nfs_path: /usr/sap/trans
    nfs_server: "{{ special_sap_ha_pacemaker_cluster_storage_nfs_server_separate_sap_transport_dir }}"

ZouhirYachou commented 6 months ago

Hello @sean-freeman, yes, that is correct. The expected NFS mounts would be the following:

/sapmnt/[SID]
/usr/sap/trans
/usr/sap/[SID]/SYS
/usr/sap/[SID]/ASCS01
/usr/sap/[SID]/ERS02

My understanding is that having the ASCS and ERS filesystems mounted on both nodes would allow a smoother takeover when one of the nodes fails.

sean-freeman commented 6 months ago

@ZouhirYachou

The Ansible Role accepts only one NFS share for the /usr/sap path.

The Ansible Role expects the following setup (which can be achieved via sap_storage_setup Ansible Role if desired), note the separate NFS for the transports directory:

[root@nw-ascs ~]# df -h
Filesystem                                                            Size Used Avail Use% Mounted on

fs-00000000000000000.efs.eu-west-2.amazonaws.com:/usr/sap/trans       8.0E   0  8.0E   0% /usr/sap/trans

fs-11111111111111111.efs.eu-west-2.amazonaws.com:/sapmnt              8.0E   0  8.0E   0% /sapmnt
fs-11111111111111111.efs.eu-west-2.amazonaws.com:/usr/sap/S01/SYS     8.0E   0  8.0E   0% /usr/sap/S01/SYS
fs-11111111111111111.efs.eu-west-2.amazonaws.com:/usr/sap/S01/ASCS00  8.0E   0  8.0E   0% /usr/sap/S01/ASCS00
fs-11111111111111111.efs.eu-west-2.amazonaws.com:/usr/sap/S01/ERS10   8.0E   0  8.0E   0% /usr/sap/S01/ERS10

Will look tonight for some reference documents; admittedly the sap_ha_pacemaker_cluster Ansible Role's documentation for this variable could be improved to explain the expectations and the Ansible Role logic could be improved to specifically check the NFS target server for these directories and provide a pretty error.
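
As a rough illustration of the kind of pre-check described above, the following Python sketch tests whether the expected instance directories exist under an already mounted /usr/sap share and produces a readable message. It is not part of the role: the function names `missing_instance_dirs` and `describe` are hypothetical, and the check runs against a local path rather than querying the NFS server directly.

```python
from pathlib import Path
from typing import List


def missing_instance_dirs(usr_sap_root: Path, sid: str,
                          ascs_nr: str, ers_nr: str) -> List[str]:
    """Return the expected subdirectories that do not exist under usr_sap_root."""
    expected = [f"{sid}/SYS", f"{sid}/ASCS{ascs_nr}", f"{sid}/ERS{ers_nr}"]
    return [rel for rel in expected if not (usr_sap_root / rel).is_dir()]


def describe(missing: List[str], usr_sap_root: Path) -> str:
    """Turn the result into a short message instead of a raw template error."""
    if not missing:
        return f"All expected directories exist under {usr_sap_root}."
    return f"Missing under {usr_sap_root}: " + ", ".join(missing)


if __name__ == "__main__":
    import tempfile

    # Demo on a temporary directory standing in for the mounted share.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "S01" / "SYS").mkdir(parents=True)
        (root / "S01" / "ASCS00").mkdir(parents=True)
        print(describe(missing_instance_dirs(root, "S01", "00", "10"), root))
```

In this demo only the ERS10 directory is reported as missing, since SYS and ASCS00 were created beforehand.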