sap-linuxlab / community.sap_install

Automation for SAP - Collection of Ansible Roles for various SAP software installation
Apache License 2.0

sap_ha_install_hana_hsr: register secondary node timing issue #679

Open waorb opened 6 months ago

waorb commented 6 months ago

I've tested a workflow in Automation Controller that performs a complete HANA deployment, including a system replication configuration.

There seems to be a timing issue in configure_hsr.yml: the register secondary node step starts too early and fails with an "unable to contact primary site host node1:40006. connection refused ..." error. The step apparently runs so soon after the sr_enable operation on the primary that the primary is not yet ready to accept connections. If you then log in to the secondary and perform the sr_register operation manually, it works fine.

I was able to reproduce the error reliably: running the workflow 5 times in a row resulted in 2-3 failures.

I've modified configure_hsr.yml and added a pause task between the enable primary and register secondary tasks:

- name: "SAP HSR - Enable HANA System Replication on primary node"
  ansible.builtin.shell: |
    ...

- name: "Pause for 10 seconds to ensure that System Replication is enabled on primary"
  ansible.builtin.pause:
    seconds: 10

- name: "SAP HSR - Register secondary node to HANA System Replication"
  ansible.builtin.shell: |
    ...

With this change, the workflow runs consistently without any problems.
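
A fixed pause works only as long as the primary happens to be ready within those 10 seconds. A condition-based wait is one possible alternative. The task below is a minimal sketch, not the role's actual code: it assumes the task runs on the secondary node, and the host `node1` and port `40006` are simply taken from the error message above. The task name and timeout values are illustrative.

```yaml
# Sketch only: intended to sit between the "enable" and "register" tasks
# and to run on the secondary node.
- name: "SAP HSR - Wait until the primary accepts connections on the replication port"
  ansible.builtin.wait_for:
    host: "node1"     # assumption: primary hostname as shown in the error message
    port: 40006       # assumption: port as shown in the error message
    state: started    # succeed as soon as the port accepts a TCP connection
    sleep: 2          # seconds between connection attempts
    timeout: 120      # fail the task if the port is still closed after 2 minutes
```

Unlike the pause, this continues as soon as the port is reachable and fails with a clear timeout if it never opens. In a real role the hostname and port would presumably come from variables rather than being hard-coded.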

ja9fuchs commented 6 months ago

It would make sense to add a sanity check that the Primary is fully sr_enable'd before attempting to register the Secondary.
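
Such a sanity check could, for example, poll `hdbnsutil -sr_state` on the primary until it reports `mode: primary`. This is a rough sketch under several assumptions: the task runs on the primary node with root privileges, `example_sidadm_user` is a hypothetical variable holding the `<sid>adm` user name, `example_sr_state` is a hypothetical register name, and the retry values are arbitrary.

```yaml
# Sketch only: runs on the primary node after the "enable" task.
- name: "SAP HSR - Verify that the primary reports 'mode: primary'"
  ansible.builtin.command:
    # 'su -' loads the <sid>adm login environment so that hdbnsutil is on the PATH
    cmd: "su - {{ example_sidadm_user }} -c 'hdbnsutil -sr_state'"
  become: true
  register: example_sr_state   # hypothetical variable name
  changed_when: false          # read-only check, never reports a change
  until: "'mode: primary' in example_sr_state.stdout"
  retries: 12                  # up to 12 attempts ...
  delay: 5                     # ... 5 seconds apart
```

Note that a state of `mode: primary` does not by itself prove that the replication port is already reachable from the secondary, so this check may need to be combined with a connectivity test.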