sap-linuxlab / community.sap_install

Automation for SAP - Collection of Ansible Roles for various SAP software installation
Apache License 2.0

sap_swpm: does not show the output of install script #643

Open ZouhirYachou opened 7 months ago

ZouhirYachou commented 7 months ago

Hello. The task that runs the install script ./sapinst should not be run as async and then monitored by other tasks. When the script fails, Ansible does not provide the stderr and stdout of the script, which makes troubleshooting impossible. There should be a single task that runs the script.

This task https://github.com/sap-linuxlab/community.sap_install/blob/main/roles/sap_swpm/tasks/swpm.yml#L64 should be:

- name: SAP SWPM - {{ sap_swpm_swpm_installation_header }} # noqa no-changed-when
  ansible.builtin.shell: |
    {{ __sap_swpm_sapinst_command }}
  register: __sap_swpm_register_sapinst_async_job
  args:
    chdir: "{{ sap_swpm_sapinst_path }}"
  async: 1800 # Maximum allowed time in Seconds (30 minutes) 
  poll: 30 # Polling Interval in Seconds

and the following tasks that monitor the script should be removed. There is no need to retrieve the RC and output, as the shell module already does this:

# Monitor sapinst process (i.e. ps aux | grep sapinst) and wait for exit
- name: SAP SWPM - Wait for sapinst process to exit, poll every 60 seconds
  community.general.pids:
    name: sapinst
#  shell: ps -ef | awk '/sapinst/&&!/awk/&&!/ansible/{print}'
  register: pids_sapinst
  until: "pids_sapinst.pids | length == 0"
#  until: "pids_sapinst.stdout | length == 0"
  retries: 1000
  delay: 60

- name: SAP SWPM - Verify if sapinst process finished successfully
  ansible.builtin.async_status:
    jid: "{{ __sap_swpm_register_sapinst_async_job.ansible_job_id }}"
  register: __sap_swpm_register_sapinst
  failed_when: __sap_swpm_register_sapinst.finished != 1 or __sap_swpm_register_sapinst.rc != 0
#   #until: __sap_swpm_register_sapinst.finished
#   #retries: 1000
#   #delay: 60

- name: SAP SWPM - Display the sapinst return code
  ansible.builtin.debug:
    msg: "{{ __sap_swpm_register_sapinst.rc }}"

- name: SAP SWPM - Display the sapinst output
  ansible.builtin.debug:
    msg: "{{ __sap_swpm_register_sapinst.stdout_lines }}"
  when: sap_swpm_display_unattended_output
sean-freeman commented 7 months ago

@ZouhirYachou sapinst can take many hours to run, and the SSH session tunnel has a tendency to time out, so the Ansible Task never ends even when the sapinst process has ended. This is why the async approach, with a check that the process has ended, was used. This is explained in the commented code.

The default behaviour was altered at the request of other end-users, because the SWPM stdout/stderr printed upon error would wipe a terminal window if the scrollback buffer settings were too low (SWPM can easily output 10,000 lines to the terminal window).

Upon end-user request, the following commit introduced the variable, which is set to not display output by default: https://github.com/sap-linuxlab/community.sap_install/commit/1861c15972abeeded7351ef41a9425823c39631e

If you use sap_swpm_display_unattended_output: true in your variables, you will see the output.
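
As a minimal sketch of where that variable could be set, assuming a playbook that calls the role directly (the host group name `nwas` is hypothetical, and the other variables the role needs are omitted here):

```yaml
# Hypothetical playbook; only sap_swpm_display_unattended_output comes from this thread.
- name: Run SAP SWPM with the sapinst output displayed
  hosts: nwas  # assumed inventory group name
  become: true
  vars:
    # Show the sapinst stdout in the final debug task of the role
    sap_swpm_display_unattended_output: true
  roles:
    - role: community.sap_install.sap_swpm
```

The same key could equally go into group_vars, host_vars, or an extra-vars file.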

ZouhirYachou commented 7 months ago

Even with sap_swpm_display_unattended_output: true set, my playbook fails before the task that shows the output, so I have no access to the logs:

TASK [community.sap_install.sap_swpm : Display the sapinst command line] *******
ok: [vlh1bse26] => {
    "msg": "SAP SWPM install command: 'umask 022 ; ./sapinst SAPINST_INPUT_PARAMETERS_URL=/tmp/ansible.jrzwy2moswpmconfig/inifile.params SAPINST_EXECUTE_PRODUCT_ID=NW_ABAP_ASCS:S4HANA2022.CORE.HDB.ABAP SAPINST_SKIP_DIALOGS=true SAPINST_START_GUISERVER=false  '"
}

TASK [community.sap_install.sap_swpm : SAP SWPM -] *****************************
changed: [vlh1bse26]

TASK [community.sap_install.sap_swpm : SAP SWPM - Wait for sapinst process to exit, poll every 60 seconds] ***
ok: [vlh1bse26]

TASK [community.sap_install.sap_swpm : SAP SWPM - Verify if sapinst process finished successfully] ***
fatal: [vlh1bse26]: FAILED! =>
{
    "ansible_job_id": "j325284186928.5920",
    "changed": false,
    "failed_when_result": true,
    "finished": 0,
    "results_file": "/root/.ansible_async/j325284186928.5920",
    "started": 1,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "",
    "stdout_lines": []
}
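
In the output above, "finished" is 0 and there is no "rc" key, so the failed_when condition fires on the finished check alone. One possible variation, shown only as a sketch and not as a confirmed diagnosis or the maintainers' fix, is to let async_status retry until the job is reported as finished, along the lines of the commented-out until/retries lines in the existing task:

```yaml
# Sketch only: re-check the async job until Ansible reports it as finished.
- name: SAP SWPM - Wait for the async job to be reported as finished
  ansible.builtin.async_status:
    jid: "{{ __sap_swpm_register_sapinst_async_job.ansible_job_id }}"
  register: __sap_swpm_register_sapinst
  # Retry while the job is still running (finished is 0 or false)
  until: __sap_swpm_register_sapinst.finished | bool
  retries: 1000
  delay: 60
  # Fail only once the job has finished with a non-zero return code;
  # default(1) treats a finished job without an rc as a failure
  failed_when:
    - __sap_swpm_register_sapinst.finished | bool
    - __sap_swpm_register_sapinst.rc | default(1) != 0
```

If the retries run out while the job is still unfinished, the task fails on the exhausted retries rather than on a missing return code.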

When I run the command manually on the host, I do not get any errors and the script returns a 0 return code.

My proposal allows the script to be monitored, with a status update every 30 seconds (we can probably raise the async value to allow more than 30 minutes):

TASK [local_sap_swpm : Display the sapinst command line] ***********************
ok: [vlh1bse26] => {
    "msg": "SAP SWPM install command: 'umask 022 ; ./sapinst SAPINST_INPUT_PARAMETERS_URL=/tmp/ansible.n18ypd4cswpmconfig/inifile.params SAPINST_EXECUTE_PRODUCT_ID=NW_ABAP_ASCS:S4HANA2022.CORE.HDB.ABAP SAPINST_SKIP_DIALOGS=true SAPINST_START_GUISERVER=false  '"
}

TASK [local_sap_swpm : SAP SWPM -] *********************************************
ASYNC POLL on vlh1bse26: jid=j847759566754.5984 started=1 finished=0
ASYNC POLL on vlh1bse26: jid=j847759566754.5984 started=1 finished=0
ASYNC POLL on vlh1bse26: jid=j847759566754.5984 started=1 finished=0
ASYNC OK on vlh1bse26: jid=j847759566754.5984
changed: [vlh1bse26]

TASK [local_sap_swpm : SAP SWPM - Find last installation location] *************
ok: [vlh1bse26]
rhmk commented 7 months ago

Hi @ZouhirYachou, async: 1800 is very optimistic. I have seen an S/4 install run for 3 hours in a cloud test environment with a slow database, so async: 32400 makes total sense. If we set poll to 30, we might get the same result as when we watch for the process to end. At least we get less confusing shell output. I do not know the previous implementation exactly. Still, I would suggest encapsulating the current and the suggested method in blocks, which would let us switch between the two with a variable. @ZouhirYachou's suggestion is at least a cleaner implementation, and it should become the default if it can be proven to be stable with the current Ansible release. What do you think @berndfinger, @sean-freeman?
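
A rough sketch of what such a switch could look like follows. The variable name `sap_swpm_sapinst_single_task` is hypothetical and does not exist in the role, and the `async: 32400` / `poll: 0` values of the detached variant are inferred from this thread rather than copied from the role:

```yaml
# sap_swpm_sapinst_single_task is a hypothetical switch, not an existing role variable.
- name: SAP SWPM - Run sapinst in a single polled task
  when: sap_swpm_sapinst_single_task | default(false) | bool
  block:
    - name: SAP SWPM - Run sapinst and poll until it exits
      ansible.builtin.shell: |
        {{ __sap_swpm_sapinst_command }}
      args:
        chdir: "{{ sap_swpm_sapinst_path }}"
      register: __sap_swpm_register_sapinst
      async: 32400 # Maximum allowed time in seconds (9 hours)
      poll: 30 # Polling interval in seconds
      changed_when: true

- name: SAP SWPM - Run sapinst detached and monitor it separately
  when: not (sap_swpm_sapinst_single_task | default(false) | bool)
  block:
    - name: SAP SWPM - Start sapinst without waiting for it
      ansible.builtin.shell: |
        {{ __sap_swpm_sapinst_command }}
      args:
        chdir: "{{ sap_swpm_sapinst_path }}"
      register: __sap_swpm_register_sapinst_async_job
      async: 32400 # Value inferred from this thread
      poll: 0 # Fire and forget; monitored by the tasks that follow
      changed_when: true
    # The existing pids wait and async_status tasks would follow here, unchanged.
```

Both variants register a result with rc and stdout_lines under `__sap_swpm_register_sapinst` once sapinst has ended, so the later debug tasks would not need to change.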

sean-freeman commented 7 months ago

@ZouhirYachou something is not right in this output: below ansible_job_id there should be the executed cmd and the stdout/stderr entries.

Such as....


TASK [community.sap_install.sap_swpm : Display the sapinst command line] *********
ok: [nwas01] => {
    "msg": "SAP SWPM install command: 'umask 022 ; ./sapinst SAPINST_INPUT_PARAMETERS_URL=/tmp/ansible.zm7n3b1gswpmconfig/inifile.params SAPINST_EXECUTE_PRODUCT_ID=NW_ABAP_OneHost:S4HANA2021.CORE.HDB.ABAP SAPINST_SKIP_DIALOGS=true SAPINST_START_GUISERVER=false  '"
}

TASK [community.sap_install.sap_swpm : SAP SWPM -] ******************************
changed: [nwas01]

TASK [community.sap_install.sap_swpm : SAP SWPM - Wait for sapinst process to exit, poll every 60 seconds] **********
FAILED - RETRYING: [nwas01]: SAP SWPM - Wait for sapinst process to exit, poll every 60 seconds (1000 retries left).
ok: [nwas01]

TASK [community.sap_install.sap_swpm : SAP SWPM - Verify if sapinst process finished successfully] *********
fatal: [nwas01]: FAILED! =>
{
    "ansible_job_id": "j444392358629.64741",
    "changed": true,
    "cmd": "umask 022 ; ./sapinst SAPINST_INPUT_PARAMETERS_URL=/tmp/ansible.zm7n3b1gswpmconfig/inifile.params SAPINST_EXECUTE_PRODUCT_ID=NW_ABAP_OneHost:S4HANA2021.CORE.HDB.ABAP SAPINST_SKIP_DIALOGS=true SAPINST_START_GUISERVER=false  \n",
    "failed_when_result": true,
    "finished": 1,
    "msg": "non-zero return code",
    "rc": 111,
    "results_file": "/root/.ansible_async/j444392358629.64741",
    "start": "2023-06-30 18:41:40.436147",
    "started": 1,
    "stderr_lines": [
        "=>sapparam(1c): No Profile used.",
        "=>sapparam: SAPSYSTEMNAME neither in Profile nor in Commandline",
        "################################################",
        "Abort execution because of ",
        "Step returns osmod.hosts.getHostByName",
        "################################################"
    ],
    "stdout_lines": [
        "Extracting...",
        "Extraction done!",
        "SAPinst build information:"
        ....
        ....
        "Removed directory /root/.sapinst/nwas01.example.com/64833."
    ]
}
sean-freeman commented 7 months ago

@ZouhirYachou let's confirm a few things because I've not seen this behaviour before and the functionality of this Ansible Role has not changed (except for request to hide output, as shown in commit above + that has no impact on the debug you showed) in over 12 months.

  1. sap_swpm_sapinst_path is set to the directory path containing sapinst? e.g. if /software/sap_swpm_unpack/sapinst then variable would be sap_swpm_sapinst_path: /software/sap_swpm_unpack.

  2. Ansible Core and Python version, see example...

$ ansible-playbook --version
ansible-playbook [core 2.16.2]
  python version = 3.11.7 (main, Dec  4 2023, 18:10:11) [Clang 15.0.0 (clang-1500.1.0.2.5)] (/Users/username/.py_venv/py_ansible/bin/python3)
  jinja version = 3.1.2
  libyaml = True
  3. Ansible Collections versions: ansible-galaxy collection list

N.B. Poll is set to 60 seconds so that it is easier for the end-user to mentally calculate how long the installation has taken. If Ansible waits 59 seconds too long on a 5 minute install, it's a bit annoying, but on a 3 hour install it's unnoticeable.

ZouhirYachou commented 7 months ago

Hello

The variable is set sap_swpm_sapinst_path: /sapinst/swpm/sap_swpm_extracted/

Ansible version and python version: (we are using Ansible Automation platform 2.4 with Ansible EE 2.15)

bash-4.4# ansible --version
ansible [core 2.15.8]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.9.18 (main, Sep 22 2023, 17:58:34) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] (/usr/bin/python3.9)
  jinja version = 3.1.2
  libyaml = True
bash-4.4# python --version
Python 3.9.18

and the requirements.yml for the collections

collections:
  - name: community.general
    version: 6.5.0
  - name: redhat.rhel_system_roles
    version: 1.22.0
  - name: community.sap_install
    version: 1.4.0

I do not understand why we use 3 tasks and a poll value of 0 when we could just use one task with a positive poll value, since we do not run other tasks concurrently. I can't explain the issue I'm having (empty output), but with my proposal I do not have any issues running the script.

sean-freeman commented 7 months ago

@ZouhirYachou I explained this above. After a certain release of SAP SWPM 2.0 (SP10 I think), the Ansible Task that executed SAP SWPM would continue forever even though the sapinst process had exited successfully. It was almost impossible to diagnose, therefore a separation:

I'll run SAP SWPM today with false entries that trigger a failure, using the versions provided, to replicate your issue.

sean-freeman commented 6 months ago

@ZouhirYachou I have attempted:

I cannot replicate your output (and subsequent failure) from my laptop. Therefore I have to conclude that there is something about the specific setup, and I must run a test from Ansible Automation Platform with Ansible EE 2.15.

Can you please describe the steps you used to upload and execute your Playbook from AAP? I've never used it before and want to be sure the setup is identical to yours.

ZouhirYachou commented 6 months ago

I have synced the sap_install collection to our internal Automation Hub, and we then use it in AAP. We used the Red Hat documentation for the setup.

sean-freeman commented 6 months ago

@ZouhirYachou which documentation specifically?

Like I said, I have never used AAP before and will need to set up everything identically to yours.

ZouhirYachou commented 6 months ago

This documentation to configure the Hub with AAP https://access.redhat.com/documentation/en-us/red_hat_ansible_automation_platform/2.4/html/getting_started_with_automation_hub/configure-hub-primary#proc-configure-automation-hub-server-gui

and this documentation to sync content from ansible galaxy https://access.redhat.com/documentation/en-us/red_hat_ansible_automation_platform/2.4/html/managing_content_in_automation_hub/managing-cert-valid-content#assembly-creating-tokens-in-automation-hub

rhmk commented 6 months ago

@Sean: It should be easier and possible to pull the EE and run from ansible-navigator. @Zouhir: In AAP it is recommended to create an AAP with the 3 collections derived from your EE and not bind mount the collection into the container (although this is possible and should work)