gcccheng opened 3 months ago
I think the original idea was simply to set it to an extremely long time, so that if there is any chance the system is still functioning and might come back, it has the time to do so. We should probably variable-ize those timeouts so users can set them to whatever they want.
I agree we should variable-ize those. I can add it to the list, but if someone else wants to have a crack at it, chime in and have at it.
I'm starting work on this now. Hopefully no duplicates.
We are upgrading RHEL 7 to RHEL 8, and one of our machines has been hanging in the reboot phase for 3 hours (it normally takes about 40 minutes). See the playbook job output below:
```
..
TASK [infra.leapp.upgrade : Start Leapp OS upgrade] ****
ASYNC POLL on lxrp001.example.com: jid=j641575276660.2466 started=1 finished=0
Waiting for job to be finished. Sleeping for 10 minutes...
ASYNC POLL on lxrp001.example.com: jid=j641575276660.2466 started=1 finished=0
..
ASYNC OK on lxrp001.example.com: jid=j641575276660.2466
changed: [lxrp001.example.com]

TASK [infra.leapp.upgrade : Reboot to continue Leapp OS upgrade] ***
..
```

On the machine's console, we could see a login prompt showing:

```
Red Hat Enterprise Linux 8.6 (Ootpa)
Kernel 4.18.0-372.32.1.el8_6.x86_64 on an x86_64
```
It seems the upgrade partially finished and then got stuck in the middle.
I checked the `infra/leapp/roles/upgrade/tasks/leapp-upgrade.ym` file, and the reboot task has `reboot_timeout` set to 43200 seconds (12 hours) and a task `timeout` of 43260 seconds (12 hours plus 1 minute):
```yaml
- name: Start Leapp OS upgrade
  ansible.builtin.shell: >
    export PATH={{ os_path }};
    set -o pipefail;
    leapp upgrade {{ leapp_upgrade_opts }} {{ leapp_enable_repos_args }} 2>&1 | tee -a {{ log_file }}
  args:
    executable: /bin/bash
  async: "{{ async_timeout_maximum | int }}"
  poll: "{{ async_poll_interval | int }}"

- name: Reboot to continue Leapp OS upgrade
  ansible.builtin.reboot:
    msg: "Host is starting Leapp OS upgrade now!"
    reboot_timeout: 43200
    post_reboot_delay: "{{ post_reboot_delay }}"
  timeout: 43260
```
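As a minimal sketch of the "variable-ize those timeouts" idea, the reboot task could read its timeout from a role variable. `leapp_reboot_timeout` is a hypothetical name (nothing in this thread shows it existing in the collection), and its default keeps the current 12-hour value so behavior only changes when a user overrides it. In the role defaults:

```yaml
# Hypothetical addition to roles/upgrade/defaults/main.yml.
# Seconds to wait for the host to come back after the Leapp reboot.
# Default matches the current hardcoded value (12 hours).
leapp_reboot_timeout: 43200
```

And in the reboot task:

```yaml
- name: Reboot to continue Leapp OS upgrade
  ansible.builtin.reboot:
    msg: "Host is starting Leapp OS upgrade now!"
    # Hypothetical variable replacing the hardcoded 43200.
    reboot_timeout: "{{ leapp_reboot_timeout | int }}"
    post_reboot_delay: "{{ post_reboot_delay }}"
  # Task-level timeout derived from the same variable, keeping the
  # original 60-second margin (43260 = 43200 + 60).
  timeout: "{{ (leapp_reboot_timeout | int) + 60 }}"
```

A user whose hosts normally finish in about 40 minutes could then pass something shorter, for example `-e leapp_reboot_timeout=7200` (2 hours) on the `ansible-playbook` command line, since extra vars take precedence over role defaults. Deriving the task-level `timeout` from the same variable avoids a second knob that has to be kept in sync.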
Is there any reason the timeout is so long? Has there been any case where an upgrade took close to 12 hours and still succeeded? @jeffmcutter Thanks!