networktocode / pyntc

Python library focused on tasks related to device level and OS management.
https://pyntc.readthedocs.io/en/latest/

Upgrade in install mode does not wait for reboot #266

Closed balmasea closed 1 year ago

balmasea commented 1 year ago

Environment

Expected Behavior

After running an upgrade in install mode, I would expect the library to wait until the device has finished reloading.

Cisco IOS XE Software, Version 17.06.03
Cisco IOS Software [Bengaluru], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 17.6.3, RELEASE SOFTWARE (fc4)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2022 by Cisco Systems, Inc.
Compiled Wed 30-Mar-22 23:09 by mcpre

Cisco IOS-XE software, Copyright (c) 2005-2022 by cisco Systems, Inc.
All rights reserved.  Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0.  The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY.  You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0.  For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.

ROM: IOS-XE ROMMON
BOOTLDR: System Bootstrap, Version 16.12.2r, RELEASE SOFTWARE (P)

hostname uptime is 2 days, 14 hours, 55 minutes
Uptime for this control processor is 2 days, 14 hours, 56 minutes
System returned to ROM by Reload Command at 21:00:42 UTC Tue Nov 22 2022
System restarted at 21:03:09 UTC Tue Nov 22 2022
System image file is "flash:/packages.conf"
Last reload reason: Reload Command

Observed Behavior

The library immediately runs the _wait_for_device_reboot method, captures the show version output while the device is still on the old image, and raises an OSInstallError exception. Show version output fragment:

Cisco IOS Software [Gibraltar], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 16.12.4, RELEASE SOFTWARE (fc5)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2020 by Cisco Systems, Inc.
Compiled Thu 09-Jul-20 21:49 by mcpre

Cisco IOS-XE software, Copyright (c) 2005-2020 by cisco Systems, Inc.
All rights reserved.  Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0.  The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY.  You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0.  For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.

ROM: IOS-XE ROMMON
BOOTLDR: System Bootstrap, Version 16.12.2r, RELEASE SOFTWARE (P)

hostname uptime is 8 weeks, 6 days, 8 hours, 1 minute
Uptime for this control processor is 8 weeks, 6 days, 8 hours, 3 minutes
System returned to ROM by PowerOn
System restarted at 12:34:11 UTC Wed Sep 21 2022
System image file is "flash:packages.conf"
Last reload reason: PowerOn

Steps to Reproduce

We could not find a deterministic way to reproduce the error. We simply ran an upgrade in install mode and captured the output from the _wait_for_device_reboot method.

itdependsnetworks commented 1 year ago

Can you provide the stack trace? My guess is that it correctly calls wait for device reboot, which is the loop that gives the device an hour to reboot, but in reality the device hasn't rebooted yet, so the check fails because the version isn't correct.
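To illustrate the guess above, here is a minimal sketch of a polling loop of that kind. It is not pyntc's actual code: `wait_for_reachable` and the `probe` callable are hypothetical names, and the only assumption taken from the thread is that the loop polls until the device answers or about an hour has passed.

```python
import time


def wait_for_reachable(probe, timeout=3600, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or the timeout expires.

    Hypothetical sketch: probe stands in for "connect and run show version".
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if probe():
            # The device answered; the caller treats this as "reboot finished".
            return True
        sleep(interval)
    return False


# A device that has not started reloading yet answers on the very first poll,
# so the loop returns at once even though no reboot has happened.
def still_installing():
    return True


print(wait_for_reachable(still_installing, timeout=5))
```

The loop has no way to tell "came back after a reload" from "never went down", which matches the failure described in this issue.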

balmasea commented 1 year ago

The stack trace does not contain much, just the exception being raised:

fatal: [hostname]: FAILED! => {"changed": false, "msg": "hostname was unable to boot into cat9k_iosxe.17.06.03.SPA.bin"}

That's exactly what we have seen: after running the install mode command in https://github.com/networktocode/pyntc/blob/develop/pyntc/devices/ios_device.py#L709, some devices do not reboot immediately.

balmasea commented 1 year ago

Hi there. Any update on this? Is there anything I can help with?

jeffkala commented 1 year ago

So, to summarize:

  1. Install mode is used on IOS.
  2. The install command is run via the show method with a custom delay factor.
  3. The install command should trigger the reboot itself, without the code ever needing to run reboot (see L709).

What we're seeing is that the delay factor is not long enough for the install command to complete. The code then moves to L721 and waits for the reboot; however, the install process is still running and the device is still reachable, so the code assumes the reboot has already completed.
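One signal that separates "not rebooted yet" from "already rebooted" is the uptime line in the show version output. The sketch below is an illustration only, not what pyntc does: `parse_uptime_seconds` is a hypothetical helper, and it assumes the uptime line has the format seen in the two outputs pasted earlier in this issue.

```python
import re

# Approximate unit lengths in seconds (a year is treated as 365 days).
_UNIT_SECONDS = {
    "year": 365 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
}


def parse_uptime_seconds(show_version_output):
    """Return the uptime in seconds, or None if no "<hostname> uptime is" line is found."""
    match = re.search(r"^\S+ uptime is (.+)$", show_version_output, re.MULTILINE)
    if match is None:
        return None
    total = 0
    for value, unit in re.findall(r"(\d+) (year|week|day|hour|minute)s?", match.group(1)):
        total += int(value) * _UNIT_SECONDS[unit]
    return total


# Uptime lines copied from the outputs in this issue.
old_image = "hostname uptime is 8 weeks, 6 days, 8 hours, 1 minute"
new_image = "hostname uptime is 2 days, 14 hours, 55 minutes"
print(parse_uptime_seconds(old_image), parse_uptime_seconds(new_image))
```

An uptime that is larger than the time elapsed since the install command was sent would suggest that the device has not reloaded yet.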

We need the wait for reboot to be smarter than just running show version.
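One possible shape for a smarter wait, sketched under assumptions: first wait for the device to stop answering, then wait for it to answer again. This is not the change that was eventually merged; `wait_for_reload`, `probe`, and the timeout values are hypothetical names and numbers chosen for illustration.

```python
import time


def wait_for_reload(probe, down_timeout=1800, up_timeout=3600, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Return True only if the device was seen going down and then coming back.

    probe() is a hypothetical callable that returns True while the device answers.
    """
    # Phase 1: wait until the device stops answering (the reload has started).
    deadline = clock() + down_timeout
    while probe():
        if clock() >= deadline:
            return False  # never went down: the reload did not start in time
        sleep(interval)

    # Phase 2: wait until the device answers again (the reload has finished).
    deadline = clock() + up_timeout
    while not probe():
        if clock() >= deadline:
            return False  # went down but did not come back in time
        sleep(interval)
    return True


# Simulated device: answers twice, is unreachable twice, then answers again.
responses = iter([True, True, False, False, True])
print(wait_for_reload(lambda: next(responses), interval=0))
```

With this structure, a device that is still busy installing keeps the code in phase 1 instead of being mistaken for a device that has already rebooted.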

balmasea commented 1 year ago

Yep, what you stated in your comment is accurate. I will rework the merge request a bit soon. Thanks for the answer.

jeffkala commented 1 year ago

closed in #268