The upgrade strategy currently runs helm upgrade with the --timeout=5h0s parameter.
Issues
If the upgrade fails on any single node, the overall upgrade is only reported as failed after the full 5-hour timeout expires.
Solution
Trigger the upgrade without --wait and --timeout, and instead monitor the daemonset status for up-to-date pods.
Configure a dynamic timeout strategy that fails the upgrade if a single node takes more than 10 minutes to upgrade (benchmarked on a 6-node ROKS/vanilla IKS cluster).
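
The monitoring loop described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function name wait_for_daemonset_upgrade, the get_status callback, and the poll interval are hypothetical, and only the 10-minute per-node budget comes from the description. It assumes the caller supplies the daemonset's desiredNumberScheduled and updatedNumberScheduled status counters, and it ignores pod availability and observedGeneration for brevity.

```python
import time
from typing import Callable, Tuple

# 10 minutes per node, the figure quoted in the description.
PER_NODE_TIMEOUT_S = 600.0


def wait_for_daemonset_upgrade(
    get_status: Callable[[], Tuple[int, int]],
    per_node_timeout_s: float = PER_NODE_TIMEOUT_S,
    poll_interval_s: float = 5.0,  # assumed value, not from the PR
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the daemonset until every pod is up to date.

    get_status is a hypothetical callback returning
    (desiredNumberScheduled, updatedNumberScheduled).
    Returns True when all pods are updated, and False as soon as
    no additional pod has become up to date within per_node_timeout_s.
    """
    last_updated = -1
    last_progress = clock()
    while True:
        desired, updated = get_status()
        if updated >= desired:
            return True  # every scheduled pod runs the new revision
        now = clock()
        if updated > last_updated:
            # Another node finished, so restart the per-node timer.
            last_updated = updated
            last_progress = now
        elif now - last_progress > per_node_timeout_s:
            return False  # one node has exceeded its budget
        sleep(poll_interval_s)
```

With this shape the worst-case wait scales with the number of nodes (roughly node count times 10 minutes) instead of being a fixed 5 hours, and a stuck node is detected after about 10 minutes. In practice get_status might read the two counters from the daemonset's .status field, for example through kubectl or a Kubernetes client library.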