Describe the bug.
During automated tests that upgrade from an anax version to latest, the auto-upgrade failed for an unknown reason, and then the rollback failed as well. The rollback failed because the agent pod was stuck in ImagePullBackOff while its initContainer tried to pull the image public.ecr.aws/docker/library/alpine:2.31.0-1495
Cronjob logs:
2024-02-12 13:12:05 VERBOSE: Dowloading agent deployment to yaml file...
2024-02-12 13:12:05 VERBOSE: Downgrading version from latest to 2.31.0-1495...
2024-02-12 13:12:05 VERBOSE: Deleting current agent deployment...
2024-02-12 13:12:06 VERBOSE: Creating new agent deployment from backup yaml file...
2024-02-12 13:12:07 Waiting up to 75 seconds for the agent deployment to complete...
error: timed out waiting for the condition
2024-02-12 13:13:22 VERBOSE: Setting status to "rollback failed"
jq: error: Could not open file /var/horizon/nmp/ieam-roks-stage-3/nmpAutoUpgrade2-edgecluster-auto-ubuntu-2004-amd64-1-k3s/status.json: No such file or directory
/usr/local/bin/auto-upgrade-cronjob.sh: line 143: /var/horizon/nmp/ieam-roks-stage-3/nmpAutoUpgrade2-edgecluster-auto-ubuntu-2004-amd64-1-k3s/status.json: No such file or directory
cat: /var/horizon/nmp/ieam-roks-stage-3/nmpAutoUpgrade2-edgecluster-auto-ubuntu-2004-amd64-1-k3s/status.json: No such file or directory
jq: error: Could not open file /var/horizon/nmp/ieam-roks-stage-3/nmpAutoUpgrade2-edgecluster-auto-ubuntu-2004-amd64-1-k3s/status.json: No such file or directory
/usr/local/bin/auto-upgrade-cronjob.sh: line 131: /var/horizon/nmp/ieam-roks-stage-3/nmpAutoUpgrade2-edgecluster-auto-ubuntu-2004-amd64-1-k3s/status.json: No such file or directory
2024-02-12 13:13:37 CRONJOB LOGS FOR JOB: auto-upgrade-cronjob-28462393-tgf9x
2024-02-12 13:13:32 cronjob under namesapce: openhorizon-agent
What appears to happen is that this code https://github.com/open-horizon/anax/blob/04ccc1ad399f47a4f1c7ba38a8c990839101af8d/anax-in-k8s/cronjobs/auto-upgrade-cronjob.sh#L276-L280 changed the alpine image from public.ecr.aws/docker/library/alpine:latest to public.ecr.aws/docker/library/alpine:2.31.0-1495, which doesn't exist, so it failed to restore the agent.
Describe the steps to reproduce the behavior.
No response
Expected behavior.
If an auto-upgrade fails, the agent should roll back successfully.
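To illustrate the suspected failure mode, here is a minimal sketch. It is not the code from auto-upgrade-cronjob.sh. It assumes the rollback rewrites image tags in the backup yaml with a sed-style substitution. The deployment snippet, the agent image name example.registry/agent-image, and the variable names are all hypothetical. The sketch compares a broad rewrite that retags every image ending in the old tag with a rewrite scoped to the agent image only.

```shell
#!/usr/bin/env bash
# Hypothetical deployment fragment. Only the alpine image name comes from this report;
# the agent image name is a placeholder.
deployment_yaml='      initContainers:
      - name: init
        image: public.ecr.aws/docker/library/alpine:latest
      containers:
      - name: anax
        image: example.registry/agent-image:latest'

old_tag="latest"
new_tag="2.31.0-1495"
agent_repo="example.registry/agent-image"   # hypothetical agent image repository

# Broad rewrite (assumed failure mode): every image tagged with old_tag is retagged,
# including the initContainer's alpine image.
broad=$(printf '%s\n' "$deployment_yaml" | sed "s|\(image: .*\):${old_tag}\$|\1:${new_tag}|")

# Scoped rewrite: only lines that reference the agent image repository are retagged.
scoped=$(printf '%s\n' "$deployment_yaml" | sed "s|\(image: ${agent_repo}\):${old_tag}\$|\1:${new_tag}|")

printf 'broad:\n%s\n\nscoped:\n%s\n' "$broad" "$scoped"
```

With the broad pattern, the alpine line ends up as alpine:2.31.0-1495, which matches the image the pod failed to pull. With the scoped pattern, the alpine line keeps its original tag and only the agent image is downgraded.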
Screenshots.
No response
Operating Environment
linux k3s
Additional Information
No response