open-horizon / anax

Horizon agent control system
https://open-horizon.github.io/docs/anax/docs/
Apache License 2.0

Bug: deploy/agent deleted on k8s auto upgrade on k3s #4132

Open dlarson04 opened 2 months ago

dlarson04 commented 2 months ago

Describe the bug.

Intermittently, the OH agent pod is not running after an auto upgrade ... but the NMP status changes to success. Seeing this in the cronjob log:

2024-08-11 12:21:09 cronjob under namesapce: edgecluster-ns03
2024-08-11 12:21:09 DEBUG: get_status_path() start
2024-08-11 12:21:09 VERBOSE: Checking if /var/horizon/nmp directory exists...
2024-08-11 12:21:09 VERBOSE: Checking if /var/horizon/nmp/{org} directory exists...
2024-08-11 12:21:09 VERBOSE: Searching NMP subdirectories...
2024-08-11 12:21:09 VERBOSE: Getting latest upgrade job status file...
2024-08-11 12:21:09 STATUS_PATH is /var/horizon/nmp/myorg/Mesh-NMP/status.json
2024-08-11 12:21:09 VERBOSE: Found job: /var/horizon/nmp/myorg/Mesh-NMP/status.json
2024-08-11 12:21:09 DEBUG: get_status_path() end
2024-08-11 12:21:10 DEBUG: Pod status: Running
Pending
2024-08-11 12:21:14 DEBUG: Deployment status: deployment
deployment
Running
cat: /var/horizon/nmp/myorg/Mesh-NMP/status.json: No such file or directory
2024-08-11 12:21:15 DEBUG: Cron Job status:
2024-08-11 12:21:15 Checking if agent is running and deployment is successful...
2024-08-11 12:21:15 Agent is not running. Checking if rollback was already attempted...
2024-08-11 12:21:15 DEBUG: Checking if agent upgrade was initiated...
2024-08-11 12:21:15 Starting rollback process...
2024-08-11 12:21:15 VERBOSE: Setting the status to "rollback started"...
jq: error: Could not open file /var/horizon/nmp/myorg/Mesh-NMP/status.json: No such file or directory
/usr/local/bin/auto-upgrade-cronjob.sh: line 531: /var/horizon/nmp/myorg/Mesh-NMP/status.json: No such file or directory
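
The log above shows two things: the pod status check printed more than one line (`Running` and `Pending`), and the later `cat`, `jq`, and redirect steps all failed because `status.json` no longer existed at the path found earlier. As a minimal sketch of how such cases could be guarded against, the two helpers below check both conditions before any rollback is started. They are hypothetical and not taken from `auto-upgrade-cronjob.sh`; the function names and the idea of treating a missing file as "skip this run" are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers, NOT part of auto-upgrade-cronjob.sh.

# Returns 0 if the status file exists and is non-empty, 1 otherwise.
# $1 = path to the status file (for example the STATUS_PATH from the log).
status_file_ready() {
    if [ ! -s "$1" ]; then
        echo "status file missing or empty: $1" >&2
        return 1
    fi
    return 0
}

# Returns 0 only if every line of the given pod phase output is "Running".
# $1 = phase output, one phase per line (may contain several pods).
all_pods_running() {
    # No output at all means no pod was found.
    [ -n "$1" ] || return 1
    # grep -v selects lines that are NOT exactly "Running" (-x = whole line).
    # If any such line exists, at least one pod is in another phase.
    if printf '%s\n' "$1" | grep -qvx 'Running'; then
        return 1
    fi
    return 0
}
```

A caller might then skip the rollback path for this run when `status_file_ready "$STATUS_PATH"` fails, instead of passing a missing file to `jq`. Whether skipping is the right behavior here depends on why the file disappeared, which this issue does not yet establish.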

Describe the steps to reproduce the behavior.

No response

Expected behavior.

1. Install the edge cluster agent on k3s
2. Trigger an auto upgrade
3. Intermittent failure

Screenshots.

No response

Operating Environment

Linux

Additional Information

No response

dlarson04 commented 2 months ago

k8s-upgrade-fail.tar.gz