longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[BUG] Helm upgrade from 1.5.1 to 1.5.5 has only manager instance 1.5.1 running #8839

Open eqinox76 opened 3 months ago

eqinox76 commented 3 months ago

## Describe the bug

This is likely a usage error on my side, but after a successful upgrade via Helm the longhorn-manager instances are still running the old version.

## To Reproduce

Please wait a few minutes for other Longhorn components such as CSI deployments, Engine Images, and Instance Managers to be initialized.

Visit our documentation at https://longhorn.io/docs/

* Check the manager version:

      $ kubectl get -n longhorn pods -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq
      longhornio/csi-attacher:v4.2.0
      longhornio/csi-node-driver-registrar:v2.7.0
      longhornio/csi-provisioner:v3.4.1
      longhornio/csi-resizer:v1.7.0
      longhornio/csi-snapshotter:v6.2.1
      longhornio/livenessprobe:v2.9.0
      longhornio/longhorn-engine:v1.5.1
      longhornio/longhorn-instance-manager:v1.5.1
      longhornio/longhorn-manager:v1.5.1
      longhornio/longhorn-ui:v1.5.1
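To make the mismatch easier to spot, the image list above can be filtered for Longhorn core images whose tag differs from the chart version. This is a minimal sketch, assuming the core images all start with `longhornio/longhorn-` (as in the output above) and that no registry host with a port precedes the image name. `stale_longhorn_images` is a hypothetical helper name, not part of Longhorn or kubectl.

```shell
# Hypothetical helper: reads one image reference per line on stdin and
# prints the Longhorn core images whose tag differs from the expected one.
# Assumption: core images are named longhornio/longhorn-<component>:<tag>.
stale_longhorn_images() {
  awk -F: -v want="$1" '
    $1 ~ /^longhornio\/longhorn-/ && $NF != want { print }
  '
}

# Possible usage against a live cluster (not executed here):
#   kubectl get -n longhorn pods \
#     -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" \
#     | tr -s '[[:space:]]' '\n' | sort -u | stale_longhorn_images v1.5.5
```

Fed the list above with `v1.5.5` as the expected tag, the filter would print the engine, instance-manager, manager, and UI images, since all four are still tagged `v1.5.1`. The CSI sidecar images are skipped because they follow their own version numbers.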


## Expected behavior

The longhorn-manager version should now be 1.5.5.

## Support bundle for troubleshooting

<!--PLEASE provide a support bundle when the issue happens. You can generate a support bundle using the link at the footer of the Longhorn UI. Check [here](https://longhorn.io/docs/latest/troubleshoot/support-bundle/). Then, attach to the issue or send to longhorn-support-bundle@suse.com -->

## Environment

<!-- Suggest checking the doc of the best practices of using Longhorn. [here](https://longhorn.io/docs/latest/best-practices)-->

 - Longhorn version: 1.5.1
 - Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
 - Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE2
 - Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal

## Additional context

I suspect my usage of `--reuse-values` is the culprit. Should I `helm rollback` and `helm upgrade` again, or what is the best way to get out of this state?

Helm shows the installation as successful:

    $ helm list -n longhorn
    NAME      NAMESPACE  REVISION  UPDATED                                   STATUS    CHART           APP VERSION
    longhorn  longhorn   2         2024-06-27 09:00:48.489333795 +0200 CEST  deployed  longhorn-1.5.5  v1.5.5
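The gap between what Helm reports and what is actually running can also be checked directly. This is a rough sketch, assuming `jq` is installed and that the manager runs as a DaemonSet named `longhorn-manager` in the `longhorn` namespace. `image_tag` and `check_release_matches` are hypothetical helpers, not Helm or Longhorn commands.

```shell
# Hypothetical helper: print the tag of an image reference, i.e.
# everything after the last ':'.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Hypothetical helper: compare the release's app version with the tag of
# the image the manager is configured to run. Returns 1 on a mismatch.
check_release_matches() {
  app_version="$1"
  image="$2"
  tag=$(image_tag "$image")
  if [ "$tag" = "$app_version" ]; then
    echo "OK: manager image tag $tag matches release app version"
  else
    echo "MISMATCH: release says $app_version, manager runs $tag"
    return 1
  fi
}

# Possible usage against a live cluster (not executed here; the release
# name "longhorn" and the DaemonSet name are assumptions):
#   app=$(helm list -n longhorn -o json \
#     | jq -r '.[] | select(.name == "longhorn") | .app_version')
#   img=$(kubectl get ds longhorn-manager -n longhorn \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')
#   check_release_matches "$app" "$img"
```

With the values from this report (`v1.5.5` from Helm, `longhornio/longhorn-manager:v1.5.1` running), the check would report a mismatch.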


Interesting log messages from the longhorn-manager:

    level=info msg="Checking if the upgrade path from v1.5.1 to v1.5.1 is supported"
    level=error msg="Failed to sync Longhorn setting longhorn/v2-data-engine" controller=longhorn-setting error="failed to sync setting for longhorn/v2-data-engine: cannot apply v2-data-engine setting to Longhorn workloads when there are attached volumes" node=node-3



Many thanks in advance for your time and help!

DamiaSan commented 3 months ago

Could you provide a support bundle?

DamiaSan commented 3 months ago

I see that the problem is solved in this discussion https://github.com/longhorn/longhorn/discussions/8834. Is this right?

eqinox76 commented 3 months ago

Sure, I will try to get a support bundle. The issue in #8834 was that the upgrade was not starting at all. Now the Helm upgrade has run and it's telling me that it updated correctly, but the manager instances are still on the old version.