Closed melledouwsma closed 6 months ago
@melledouwsma this is a known issue that we can't solve due to limitations of OLM. For more info see: https://github.com/grafana/grafana-operator/issues/1399
To work around this, either:
If you have the operator from the community-operators catalog, try restarting (deleting) the pods of the openshift-marketplace/community-operators deployment. That usually does the trick for me.
Otherwise, delete the old operator deployment.
Thank you. I am aware of #1399 and I created this new issue because this is a related but different issue. In #1399 the operator is upgraded from v5.6.0 to v5.6.1 and the upgrade fails because of the added labels on the operator deployment. In that case, removing the operator deployment is indeed a perfect workaround.
This issue is for installations that still have v5.6.0 and have not upgraded to v5.6.1. Those installations no longer have an upgrade path to v5.6.3. Restarting the community-operators pod from the marketplace or removing the old operator deployment makes no difference. There is no longer a version in the operator catalog that replaces v5.6.0, so the OLM will continue to report that v5.6.0 is the latest version. The situation is specific to clusters that currently have v5.6.0 installed; there is a valid upgrade path from v5.6.1 onwards. This issue is also described in this comment by @ginokok1996.
This issue can be solved by drafting a new release that replaces v5.6.0 in the community-operators catalog or, as a workaround, by removing the CSV and Subscription and then recreating the Subscription with any version above v5.6.0.
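As a rough sketch of the recreated Subscription in that workaround: the namespace and channel below are assumptions that do not come from this thread, so adjust them to your own installation before applying anything.

```yaml
# Hypothetical example: Subscription recreated after the old Subscription
# and CSV have been removed. Values marked "assumed" are illustrative only.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: grafana-operator
  namespace: grafana-operator             # assumed namespace
spec:
  name: grafana-operator                  # package name in the catalog
  channel: v5                             # assumed channel name
  source: community-operators             # catalog discussed in this thread
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: grafana-operator.v5.6.3    # any version above v5.6.0
```

With startingCSV set to a version above v5.6.0, the installation starts from a release that still has an upgrade path.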
I encountered the same issue.
Restarting the community operator pods or the OLM pods doesn't work. The community operator pods are restarted at certain intervals anyway to stay up to date.
There just doesn't seem to be an upgrade path from v5.6.0 to any newer version. You would now have to remove the operator and install at least v5.6.3 for it to function normally again.
As explained in the other issue, there is nothing we can do from the maintainers' point of view. We are not allowed to make any updates to existing versions of the community or Red Hat provider. We followed the workaround suggested by the Red Hat maintainers and added a skip flag to the newer versions in OLM, but it doesn't seem to work.
Furthermore, we have created an upstream issue about this: https://github.com/operator-framework/operator-lifecycle-manager/issues/3176. I'm not a Red Hat employee, and I'm not an OCP customer, so I have no way of asking Red Hat to prioritize this issue. But I would love for you to reach out to your sales representative, point to this issue, and ask the OLM maintainers to come up with a solution.
So instead of doing an uninstall, it sounds like removing the CSV and Subscription is the best way forward.
Hi @NissesSenap, thanks for explaining. The fix suggested by the maintainers of community-operators did sort of work: it marked v5.6.1 as a release that should be skipped. The broken upgrade path is caused by the replaces: grafana-operator.v5.6.2 in the same CSV. When skipping a release, that replaces: is usually filled with the version before the skipped one. For example, when releasing 5.6.2, you'd mark 5.6.1 as skipped and 5.6.2 as the direct replacement of 5.6.0.
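To illustrate that 5.6.2 example, here is a minimal sketch of the relevant CSV fields. It is a hypothetical fragment showing only the fields discussed here, not the manifest that was actually released.

```yaml
# Hypothetical v5.6.2 CSV fragment: skip v5.6.1 and replace v5.6.0 directly.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: grafana-operator.v5.6.2
spec:
  version: 5.6.2
  replaces: grafana-operator.v5.6.0   # the version before the skipped one
  skips:
    - grafana-operator.v5.6.1         # the release being skipped
```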
It has been a while since I worked with OLM in this much detail, but it should be possible to submit a new release that sorts this out and restores the upgrade path for v5.6.0. I'll look into that and create a Pull Request, but I'd like to run some local tests first to make sure it doesn't create new issues.
Hi @melledouwsma, if there is a good way to sort this out, that would be great, and we would be eternally grateful. To have something concrete to discuss, it's probably easiest if you create a PR in https://github.com/k8s-operatorhub/community-operators/tree/main/operators/grafana-operator. You can just tag me in the PR and link it in this issue, and I will take a look.
Just remember that you can't update any existing releases. Life would be much easier if that were possible, but apparently it's not an option.
I will reopen this issue so we can discuss this more easily.
Hey @melledouwsma, have you had any time to look into this?
Hey @NissesSenap, I ran some tests last week by creating a local CatalogSource and then experimenting with the different options to instruct OLM on a new release. My first attempt fixed 5.6.0 but unfortunately broke the upgrade path for the more recent versions. I have some more ideas and some time later this week, so expect a new update in a couple of days.
As mentioned before, the metadata becomes immutable once released and we cannot change it. The operator uses "replaces" mode, where the upgrade graph is created by explicitly specifying one older release in a replaces: attribute on the new release.
The upgrade path is only broken for clusters that are still on v5.6.0. This can be restored by creating a new release with the following attributes in the CSV:
replaces: grafana-operator.v5.6.0
skips:
- grafana-operator.v5.6.1
- grafana-operator.v5.6.2
- grafana-operator.v5.6.3
- grafana-operator.v5.7.0
version: 5.7.1
This marks the new release as an upgrade for v5.6.0 while still allowing all other versions to upgrade. I did some tests by locally building a CatalogSource and trying the different versions on an OpenShift cluster. The cluster will report "Upgrade available" with all installed versions, including 5.6.0.
However, because v5.6.1 through v5.7.0 are in the skips: block, the only available upgrade is directly to v5.7.1. For example, the upgrade from v5.6.2 to v5.6.3 is no longer offered by the OLM; the cluster will only offer an upgrade to v5.7.1. This only affects the upgrade path: new installations with a Subscription that contains, for example, startingCSV: grafana-operator.v5.6.3 are still possible.
This is something to think about, I suppose. If you'd like to continue this route, I'm happy to produce a PR containing the changes to the CSV for a future new release.
Hi @melledouwsma, thanks a lot for your work on this! From the operator's point of view, going directly to the latest release from 5.6.0 is not an issue.
Could you create a PR with this change in our repo? Then I can cut a new release in OLM. I'm at KubeCon next week, so I won't be able to do it then, but I can also ask one of the other maintainers to fix it.
This issue hasn't been updated for a while, marking as stale, please respond within the next 7 days to remove this label
Describe the bug
There's no upgrade path available for OpenShift clusters currently running v5.6.0 of the operator. The change added in #1405 to skip version v5.6.1 is now causing an issue with the OLM update graph. As v5.6.1 is the only version that replaces: v5.6.0 and that version should now be skipped, the OLM has no upgrade path and shows the operator as AtLatestKnown instead of UpgradePending.
Version
v5.6.0
To Reproduce
Steps to reproduce the behavior:
1. Create the namespace grafana-upgrade-example
2. Create an OperatorGroup
3. Create a Subscription for v5.6.0 (with manual installPlanApproval to show the behavior)
4. Approve the InstallPlan the OLM created and wait for the operator to be installed
5. Restarting the community-operators pod in openshift-marketplace makes no difference
Expected behavior
I would expect the OLM to present v5.6.3 as an upgrade for this installed operator. When you repeat this example with startingCSV: grafana-operator.v5.6.1, you'll see exactly that behavior.
Suspect component/Location where the bug might be occurring
This is probably caused by https://github.com/grafana/grafana-operator/blob/04b8181d5ea2ccb4299fa934f1150cc127e0a5f5/bundle/manifests/grafana-operator.clusterserviceversion.yaml#L434C1-L436C30 where v5.6.1 is set to be skipped and there is no alternative upgrade path from v5.6.0. This document has more info on skipping updates.
Screenshots
Runtime (please complete the following information):