truecharts / charts

Community Helm Chart Repository
https://truecharts.org
GNU Affero General Public License v3.0

Upgrading to latest `cloudnative-pg` version fails; rollback also fails. #16110

Closed schnerring closed 9 months ago

schnerring commented 9 months ago

App Name

cloudnative-pg

Operating System

TrueNAS SCALE 22.12.4.2

App Version

latest_3.0.0

Application Events

-

Application Logs

-

Application Configuration

-

Describe the bug

Upgrading from version 1.21.1_2.0.12 to latest_3.0.0 causes the following error:

 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 427, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 465, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1379, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1247, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/upgrade.py", line 115, in upgrade
    await self.upgrade_chart_release(job, release, options)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/upgrade.py", line 298, in upgrade_chart_release
    await self.middleware.call('chart.release.helm_action', release_name, chart_path, config, 'upgrade')
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1368, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1328, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1231, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/helm.py", line 44, in helm_action
    raise CallError(f'Failed to {tn_action} chart release: {stderr.decode()}')
middlewared.service_exception.CallError: [EFAULT] Failed to upgrade chart release: Error: UPGRADE FAILED: cannot patch "cloudnative-pg" with kind Deployment: Deployment.apps "cloudnative-pg" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"cloudnative-pg", "app.kubernetes.io/name":"cloudnative-pg"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

But looking at the apps UI, the app version was still updated.

Rolling back to the previous version (2.0.12) also fails with this error (but the app version actually rolls back to the previous version):

Error: [EFAULT] Failed to complete rollback 'cloudnative-pg' chart release to 2.0.12. Chart release's datasets have been rolled back to '2.0.12' version's snapshot. Errors encountered during rollback were: Error: no ConfigMap with the name "cloudnative-pg-config" found 

Upgrading again causes the same error. I have looked at the latest TrueCharts news, Twitter, chart release notes / changelogs, some related PRs, and some git diffs to make sense of what might be causing this, but I was unable to find any related breaking changes.

I don't know whether or not the cloudnative-pg deployment is in an inconsistent state now. What's the best course of action here to be sure? Remove the app and related CRDs (again 😴)?
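For anyone triaging a similar failure, here is a minimal sketch of how the Helm error above could be picked apart to see which resource and field blocked the upgrade. The function name `find_immutable_field` and the regular expression are hypothetical helpers written for this illustration, not part of Helm or the TrueNAS middleware, and the pattern assumes the message has the same shape as the one quoted in this report.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern, modeled only on the error text quoted in this issue.
IMMUTABLE_RE = re.compile(
    r'cannot patch "(?P<name>[^"]+)" with kind (?P<kind>\w+): '
    r'.*? is invalid: (?P<field>[\w.\[\]]+): .*field is immutable',
    re.DOTALL,
)


def find_immutable_field(stderr: str) -> Optional[Tuple[str, str, str]]:
    """Return (kind, name, field) if stderr reports an immutable-field conflict."""
    match = IMMUTABLE_RE.search(stderr)
    if match is None:
        return None
    return match.group("kind"), match.group("name"), match.group("field")


# The Helm part of the error message from this report.
helm_stderr = (
    'Error: UPGRADE FAILED: cannot patch "cloudnative-pg" with kind Deployment: '
    'Deployment.apps "cloudnative-pg" is invalid: spec.selector: Invalid value: '
    'v1.LabelSelector{MatchLabels:map[string]string{'
    '"app.kubernetes.io/instance":"cloudnative-pg", '
    '"app.kubernetes.io/name":"cloudnative-pg"}, '
    'MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable'
)

print(find_immutable_field(helm_stderr))
# ('Deployment', 'cloudnative-pg', 'spec.selector')
```

The result points at `spec.selector` of the `cloudnative-pg` Deployment, a field that Kubernetes does not allow to change on an existing `apps/v1` Deployment, so a patch that tries to change it is rejected.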

To Reproduce

-

Expected Behavior

If the upgrade fails, why was the app still updated?

Screenshots

-

Additional Context

-


xstar97 commented 9 months ago

From our Discord announcements:

Cloudnative-pg:

read below

Note: DO NOT remove cloudnative-pg, as it will lead to complete data loss of all PostgreSQL containers.

xstar97 commented 9 months ago

Update SCALE to the latest version; Cobia is stable, so change update branches. Closing, as it's not a bug.

schnerring commented 9 months ago

Thanks for the quick response.

It would be really nice to include instructions regarding breaking changes in the release notes of the chart and not Discord of all places.

PrivatePuffin commented 9 months ago

Also worth noting: rolling back across a major version increase can do major damage. Never do it.
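As a rough illustration of that advice, the sketch below checks whether a rollback would cross a major chart version before it is attempted. It assumes, based only on the version strings in this thread (`1.21.1_2.0.12`, `latest_3.0.0`, and the rollback target `2.0.12`), that the part after the underscore is the chart version; `chart_major` and `crosses_major` are hypothetical helper names, not TrueCharts or TrueNAS tooling.

```python
def chart_major(version: str) -> int:
    """Major number of the chart part of '<app version>_<chart version>'.

    Assumption: the chart version follows the last underscore; a string
    without an underscore is treated as a bare chart version.
    """
    chart_version = version.rsplit("_", 1)[-1]
    return int(chart_version.split(".")[0])


def crosses_major(current: str, target: str) -> bool:
    """True if moving from current to target changes the major chart version."""
    return chart_major(current) != chart_major(target)


# Versions taken from this issue.
print(crosses_major("latest_3.0.0", "1.21.1_2.0.12"))  # True
print(crosses_major("1.21.1_2.0.12", "2.0.12"))        # False
```

A check like this only flags the risky case; it says nothing about whether a given rollback is actually safe.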

PrivatePuffin commented 9 months ago

> Thanks for the quick response.
>
> It would be really nice to include instructions regarding breaking changes in the release notes of the chart and not Discord of all places.

Sadly, we cannot easily edit changelogs. However, if you run into issues, you will need to hop on Discord for a support ticket anyway, and you will see it there.

As the steps required take just a minute and can be taken before or after upgrading, we didn't feel it was necessary to write a whole article about it. We're reworking some of that in terms of project structure, but currently we're pushed to our limits and we have to make choices about what we publish where.

schnerring commented 9 months ago

Sorry for going off-topic, but here are my 2 cents.

> We're working on reworking some of that project structure wise, but currently we're pushed to our limits and we have to make choices on what we publish where.

I'm looking forward to those changes and sincerely hope they don't include Discord (or Slack/Gitter/Teams etc.). Instant messengers are neither issue trackers nor CI/CD tools. I really hope that you'll use something indexed by search engines.

> As the steps required take just a minute and can be taken before or after upgrading, we didn't feel it was needed to write a whole article about it. Sadly enough we cannot easily edit changelogs. However, if you run into issues, you're mandatory going to need to hop on discord for a support ticket and will see it.

One of the major challenges in open source development is filtering through the notification noise. This is one of the reasons why semantic versioning and tools like Renovate/Dependabot featuring automated release notes aggregation are so popular (and have been for years). And even with these tools, keeping everything up-to-date is often overwhelming.

I administer around 30 TC apps in my homelab. I'm fine with breaking changes and sifting through the app release notes and chart release notes. But monitoring a Discord server with thousands of users, with terrible search on top of that, and making it mandatory to open a Discord support ticket every other chart release? That's a recipe for burnout ...

And even if I wanted to join your Discord, I can't. I was banned for calling out staff members who didn't follow their own server rules. I really appreciate the work you do, but if you decide to keep Discord as part of your critical release pipeline infrastructure, unfortunately, I'll have to consider abandoning TrueCharts, because keeping apps up-to-date is really cumbersome.

PrivatePuffin commented 9 months ago

You are not permanently banned, AFAIK. Send me a DM and I'll look into it!

You also don't have to file support tickets or search for every major release. That would be silly.

But you should check the announcement. Any issues found get put there first, mostly because it takes time to publish complete news articles.

Simply put:

At this stage, we simply don't have the time available to cross-post everywhere.

We expect this to improve in 2024, due to releasing new tooling, policies and workflows.

But yes, for support tickets Discord is going to stay for a while, until we have a better solution and a more clearly defined support scope, both of which we are working on.