zalando / spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker
Apache License 2.0

[BUG] inconsistence in pg_cron dependencies when rebuild old release #880

Open DYukun opened 1 year ago

DYukun commented 1 year ago

Hi spilo experts,

Context

Our project depends on spilo and is required to fix CVE vulnerabilities. Most of these CVEs come from an outdated base image. Our original solution is to rebuild the release branch, which pulls the newest dependencies and base image.

Problem

Starting in March 2023, we found that rebuilding spilo produces inconsistent dependency versions. The pg_cron extension is not installed correctly, which causes errors for the cron schema. The error messages look like:

ERROR:  extension "pg_cron" must be installed in schema "pg_catalog"                                                   
ERROR:  schema "cron" does not exist           

Solution tried

We suspect that a change in the pg_cron dependency causes this issue, and found https://github.com/zalando/spilo/pull/863, which fixes it. However, when we apply this PR to our release branch (we use the 2.1-p6 release), a new error appears:

PermissionError: [Errno 13] Permission denied: '/etc/runit/runsvdir/default/patroni'
env: ‘/scripts/patroni_wait.sh’: Permission denied

Our pod then gets stuck in the not-ready state.

Question

  1. It seems that no release currently contains PR https://github.com/zalando/spilo/pull/863, so I assume we will hit the same issue when rebuilding any existing release. Is there a plan to fix this?
  2. For CVE fixes, do you have any suggestions on how to apply them to previous release versions?
    1. Is it possible to pin the dependency versions in the Docker build, especially for dependencies other than the base image? That way, when we rebuild, we can manually upgrade only the dependencies with CVE vulnerabilities (mostly the base image). This would also help our CI process produce reproducible builds.
    2. Is it possible to move to distroless base images? In our experience, distroless base images are less likely to introduce CVE vulnerabilities.
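
One low-effort way to see which dependencies moved between two rebuilds of the same release branch is to diff their package manifests. The following is a minimal Python sketch, assuming each manifest is the default tab-separated `package<TAB>version` output of `dpkg-query -W` captured from an image. The function names, the package names, and the sample versions are hypothetical and are not part of Spilo.

```python
def parse_manifest(text):
    """Parse `dpkg-query -W` style output (name<TAB>version per line) into a dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, version = line.partition("\t")
        packages[name] = version
    return packages


def diff_manifests(old, new):
    """Return (added, removed, changed) between two parsed manifests."""
    added = {p: v for p, v in new.items() if p not in old}
    removed = {p: v for p, v in old.items() if p not in new}
    changed = {p: (old[p], new[p]) for p in old if p in new and old[p] != new[p]}
    return added, removed, changed


# Hypothetical sample data; real manifests would be captured from the two images.
OLD = "libssl3\t3.0.2-0ubuntu1.7\npostgresql-14-cron\t1.4.2-1.pgdg22.04+1\n"
NEW = "libssl3\t3.0.2-0ubuntu1.8\npostgresql-14-cron\t1.5.1-1.pgdg22.04+1\n"

added, removed, changed = diff_manifests(parse_manifest(OLD), parse_manifest(NEW))
for name, (before, after) in sorted(changed.items()):
    print(f"{name}: {before} -> {after}")
```

Running such a diff in CI after each rebuild would at least flag an unexpected jump, such as a new pg_cron minor version, before the image is deployed.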

Looking forward to your reply and suggestions. Thanks!

hughcapet commented 1 year ago
  1. You don't necessarily need this fix to be released if you already build Spilo from source. Releases currently exist mostly to periodically build ghcr images. So you can simply build your image from the current state of the master branch, as we have been doing internally for quite some time.
    1. It is not that easy to pin all the versions (even though I find the idea very attractive). Each package has a lot of dependencies, which would mean pinning and periodically updating hundreds of packages. But I am open to hearing your suggestions on doing it in the least painful way.
    2. I don't see any immediate benefit over the COMPRESS=true Spilo build.

Btw, what CVE are you speaking about?

DYukun commented 1 year ago

Actually, we are building from your release branch (2.1-p6 for now) instead of master, so that we can keep using a stable version of spilo until we decide to put effort into fixing potential gaps in an upgrade.

In this particular case, we are fixing the HIGH-risk CVE-2023-0286 and a number of other, lower-risk CVEs. But new CVEs will keep appearing, and newer base images will keep fixing them. So we're looking for a solution that:

hughcapet commented 1 year ago

If you rebuild any release branch instead of pulling the prebuilt image, it is not stable anyway (versions are not pinned, so the build installs the latest). That is why I suggested switching to the master branch. Moreover, note that 2.1-p6 is quite old; there are later releases.

I really don't get what you want here, sorry. Let's start from the very beginning.

pg_cron extension is not installed correctly and will cause error to cron schema.

This error is caused by a subtle change introduced in the pg_cron v1.5.0 release.

inconsistency between versions when rebuild spilo

Yes, as we have already discussed, the version is not pinned and Spilo pulls the latest available version from the pgdg repo (but you can change the Spilo build scripts to build pg_cron from source at any commit if you need to).

However, when patch this PR to our release branch(we use 2.1-p6 release), there will be new error like: PermissionError: [Errno 13] Permission denied: '/etc/runit/runsvdir/default/patroni' env: ‘/scripts/patroni_wait.sh’: Permission denied

This error has nothing to do with applying the pg_cron installation fix. The problem is somewhere else. It might be fixed in later releases, or it might be in the way you run Spilo...

DYukun commented 1 year ago

OK, let me re-summarize the problem and the solutions we investigated. Hopefully this helps explain the tradeoff we are facing.

Problem context

We have a project that depends on spilo images. Our container registry runs vulnerability scanning to check whether the images we use contain known CVEs, and we are required to fix them.

In addition, our scanning system shows that most of the CVEs come from the Ubuntu base image.

We are balancing between:

Known Solutions

1. Rebuild release branch (our current solution)

Currently, rerunning the docker build for spilo fetches the latest version of every dependency, including the base image. The base image therefore gets updated, and the CVEs present in the previous versions are fixed.

We reported this issue because there are edge cases, such as an incompatible latest pg_cron release, in which the image rebuilt from an old release no longer works.

2. Rebuild release branch with tagged dependencies

An improved version of Solution 1. When rebuilding, it is possible to pin dependencies to stable versions and upgrade only the dependencies needed to fix the CVEs (mostly the base Ubuntu image).
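
As a rough illustration of pinning a single package, the Python sketch below renders an apt preferences file in the `Package` / `Pin` / `Pin-Priority` format described in `apt_preferences(5)`. It rests on assumptions: that pg_cron is installed from a Debian package named `postgresql-14-cron`, that the `1.4.*` version glob is the one wanted, and that the rendered file is placed under `/etc/apt/preferences.d/` before the build's package installation step. None of this is taken from the Spilo build scripts, and the helper names are hypothetical.

```python
def apt_pin_stanza(package, version_glob, priority=1001):
    """Render one apt preferences stanza; a priority above 1000 also permits downgrades."""
    return (
        f"Package: {package}\n"
        f"Pin: version {version_glob}\n"
        f"Pin-Priority: {priority}\n"
    )


def render_preferences(pins):
    """Render all pins, sorted by package name, with a blank line between stanzas."""
    return "\n".join(apt_pin_stanza(p, v) for p, v in sorted(pins.items()))


# Hypothetical pin: the package name and version glob are assumptions.
PINS = {"postgresql-14-cron": "1.4.*"}
print(render_preferences(PINS))
```

A pin only helps if the repository still serves the pinned version; otherwise the package would have to be built from source at a fixed commit, as suggested earlier in the thread.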

3. Use prebuilt image provided in each release

4. Build from master branch

This is also suggested in #840. Is building from master branch always stable and is it recommended?

hughcapet commented 1 year ago

Is building from master branch always stable and is it recommended?

Not always, but now it is.

DYukun commented 1 year ago

Got it. We will probably keep the current solution to avoid building from a non-release branch, but we will try to use the official image as much as possible.

Is a new release planned soon that would fix the CVEs? Currently 3.0-p1 can't be rebuilt to fix CVEs because of the incompatibility issue mentioned above.