sclorg / postgresql-container

PostgreSQL container images based on Red Hat Software Collections and intended for OpenShift and general usage. Users can choose between Red Hat Enterprise Linux, Fedora, and CentOS based images.
http://softwarecollections.org
Apache License 2.0

Upgrade fails due to postmaster.pid #550

Open eifrach opened 6 months ago

eifrach commented 6 months ago

Container platform

OCP 4, Podman/Docker

Version

12, 13

OS version of the container image

RHEL 7, RHEL 8, CentOS 7, CentOS Stream 8

Bugzilla, Jira

No response

Description

We are trying to implement an upgrade flow from version 12 to version 13. We are using k8s deployment pods, and when we try to deploy the new image we get an error during the upgrade:

==========  $PGDATA upgrade: 12 -> 13  ==========                                                                                                                                                 

===>  Starting old postgresql once again for a clean shutdown...                                                                                                                                  

pg_ctl: another server might be running; trying to start server anyway                                                                                                                            
waiting for server to start....2024-02-14 12:46:35.514 UTC [31] FATAL:  lock file "postmaster.pid" already exists                                                                                 
2024-02-14 12:46:35.514 UTC [31] HINT:  Is another postmaster (PID 1) running in data directory "/var/lib/pgsql/data/userdata"?           

It seems that postmaster.pid is always left behind, even on a plain restart/redeployment, but in that case the service recovers without an issue. Only during the upgrade do we face the problem.

Deleting the file and redeploying the upgrade solves it.
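
For reference, the manual workaround described here amounts to removing the stale lock file from the persistent volume and retrying the rollout. A rough sketch, assuming kubectl access to the running pod; the deployment name is illustrative and the data path matches the HINT in the log above:

# remove the stale lock file left on the persistent volume, then retry the rollout
kubectl exec deployment/postgres -- rm /var/lib/pgsql/data/userdata/postmaster.pid
kubectl rollout restart deployment/postgres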

My questions are:

  1. Is this normal behavior?
  2. What is the best approach for deployment upgrades?

I'm not sure that deleting a PID file is a good solution to run in a production environment.

Reproducer

Deploy version 12, then redeploy version 13 with the upgrade env vars set.

This can be done with podman play or minikube.

fila43 commented 6 months ago

Hello, thank you for your report. Since every layer (podman, OpenShift, ...) complicates the debugging, it would be really helpful to provide a reproducer using just podman or at the rpm level.

eifrach commented 6 months ago

You can use this on minikube:

  1. create storage
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: postgres-pv-claim
      labels:
        app: postgres
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  2. deploy postgres
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postgres
    spec:
      selector:
        matchLabels:
          app: postgres
      replicas: 1
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
            - name: postgres
              image: quay.io/centos7/postgresql-12-centos7
              imagePullPolicy: "IfNotPresent"
              ports:
                - containerPort: 5432
              env:
                - name: POSTGRESQL_UPGRADE
                  value: "hardlink"
                - name: POSTGRESQL_UPGRADE_FORCE
                  value: "true"
                - name: POSTGRESQL_DATABASE
                  valueFrom:
                    secretKeyRef:
                      name: assisted-installer-rds
                      key: db.name
                - name: POSTGRESQL_USER
                  valueFrom:
                    secretKeyRef:
                      name: assisted-installer-rds
                      key: db.user
                - name: POSTGRESQL_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: assisted-installer-rds
                      key: db.password
              volumeMounts:
                - mountPath: /var/lib/pgsql/data
                  name: postgredb
              resources:
                limits:
                  memory: 500Mi
                requests:
                  cpu: 100m
                  memory: 400Mi
          volumes:
            - name: postgredb
              persistentVolumeClaim:
                claimName: postgres-pv-claim

Then redeploy step 2 with a newer image, quay.io/centos7/postgresql-13-centos7.
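
For example (a sketch; the manifest filename is illustrative, and kubectl set image is an alternative to re-applying the edited file):

# re-apply the manifest from step 2 after changing the image tag to postgresql-13
kubectl apply -f postgres-deployment.yaml
# or patch the image in place on the existing Deployment
kubectl set image deployment/postgres postgres=quay.io/centos7/postgresql-13-centos7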

These are the same steps that I'm running.

eifrach commented 6 months ago

BTW, when you restart the image you can see that the postmaster.pid is there even without the upgrade; it is not deleted when the postgres service shuts down.

fila43 commented 6 months ago

It seems to be caused by the orchestration. I tested the scenario just by deploying the image with podman and then redeploying, and the upgrade works fine.

eifrach commented 6 months ago

  1. Make sure you are using persistent data; it could be that you are not migrating the DB.
  2. Make sure the postmaster.pid is created before starting the upgrade.

But I did it with podman, minikube, and OpenShift.

fila43 commented 6 months ago

Of course, I am using a persistent data location. Without the upgrade option, it isn't possible to re-deploy with the higher PostgreSQL version. From my POV, minikube and OpenShift somehow block the removal of the postmaster.pid file. Just off the top of my head, I see a few possibilities why the file persists:

  1. The container is stopped before PostgreSQL removes the file
  2. PostgreSQL is not allowed to remove the file from the OpenShift storage

eifrach commented 6 months ago

So here's a step-by-step run of it locally, using this deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  selector:
    matchLabels:
      app: postgres
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: quay.io/centos7/postgresql-12-centos7
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRESQL_UPGRADE
              value: "hardlink"
            - name: POSTGRESQL_UPGRADE_FORCE
              value: "true"
            - name: POSTGRESQL_DATABASE
              value: testdb
            - name: POSTGRESQL_USER
              value: user1
            - name: POSTGRESQL_PASSWORD
              value: password 
          volumeMounts:
            - mountPath: /var/lib/pgsql/data
              name: postgresdb
          resources:
            limits:
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 400Mi
      volumes:
      - name: postgresdb
        persistentVolumeClaim:
          claimName: postgres-pv-claim

> # create volume
> podman volume create postgres-pv-claim 
postgres-pv-claim

> # starting pod
> podman kube play deployment.yaml 
Pod:
2e7286f9445a896a07a9f2f6ee0f8dc0a5438d2f63d98fa61c2fa98054076cb8
Container:
53021a21703d7b141c4c511dc71da6d97db738e3401c6bc8c402b758d466d4e3

>  # change the image
> sed -i 's/postgresql-12/postgresql-13/g' deployment.yaml 
> grep -Rn image: deployment.yaml 
19:          image: quay.io/centos7/postgresql-13-centos7

> # replace pod
> podman kube play deployment.yaml --replace 
Pods stopped:
7b45233e856809db7527ec4cb074bfaedec0f3d9c16a4954500980f34ddb0b67
Pods removed:
7b45233e856809db7527ec4cb074bfaedec0f3d9c16a4954500980f34ddb0b67
Secrets removed:
Volumes removed:
Trying to pull quay.io/centos7/postgresql-13-centos7:latest...
Getting image source signatures
Copying blob b929e0b929d2 done   | 
Copying blob 06c7e4737942 skipped: already exists  
Copying blob c61d16cfe03e skipped: already exists  
Copying config 7d69e95a7f done   | 
Writing manifest to image destination
Pod:
f96d847bf394168c141c8c5a5d9e63cb4c373eb7b0defaa1f6468431a61ccaf5
Container:
52e3699579e0398985e4d8d815059c42b0a8ed285471398c0bcb4f24ff4f41ed

> # logs...
> podman logs postgres-pod-postgres 

==========  $PGDATA upgrade: 12 -> 13  ==========

===>  Starting old postgresql once again for a clean shutdown...

pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2024-03-13 08:58:58.037 UTC [26] FATAL:  lock file "postmaster.pid" already exists
2024-03-13 08:58:58.037 UTC [26] HINT:  Is another postmaster (PID 1) running in data directory "/var/lib/pgsql/data/userdata"? 
 stopped waiting
pg_ctl: could not start server
Examine the log output.

P.S. For some reason the --replace command doesn't work when running in rootless (user) mode; please run it as root if you want to recreate this.

eifrach commented 6 months ago

FYI, when you run pg_ctl -D /var/lib/pgsql/data/userdata stop it does remove the PID file.

I'm guessing that the SIGTERM is not shutting down the container properly.
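
One way to narrow this down (a sketch; the deployment name and label selector match the reproducer above and are otherwise illustrative) is to check what actually runs as PID 1 in the container and how long Kubernetes waits after SIGTERM before escalating to SIGKILL:

# what is PID 1 inside the container? (the HINT in the log suggests postgres itself)
kubectl exec deployment/postgres -- sh -c 'tr "\0" " " < /proc/1/cmdline; echo'

# grace period before Kubernetes follows SIGTERM with SIGKILL (defaults to 30s)
kubectl get pod -l app=postgres -o jsonpath='{.items[0].spec.terminationGracePeriodSeconds}{"\n"}'

# is the stale lock file present on the persistent volume?
kubectl exec deployment/postgres -- cat /var/lib/pgsql/data/userdata/postmaster.pid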

fila43 commented 6 months ago

> FYI, when you run pg_ctl -D /var/lib/pgsql/data/userdata stop it does remove the PID file.
>
> I'm guessing that the SIGTERM is not shutting down the container properly.

I have the same opinion; it would make sense that the PostgreSQL stop is not handled correctly in the OpenShift environment.

eifrach commented 6 months ago

I see it in minikube / podman and openshift.

Maybe a better way is to add a preStop command, but this may cause some logs/events:

      lifecycle:
        preStop:
          exec:
            command: [ pg_ctl, -D, /var/lib/pgsql/data/userdata, stop ]

phracek commented 6 months ago

@eifrach Can you please test your deployment with a defined namespace that is unique to the project you switched to?

eifrach commented 6 months ago

What do you mean? The original is in a namespace; this deployment is just for tests. This is the original project, which I've tested on minikube with a namespace:

https://github.com/openshift/assisted-service/blob/bc5a2af1aced6ce9c716e80d24482e6af8fa3663/deploy/postgres/postgres-deployment.yaml

zmiklank commented 5 months ago

I was reviewing the PR which should fix this issue: https://github.com/sclorg/postgresql-container/pull/554.

However the main communication seems to be here, so I'll ask here.

As you have discussed here, it indeed seems like the container/pod is not terminating gracefully with SIGTERM. OCP/podman should send SIGKILL when SIGTERM does not manage to terminate the running container.

If this is really happening here - and the container is terminated by SIGKILL - then based on upstream postgresql documentation this is not safe, and should be avoided:

"It is best not to use SIGKILL to shut down the server. Doing so will prevent the server from releasing shared memory and semaphores. Furthermore, SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it might be necessary to kill the individual subprocesses by hand as well."

In that case I think that the proposed fix could have some unwanted consequences. We should find out how the process was terminated in this use case.

The SIGTERM seems to work correctly when running the container directly from podman. (podman run/podman kill)
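
A minimal podman-level version of that check (a sketch; the volume and container names are illustrative):

# run the 12 image against a named volume and let initdb finish
podman run -d --name pg12 -v pgdata:/var/lib/pgsql/data \
  -e POSTGRESQL_USER=user1 -e POSTGRESQL_PASSWORD=password -e POSTGRESQL_DATABASE=testdb \
  quay.io/centos7/postgresql-12-centos7
sleep 30

# podman stop sends SIGTERM and waits (10s by default) before SIGKILL;
# give postgres extra time so the shutdown stays graceful
podman stop -t 60 pg12

# after a clean shutdown the lock file should be gone
podman unshare ls "$(podman volume inspect pgdata --format '{{.Mountpoint}}')/userdata/"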

eifrach commented 5 months ago

Thanks so much. I will try it out in a day or two and let you know.

eifrach commented 5 months ago

Hey, I tested the fix and it seems to be working as designed.

But I found a new problem: the repo no longer has centos7 builds.

And centos8 with psql-13 doesn't have psql-12 available to run the upgrade. Is that a bug? (I had to build with the old centos7 image to validate the fix.)

fila43 commented 5 months ago

> I was reviewing the PR which should fix this issue: #554.
>
> However the main communication seems to be here, so I'll ask here.
>
> As you have discussed here, it indeed seems like the container/pod is not terminating gracefully with SIGTERM. OCP/podman should send SIGKILL when SIGTERM does not manage to terminate the running container.
>
> If this is really happening here - and the container is terminated by SIGKILL - then based on upstream postgresql documentation this is not safe, and should be avoided:
>
> "It is best not to use SIGKILL to shut down the server. Doing so will prevent the server from releasing shared memory and semaphores. Furthermore, SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it might be necessary to kill the individual subprocesses by hand as well."
>
> In that case I think that the proposed fix could have some unwanted consequences. We should find out how the process was terminated in this use case.

I understand your point that sending SIGKILL to the server is not good practice. We should definitely identify why it happens and try to avoid it. But either way, we also need a mechanism to "recover" after such an ugly shutdown. The proposed PR doesn't handle the cause, only the consequence.

> The SIGTERM seems to work correctly when running the container directly from podman. (podman run/podman kill)

So I would propose to merge it, if it works and doesn't cause any other issues. And we can open an issue for further investigation to find out why the .pid file persists.

zmiklank commented 5 months ago

> Hey, I tested the fix and it seems to be working as designed.
>
> But I found a new problem: the repo no longer has centos7 builds.

CentOS 7 images from SCLOrg are no longer being rebuilt, see more in commit 9ebea2b8b44d71d14add8f94ed072f1b13dc4809.

> And centos8 with psql-13 doesn't have psql-12 available to run the upgrade. Is that a bug? (I had to build with the old centos7 image to validate the fix.)
>
> * fedora psql13: same, can't upgrade from 12

I do not know, and even if this is a bug, it would not be in the container image layer, but rather the rpm layer. @fila43 Do you have any info about this?

> So I would propose to merge it, if it works and doesn't cause any other issues. And we can open an issue for further investigation to find out why the .pid file persists.

@fila43 Ok, I agree.

eifrach commented 5 months ago

Now I tried to use the build to upgrade from 12 to 13 using the fedora image quay.io/fedora/postgresql-13:latest - it seems that PSQL 12 is not found:

==========  $PGDATA upgrade: 12 -> 13  ==========

/usr/share/container-scripts/postgresql/common.sh: line 335: scl_source: No such file or directory

===>  Starting old postgresql once again for a clean shutdown...

/usr/share/container-scripts/postgresql/common.sh: line 353: /opt/rh/rh-postgresql12/root/usr/bin/pg_ctl: No such file or directory
Warning: Can't detect cpuset size from cgroups, will use nproc
eifrach commented 5 months ago

Just FYI, after playing a bit with the script I managed to upgrade to 13. Not sure this will fit all builds, but this is my workaround:

https://github.com/sclorg/postgresql-container/commit/9c25f36b62cbdddbedff20e51902709a2204190b

zmiklank commented 5 months ago

@eifrach Thanks for the analysis. We discussed this within the team and came to the conclusion that some bigger changes to how upgrades in sclorg postgresql container images work would need to be incorporated. This could, however, take some time.

eifrach commented 5 months ago

Sorry for the late response, but for RHEL images this is a big issue for some of the OpenShift operators, and PSQL 12 is EOL soon. Do you have any other solution?

fila43 commented 5 months ago

At the RPM level, we also have the Dump and Restore upgrade approach. It should also work for containers. More about it here. Would it be a suitable workaround for your use case?
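
A rough sketch of what a dump-and-restore upgrade could look like with these images (container and volume names are illustrative; it relies on the local postgres superuser inside the containers, and roles/databases pre-created by the entrypoint may produce harmless "already exists" errors on restore):

# dump all databases and roles from the running 12 container
podman exec postgres-12 pg_dumpall > dump.sql

# start a fresh 13 container on an empty data volume and restore into it
podman run -d --name postgres-13 -v pgdata13:/var/lib/pgsql/data \
  -e POSTGRESQL_USER=user1 -e POSTGRESQL_PASSWORD=password -e POSTGRESQL_DATABASE=testdb \
  quay.io/sclorg/postgresql-13-c8s
podman exec -i postgres-13 psql < dump.sql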

eifrach commented 5 months ago

The app uses official Red Hat images, which also get security updates, and those are very important to us.

For this I would need to either create a custom build or add some kind of startup script for the migration. I don't want to add the upgrade logic to the application; if I do, we will need to maintain and test it. It adds complexity we are trying to avoid.

eifrach commented 5 months ago

BTW, this is how I got the upgrade to work:

FROM quay.io/sclorg/postgresql-13-c8s:latest

USER root

ENV PQSL12_BIN=/usr/lib64/pgsql/postgresql-12
ENV PQSL12_DESTANATION=/opt/rh/rh-postgresql12/root/usr

RUN dnf install -y procps-ng util-linux postgresql-upgrade 

# expose the PostgreSQL 12 binaries at the SCL path the upgrade script expects
RUN mkdir -p $PQSL12_DESTANATION && \
    /usr/libexec/fix-permissions $PQSL12_BIN/bin && \
    ln -s $PQSL12_BIN/bin  $PQSL12_DESTANATION/bin 

# patch out problematic checks and paths in the current upgrade script
RUN sed -i 's#"${old_pgengine}/pg_isready"# #g' /usr/share/container-scripts/postgresql/common.sh && \
    sed -i 's#new_pgengine=/opt/rh/rh-postgresql${new_raw_version}/root/usr/bin#new_pgengine="/bin"#g' /usr/share/container-scripts/postgresql/common.sh && \
    sed -i 's#&& ! pgrep -f "postgres" > /dev/null# #g'  /usr/share/container-scripts/postgresql/common.sh

USER 26

ENTRYPOINT ["container-entrypoint"]
CMD ["run-postgresql"]
fila43 commented 5 months ago

Since we do not support in-place upgrades for images newer than centos7, as you mentioned, the container would need a couple of changes. This would lead to an increase in the size of the container. I recommend filing a feature request in Jira.

carbonin commented 5 months ago

> Since we do not support in-place upgrades for images newer than centos7

Out of curiosity, why was this dropped? Seems like a rather large feature to remove.

carbonin commented 5 months ago

Looks like the docs at least were removed in https://github.com/sclorg/postgresql-container/pull/489/commits/e773fe833d57d8df51dfbdfed355d62e185ff2ff

@hhorak do you have more details around why in-place upgrade was removed? Would something like @eifrach's patch be adequate to add support back?

fila43 commented 5 months ago

> Looks like the docs at least were removed in e773fe8
>
> @hhorak do you have more details around why in-place upgrade was removed? Would something like @eifrach's patch be adequate to add support back?

More precisely, it has never been removed. Between rhel7 and rhel8 there was a change in postgresql at the rpm level, and the new design hasn't been adopted yet.

It's an area where we will welcome any kind of contribution. The proposed change is heading in the right direction but needs to be adjusted.

eifrach commented 5 months ago

@fila43 I've added an initial PR, #562. I tested it locally and it seems to be working. Can you review and comment?

Thanks!