rdmorganiser / rdmo-docker-compose

RDMO running in different docker images held together by docker compose
Apache License 2.0

Postgres not coming up #21

Closed: johlton closed this issue 9 months ago

johlton commented 1 year ago

I'm running into problems using the current version for a second docker-compose instance on the same machine. Another instance, based on an older version of the repo, works fine on the same machine.

  1. Edited variables.local
  2. Ran make

make then stops with the following message:

Configuring postgresql-common
-----------------------------

createcluster.conf: A new version (/tmp/postgresql-common.TeWQyR) of
configuration file /etc/postgresql-common/createcluster.conf is available, but
the version installed currently has been locally modified.

  1. install the package maintainer's version
  2. keep the local version currently installed
  3. show the differences between the versions
  4. show a side-by-side difference between the versions
  5. start a new shell to examine the situation
What do you want to do about modified configuration file createcluster.conf?

Since it's running in non-interactive mode, none of the choices can be submitted. I then followed https://stackoverflow.com/a/72273217/6948765 and added DEBIAN_FRONTEND=noninteractive as an environment variable. This fixes the config prompt.
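For reference, a minimal shell sketch of a non-interactive upgrade. DEBIAN_FRONTEND=noninteractive is what resolved the prompt above; the two Dpkg::Options settings additionally cover dpkg's own conffile prompts and are an assumption, not something confirmed in this thread. The function name and the APT_GET override are hypothetical and exist only so the command can be dry-run.

```shell
# Run "apt-get upgrade" without any configuration prompts.
# APT_GET is a hypothetical override, e.g. APT_GET='echo apt-get' for a dry run.
noninteractive_upgrade() {
    # --force-confdef: take the default action for a changed conffile if one exists
    # --force-confold: otherwise keep the locally installed version
    DEBIAN_FRONTEND=noninteractive ${APT_GET:-apt-get} -y \
        -o Dpkg::Options::=--force-confdef \
        -o Dpkg::Options::=--force-confold \
        upgrade
}

# Usage (inside the image build, as root):
# noninteractive_upgrade
```

Setting the variable only for this one command, rather than exporting it for the whole image, keeps it from leaking into the running container.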

The postgres container gets built, but doesn't start properly. I don't know whether this is related at all. docker logs shows:

initdb: error: invalid locale settings; check LANG and LC_* environment variables
chmod: changing permissions of '/var/lib/postgresql/data': Operation not permitted
chmod: changing permissions of '/var/run/postgresql': Operation not permitted
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

And this is where I'm a bit lost. The volume created under /vol/ is owned by systemd-coredump, but I can't make anything of it.

Any idea? Thanks!

triole commented 1 year ago

Hi, your problem description looks quite strange to me. I suspect the cause might be that you tried to run the two different docker-compose instances from a single cloned repo folder. Postgres and all the other volumes are created on your host machine inside the vol folder. Starting two instances from one repo folder would lead to both accessing exactly the same database files on your host. This can't work and would explain the error message about your modified postgres conf.

Currently rdmo-docker-compose needs a repo clone for every instance. It was designed this way because we wanted to have a clear distinction between instances even on the host level to be able to make specific adjustments for running different versions or other experiments.

Please correct me if I have misread your workflow. I'd then take another look at the error.

Regards.

johlton commented 1 year ago

Hey, no, it's separated. 2 repo clones in two different directories. Different ports, too. I totally agree with this clear distinction, everything else would indeed be a mess ;)

docker-compose.yml does not differ structurally from dc_master.yaml, and the container names are distinct from those of the first instance:

version: "3.8"

services:
    postgres:
        build:
            context: ./docker/postgres
            args:
                UID: 1004
        container_name: rdc-foobar-postgres
        restart: "always"
        volumes:
            - postgres:/var/lib/postgresql/data
        env_file:
            - variables.local
    [...]

volumes:
    postgres:
        name: rdc-foobar-postgres
        driver_opts:
            type: none
            device: /srv/rdmo-foobar/vol/postgres
            o: bind
    [...]        

I looked into the strange systemd-coredump ownership once more and found this in the comments on Stack Overflow:

On Ubuntu 18.04 and before the docker group had gid 999 but in Ubuntu 20.04 that gid is now taken by systemd-coredump. The docker group now seems to default to gid 998. This creates incompatibilities when operating across different versions of Ubuntu.

And indeed:

$ id <USER>
uid=1004(<USER>) gid=1004(<USER>) Gruppen=1004(<USER>),27(sudo),110(lxd),998(docker)

$ id systemd-coredump
uid=999(systemd-coredump) gid=999(systemd-coredump) Gruppen=999(systemd-coredump)
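The owner name shown on the volume directory is only the host's label for a numeric ID; the same UID can belong to a different account inside a container. A small sketch for inspecting this, assuming GNU stat and getent are available on the host. describe_owner is a hypothetical helper name.

```shell
# Print the numeric owner of a path and the host account name for that UID.
describe_owner() {
    uid=$(stat -c '%u' "$1")   # numeric user ID stored on the filesystem
    gid=$(stat -c '%g' "$1")   # numeric group ID stored on the filesystem
    # Resolve the UID against the host's user database; may be empty.
    name=$(getent passwd "$uid" 2>/dev/null | cut -d: -f1)
    printf '%s uid=%s gid=%s host_name=%s\n' "$1" "$uid" "$gid" "${name:-<none>}"
}

# Usage, with the bind-mount path from the compose file above:
# describe_owner /srv/rdmo-foobar/vol/postgres
```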

As I understand it, this is an upstream problem with the postgres image. So I wonder what we could do in the Makefile to adapt?

Thanks for having a look.

triole commented 1 year ago

Okay, thanks for the feedback. It is a bit hard for me to test because I cannot reproduce the issue you are seeing, but I'll try to look deeper into it. My first idea would be to pass the host user's group id into the docker container. I already did this in the https://github.com/rdmorganiser/rdmo-docker-compose/tree/fixes branch. But something else still seems to be wrong with the volumes. I'll let you know when the branch is ready to test.
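As a rough illustration of the idea of passing host IDs into the build (not the actual change in the fixes branch, which is not shown here), the host user's IDs can be turned into build arguments. UID matches the argument name in the compose file above; GID is an assumed name that the Dockerfile would have to declare with ARG, and host_id_build_args is a hypothetical helper.

```shell
# Emit --build-arg flags carrying the invoking user's numeric IDs.
# GID is an assumed argument name; the Dockerfile must declare "ARG GID".
host_id_build_args() {
    printf '%s\n' "--build-arg UID=$(id -u) --build-arg GID=$(id -g)"
}

# Usage (left unquoted on purpose so the flags split into separate words):
# docker compose build $(host_id_build_args) postgres
```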

triole commented 1 year ago

Alright, I also added a project prefix to make sure multiple compose setups do not interfere with each other. You can give it a try. Please let me know if it changes anything.
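For context: Docker Compose scopes containers, networks and default volume names by project name, which can be set with the -p flag or the COMPOSE_PROJECT_NAME variable. How the fixes branch derives its prefix is not shown in this thread; the sketch below, with a hypothetical project_name helper, derives one from the clone directory.

```shell
# Derive a Compose-safe project name from a directory path:
# lowercase the last path component and replace every character
# outside a-z, 0-9, "_" and "-" with "-".
project_name() {
    printf '%s' "$(basename "$1")" \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/[^a-z0-9_-]/-/g'
}

# Usage, run from inside the repo clone:
# COMPOSE_PROJECT_NAME=$(project_name "$PWD") docker compose up -d
```

Volumes that set an explicit name, like rdc-foobar-postgres above, are not prefixed and still need to be unique per instance.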

Have a nice weekend.

https://github.com/rdmorganiser/rdmo-docker-compose/tree/fixes

benji4398 commented 1 year ago

Hi! I had the same issue, also with the fixes branch version. To me it seems that the apt upgrade -y in the postgres Dockerfile causes the issue by trying to install a newer version of createcluster.conf which requires user interaction.

I removed the apt upgrade -y and the postgres container builds. However, there was another issue when the Dockerfile tries to change the permissions of /var/lib/postgresql/data:

initdb: error: could not change permissions of directory "/var/lib/postgresql/data": Operation not permitted

(Issue #22)

After removing the ARG UID, RUN chown -R ${UID} /var/lib/postgresql and USER ${UID} statements from the postgres Dockerfile and commenting out the args: UID: <LOCAL_UID> statement in dc_master.yaml, the postgres container starts. However, the rdmo container then exits with /drun.sh: line 22: gunicorn: command not found (Issue #23).

I'll open two separate issues for these to keep better track of them, and link them here.

johlton commented 1 year ago

I just reinstalled everything via make from the fixes branch. That worked even though the postgres permission problem reappeared. I then changed ownership of /vol/postgres/ to my current host user and reran make. Works for now!
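The ownership change described above could look roughly like the following sketch. The function name and the SUDO override are hypothetical; the path in the usage line is the bind-mount device from the compose file earlier in the thread.

```shell
# Recursively hand a volume directory to the invoking host user.
# SUDO is a hypothetical override: set SUDO=sudo when the directory
# currently belongs to another account (e.g. UID 999).
own_volume_dir() {
    dir=$1
    # IDs are resolved before chown runs, so they stay the caller's
    # even when the chown itself is executed through sudo.
    target="$(id -u):$(id -g)"
    ${SUDO:-} chown -R "$target" "$dir"
}

# Usage:
# SUDO=sudo own_volume_dir /srv/rdmo-foobar/vol/postgres
```

This only helps as long as the UID passed to the postgres build matches the host user, as it does with UID: 1004 in the compose file above.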