lando / platformsh

The Official Platform.sh Lando Plugin
https://docs.lando.dev/platformsh
GNU General Public License v3.0

PostgreSQL data persistence #102

Closed: pirog closed this issue 2 years ago

pirog commented 4 years ago

Usually Lando will volume mount a database's data directory so that data can persist across a lando rebuild. We are currently doing this with the Platform.sh mysql/mariadb container: https://github.com/lando/lando/blob/master/experimental/plugins/lando-platformsh/services/platformsh-mariadb/builder.js#L35

However, when you add the same mount to the postgres service, the container cannot restart. I suspect this is at least partially because the postgres service runs as the postgres user instead of the usual app user, and the persistent /mnt/data mount point has special permission requirements.
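One way to test the permissions theory is to compare the mounted data directory against what PostgreSQL itself checks at startup: the data directory must be owned by the user the server runs as, and in PostgreSQL 11+ it must not grant group-write or any "other" access. The helper below is a hypothetical diagnostic, not part of Lando or Platform.sh, and the expected UID is an assumption you supply (for example, the postgres user's UID inside the container).

```python
import os
import stat

# Bits PostgreSQL 11+ rejects on its data directory: group-write and any "other"
# access. Its error hint asks for u=rwx (0700) or u=rwx,g=rx (0750).
FORBIDDEN_BITS = stat.S_IWGRP | stat.S_IRWXO


def pgdata_problems(path, expected_uid):
    """Hypothetical diagnostic: list reasons `path` may be rejected as a data dir.

    `expected_uid` should be the UID the server runs as (e.g. the postgres user's).
    """
    st = os.stat(path)
    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append("not a directory")
    if st.st_uid != expected_uid:
        problems.append(
            "wrong ownership: uid %d, expected %d" % (st.st_uid, expected_uid))
    if st.st_mode & FORBIDDEN_BITS:
        problems.append("invalid permissions: %o" % stat.S_IMODE(st.st_mode))
    return problems
```

Inside the db container you might call pgdata_problems("/mnt/data", pwd.getpwnam("postgres").pw_uid). An empty list only means ownership and mode look acceptable to PostgreSQL; it does not rule out other causes.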

Steps to get rolling:

  1. lando init the lando-d8 example
  2. Before you lando start, swap out mysql for postgresql in the services.yaml:

```yaml
db:
    type: postgresql:11
    disk: 2048

cache:
    type: redis:5.0
```

Also make sure to add the pdo_pgsql extension to your .platform.app.yaml and modify the relationship:

```yaml
relationships:
    database: 'db:postgresql'
runtime:
    extensions:
        - redis
        - pdo_pgsql
```

  3. lando start
  4. Install Drupal

Now it should be easy to see that the data does not persist:

  1. lando rebuild: you land back on the Drupal install screen
  2. It might be worth running lando ssh -s db -u root and investigating the data directory at /mnt/data

Now try with the data directory mounted for persistence:

  1. Uncomment the data persistence volume for the postgres recipe: https://github.com/lando/lando/blob/master/experimental/plugins/lando-platformsh/services/platformsh-postgresql/builder.js#L34
  2. lando destroy && lando start
  3. The service appears to start correctly, but if you lando ssh -s db -u root and run top, you can see that postgres is not running. Manually invoking /etc/platform/start is revealing:

```
root@6b66dfb7c5c4:/app# /etc/platform/start
2020-06-02 13:10:54,282 platformsh.agent DEBUG Running: /etc/platform/start
2020-06-02 13:10:55,162 platformsh.agent ERROR Error in service config.py:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/platformsh/agent/__init__.py", line 28, in log_and_load
    yield load()
  File "/etc/platform/start", line 6, in <module>
    service.start()
  File "/etc/platform/config.py", line 324, in start
    version_string, pgdata_path
  File "/etc/platform/config.py", line 291, in maybe_initdb
    self.initdb(pg.cmd("initdb"), pgdata_path)
  File "/etc/platform/config.py", line 126, in initdb
    os.mkdir(pgdata_path)
OSError: [Errno 17] File exists: '/mnt/data'
Traceback (most recent call last):
  File "/etc/platform/start", line 6, in <module>
    service.start()
  File "/etc/platform/config.py", line 324, in start
    version_string, pgdata_path
  File "/etc/platform/config.py", line 291, in maybe_initdb
    self.initdb(pg.cmd("initdb"), pgdata_path)
  File "/etc/platform/config.py", line 126, in initdb
    os.mkdir(pgdata_path)
OSError: [Errno 17] File exists: '/mnt/data'
```

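The last frame shows the immediate cause: config.py calls a bare os.mkdir(pgdata_path), and os.mkdir raises EEXIST (errno 17) whenever the path already exists, which is always the case for a volume mount point. A minimal sketch of that failure next to an idempotent variant, for illustration only (we can't change config.py, and both function names are hypothetical). It sticks to OSError and errno, so it also runs on the container's Python 2.7.

```python
import errno
import os


def bare_mkdir(pgdata_path):
    """Mimic the failing call in the traceback: a plain os.mkdir.

    Raises OSError with errno EEXIST (17 on Linux) if the path already exists,
    which is always true for a volume mount point.
    """
    os.mkdir(pgdata_path)


def mkdir_if_missing(pgdata_path):
    """Idempotent variant: return True if created, False if a directory was already there."""
    try:
        os.mkdir(pgdata_path)
    except OSError as exc:
        # Re-raise anything other than "already exists as a directory".
        if exc.errno != errno.EEXIST or not os.path.isdir(pgdata_path):
            raise
        return False
    return True
```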
mikemilano commented 4 years ago

I think this may be the solution. Docker does not support setting ownership on a volume mount and it seems to be a common issue with Postgres containers.

Other solutions involved modifying the Dockerfile, but since we do not have access to modify it, this method uses a single-command container to set permissions on the volume.

pirog commented 4 years ago

@mikemilano, as an easy fix, you may be able to set the LANDO_RESET_DIR environment variable on the postgres service to the volume-mounted data directory.

This tells lando to set ownership of LANDO_RESET_DIR to LANDO_WEBROOT_USER https://github.com/lando/lando/blob/master/plugins/lando-core/scripts/user-perms.sh#L104
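As we understand it, resetting ownership here amounts to a recursive chown of the directory to the web root user; the authoritative logic is in the linked user-perms.sh. A rough Python equivalent, with a hypothetical function name, to make the effect concrete:

```python
import os


def reset_ownership(path, uid, gid):
    """Hypothetical helper: recursively chown `path`, like `chown -R uid:gid path`.

    Uses lchown so symlinks themselves are changed rather than their targets,
    and os.walk does not descend into symlinked directories by default.
    """
    os.lchown(path, uid, gid)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            os.lchown(os.path.join(root, name), uid, gid)
```

Changing ownership to a different user generally requires root, so in the container this would run as root with the web root user's UID and GID.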

If that doesn't work, we may want to consider adding a special script to postgres that does the things we need, e.g., setting permissions, before the primary process starts. Lando will run any scripts it finds in /scripts before it hands off to the docker CMD, so it would only be a matter of volume mounting that script into /scripts/postgres-perms. https://github.com/lando/lando/blob/master/plugins/lando-core/scripts/lando-entrypoint.sh#L54
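The /scripts hook can be pictured as: run each executable in /scripts, then exec the container's CMD. The sketch below assumes name ordering and fail-fast error handling; the real behavior is defined in the linked lando-entrypoint.sh, and both function names are hypothetical.

```python
import os
import subprocess


def run_boot_scripts(directory):
    """Run every executable file in `directory` in name order; return the paths run.

    Sketch of the "run /scripts before CMD" behavior; ordering and error handling
    here are assumptions, not Lando's actual logic.
    """
    ran = []
    if not os.path.isdir(directory):
        return ran
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            subprocess.check_call([path])  # raise (stop booting) if a script fails
            ran.append(path)
    return ran


def hand_off(cmd):
    """Replace the current process with the container's CMD, like `exec "$@"`."""
    os.execvp(cmd[0], cmd)
```

An entrypoint built this way would call run_boot_scripts("/scripts") and then hand_off(sys.argv[1:]), so a mounted /scripts/postgres-perms would run before postgres starts.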

mikemilano commented 4 years ago

@pirog thanks, I tried both methods and the env var one seems to achieve the result we need. I'm not sure the script in the 2nd method was executing because the /mnt/data directory remained owned by root.

The env var method set /mnt/data to be owned by postgres:dialout, which is what we need.

The postgres service, however, is still failing. I'm still chasing it down, but if this rings any bells, let me know:

```
db_1     | lando 13:25:34.74 INFO  ==> Lando handing off to: exec init
db_1     | runsv idmapd: fatal: unable to lock supervise/lock: temporary failure
db_1     | runsv postgresql: fatal: unable to lock supervise/lock: temporary failure
```

pirog commented 4 years ago

Nice!

I've seen other services report those temporary failures from time to time, but IIRC the service still booted correctly, so it's possible the problem lies elsewhere.

I've found running lando ssh -s service -u root and then /etc/platform/start to be useful for troubleshooting why things don't start.

mikemilano commented 4 years ago

Should that be able to run in an already setup container? It doesn't like the mount directory being there.

```
# /etc/platform/start
2020-06-03 14:15:15,981 platformsh.agent DEBUG Running: /etc/platform/start
2020-06-03 14:15:16,492 platformsh.agent ERROR Error in service config.py:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/platformsh/agent/__init__.py", line 28, in log_and_load
    yield load()
  File "/etc/platform/start", line 6, in <module>
    service.start()
  File "/etc/platform/config.py", line 324, in start
    version_string, pgdata_path
  File "/etc/platform/config.py", line 291, in maybe_initdb
    self.initdb(pg.cmd("initdb"), pgdata_path)
  File "/etc/platform/config.py", line 126, in initdb
    os.mkdir(pgdata_path)
OSError: [Errno 17] File exists: '/mnt/data'
Traceback (most recent call last):
  File "/etc/platform/start", line 6, in <module>
    service.start()
  File "/etc/platform/config.py", line 324, in start
    version_string, pgdata_path
  File "/etc/platform/config.py", line 291, in maybe_initdb
    self.initdb(pg.cmd("initdb"), pgdata_path)
  File "/etc/platform/config.py", line 126, in initdb
    os.mkdir(pgdata_path)
OSError: [Errno 17] File exists: '/mnt/data'
```

pirog commented 4 years ago

@mikemilano good question and good point.

Given that postgresql seems to start when we DON'T volume mount /mnt/data but fails to start when we do, it does seem like /mnt/data has something to do with our problem.

However, it's definitely possible that the error from manually invoking /etc/platform/start after the service has started shows up for the reasons you describe, and would show up regardless of whether we mounted /mnt/data.

I've got a fresh install and can check what happens when you run that command when we DON'T mount /mnt/data. Stand by.

pirog commented 4 years ago

OK, so this is what /etc/platform/start reports if you run it after the container starts and you DON'T mount /mnt/data:

```
/etc/platform/start
2020-06-03 15:13:03,509 platformsh.agent DEBUG Running: /etc/platform/start
2020-06-03 15:13:04,066 platformsh.agent.service DEBUG Waiting for services to come online...
2020-06-03 15:13:04,069 platformsh.agent.service DEBUG All services are online.
2020-06-03 15:13:04,069 platformsh.agent.service INFO Initializing service...
2020-06-03 15:13:04,070 platformsh.agent.service INFO Initializing database privileges
2020-06-03 15:13:04,083 platformsh.agent.service INFO Initialization complete
2020-06-03 15:13:04,084 platformsh.agent.service INFO Bootstrap complete.
2020-06-03 15:13:04,085 gevent_jsonrpc DEBUG <RpcConnection(140715504640144)>: --> {"params": {}, "jsonrpc": "2.0", "method": "notify", "id": 1}...
2020-06-03 15:13:04,086 gevent_jsonrpc DEBUG <RpcConnection(140715504640144)>: <-- {"jsonrpc": "2.0", "result": true, "id": 1}...
2020-06-03 15:13:04,086 platformsh.agent DEBUG Finished: /etc/platform/start
```

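Comparing the two runs by eye works, but when iterating on a fix it can help to summarize the output mechanically: the healthy run ends with a "Finished:" line, while the failing run logs an ERROR and an OSError. A hypothetical helper that assumes the log line format seen in this thread (date, time, logger, level, message):

```python
import re

# Assumed format, taken from the output in this thread:
# 2020-06-03 15:13:04,086 platformsh.agent DEBUG Finished: /etc/platform/start
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<logger>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)
EXCEPTION_LINE = re.compile(r"^\w+Error: ")


def summarize_start_output(text):
    """Hypothetical helper: return (finished, errors) for /etc/platform/start output.

    `finished` is True if a "Finished:" message was logged. `errors` holds the
    messages of ERROR-level lines plus distinct exception lines such as
    "OSError: ...", in order of first appearance.
    """
    finished = False
    errors = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            message = match.group("message")
            if message.startswith("Finished:"):
                finished = True
            if match.group("level") == "ERROR" and message not in errors:
                errors.append(message)
        elif EXCEPTION_LINE.match(line) and line not in errors:
            errors.append(line)
    return finished, errors
```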
mikemilano commented 4 years ago

@pirog I retrieved the Platform.sh Python scripts and noticed they perform renames on /mnt/data. This was a problem since that path is the root of the volume mount.

Since there are no other mounts in this container, it was safe to set the mount target of the volume to /mnt.

With that, the Drupal install worked and persisted through a lando rebuild.

PR lando/lando#2330
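Why mounting the parent helps can be sketched with a rename-into-place step. This is a hypothetical illustration; we have not confirmed exactly what config.py renames. On Linux, rename(2) typically fails with EBUSY when either path is a mount point and with EXDEV across filesystems, so a script cannot rename onto the root of a volume. Once the volume is mounted at /mnt, /mnt/data is an ordinary directory on it.

```python
import errno
import os


def rename_into_place(staging, target):
    """Hypothetical rename-into-place step: move a prepared `staging` dir to `target`.

    rename(2) replaces `target` only if it is missing or an empty directory, needs
    both paths on one filesystem (else EXDEV), and on Linux fails with EBUSY when
    either is a mount point. The explicit check surfaces that last case up front.
    """
    if os.path.ismount(target):
        raise OSError(errno.EBUSY, os.strerror(errno.EBUSY), target)
    os.rename(staging, target)
```

With the volume mounted directly at /mnt/data, the ismount guard refuses; with it mounted at /mnt, the rename can proceed.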

pirog commented 4 years ago

Ahh very nice. Good thinking.

Do you think we should be mounting /mnt instead of /mnt/data for the other services we want to persist, e.g., mysql, or should /mnt just be an exception for postgres?

mikemilano commented 4 years ago

@pirog I had the same concern but the rename logic only appears to be executed for postgres. I think it's safe to keep this as the exception, but if we did want to make this change to other services, we'd have to make sure there's only 1 mount attached to that path. Perhaps you know that already.
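The "only one mount attached to that path" check can be done from inside the container by reading /proc/mounts, whose second field is the mount point. A hypothetical helper that lists mount points at or below a prefix; more than one entry for /mnt would mean something else is mounted inside that path.

```python
def mounts_under(mount_table, prefix):
    """Hypothetical helper: mount points in `mount_table` (text of /proc/mounts) at or below `prefix`.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does not decode them.
    """
    prefix = prefix.rstrip("/") or "/"
    found = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        mount_point = fields[1]
        # Match the prefix itself or true children, so "/mntx" does not count for "/mnt".
        if prefix == "/" or mount_point == prefix or mount_point.startswith(prefix + "/"):
            found.append(mount_point)
    return found
```

Usage inside the container: mounts_under(open("/proc/mounts").read(), "/mnt").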

pirog commented 4 years ago

@mikemilano makes sense. exception it is.