pellepelster / solidblocks

Solidblocks is a library of reusable components for infrastructure operation, automation and developer experience
https://pellepelster.github.io/solidblocks/
MIT License
25 stars 5 forks source link

Multiline DB_POSTGRES_EXTRA_CONFIG results in invalid config #41

Closed JonasHiltl closed 1 month ago

JonasHiltl commented 1 month ago

I want to define multiple extra configs for the postgresql.conf through the DB_POSTGRES_EXTRA_CONFIG environment variable. But when using a multiline yaml string the line breaks are missing in the resulting postgresql.conf thus the config is invalid and Postgres crashes with following error:

LOG:  syntax error in file "/storage/data/ocl/15/postgresql.conf" line 17, near token "\"

This is the extra config I want to apply:

environment:
  DB_POSTGRES_EXTRA_CONFIG: |
    max_connections = 200
    shared_buffers = 2GB
    effective_cache_size = 6GB
    maintenance_work_mem = 512MB
    checkpoint_completion_target = 0.9
    wal_buffers = 16MB
    default_statistics_target = 100
    random_page_cost = 1.1
    effective_io_concurrency = 200
    work_mem = 5242kB
    huge_pages = off
    min_wal_size = 1GB
    max_wal_size = 4GB

This is the resulting postgresql.conf


archive_command = 'pgbackrest --config /rds/config/pgbackrest.conf --log-level-console=info --log-path=/rds/log --stanza=ocl archive-push  %p'
archive_mode = on

max_wal_senders = 3
wal_level = replica

listen_addresses = '*'
unix_socket_directories =  '/rds/socket'
log_destination = 'stderr'
logging_collector = 'off'
log_line_prefix = '%h %m [%p] %q%u@%d '

max_connections = 200\nshared_buffers = 2GB\neffective_cache_size = 6GB\nmaintenance_work_mem = 512MB\ncheckpoint_completion_target = 0.9\nwal_buffers = 16MB\ndefault_statistics_target = 100\nrandom_page_cost = 1.1\neffective_io_concurrency = 200\nwork_mem = 5242kB\nhuge_pages = off\nmin_wal_size = 1GB\nmax_wal_size = 4GB\n
pellepelster commented 1 month ago

can you provide some more context where this happens? based on your input I tried to create an reproduce using the following docker compose config:

version: '3.5'
services:
  change-vol-ownership:
    image: ubuntu
    user: "root"
    volumes:
      - data:/storage/data
      - backup:/storage/backup
    command: chown -R 10000:10000 /storage

  postgresql:
    image: ghcr.io/pellepelster/solidblocks-rds-postgresql:15-snapshot-rc
    environment:
      DB_INSTANCE_NAME: instance1
      DB_BACKUP_LOCAL: 1
      DB_POSTGRES_EXTRA_CONFIG: |
        max_connections = 200
        shared_buffers = 2GB
        effective_cache_size = 6GB
        maintenance_work_mem = 512MB
        checkpoint_completion_target = 0.9
        wal_buffers = 16MB
        default_statistics_target = 100
        random_page_cost = 1.1
        effective_io_concurrency = 200
        work_mem = 5242kB
        huge_pages = off
        min_wal_size = 1GB
        max_wal_size = 4GB
    volumes:
      - data:/storage/data
      - backup:/storage/backup
    ports:
      - "5432"
    depends_on:
      change-vol-ownership:
        condition: service_completed_successfully

volumes:
  data:
  backup:

which boots up fine and results in the following postgres config file:

archive_command = 'pgbackrest --config /rds/config/pgbackrest.conf --log-level-console=info --log-path=/rds/log --stanza=instance1 archive-push  %p'
archive_mode = on

max_wal_senders = 3
wal_level = replica

listen_addresses = '*'
unix_socket_directories =  '/rds/socket'
log_destination = 'stderr'
logging_collector = 'off'
log_line_prefix = '%h %m [%p] %q%u@%d '

max_connections = 200
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 5242kB
huge_pages = off
min_wal_size = 1GB
max_wal_size = 4GB
JonasHiltl commented 1 month ago

I'm using kamal to deploy postgres-rds. If it works with docker-compose then it's probably an issue with kamal not formatting the environment variable correctly when starting the docker container.

This is an extract of my Kamal config:

accessories:
  rds:
    image: ghcr.io/pellepelster/solidblocks-rds-postgresql:15-v0.2.6
    host: accessories
    port: 5432
    env:
      clear:
        ...
        DB_POSTGRES_EXTRA_CONFIG: |
          max_connections = 200
          shared_buffers = 2GB
          effective_cache_size = 6GB
          maintenance_work_mem = 512MB
          checkpoint_completion_target = 0.9
          wal_buffers = 16MB
          default_statistics_target = 100
          random_page_cost = 1.1
          effective_io_concurrency = 200
          work_mem = 5242kB
          huge_pages = off
          min_wal_size = 1GB
          max_wal_size = 4GB
      secret:
        - DB_BACKUP_S3_HOST
        - DB_BACKUP_S3_ACCESS_KEY
        - DB_BACKUP_S3_SECRET_KEY
        - DB_ADMIN_PASSWORD
        - DB_PASSWORD_db1
        - DB_PASSWORD_db2
    directories:
      - /rds/data:/storage/data

I will do some more investigation on my end to see where the error lies.

JonasHiltl commented 1 month ago

Kamal is using docker run to deploy a new container, so the command it's running to deploy Postgres is:

docker run ... --env DB_POSTGRES_EXTRA_CONFIG="max_connections = 200\nshared_buffers = 2GB\neffective_cache_size = 6GB\nmaintenance_work_mem = 512MB\ncheckpoint_completion_target = 0.9\nwal_buffers = 16MB\ndefault_statistics_target = 100\nrandom_page_cost = 1.1\neffective_io_concurrency = 200\nwork_mem = 5242kB\nhuge_pages = off\nmin_wal_size = 1GB\nmax_wal_size = 4GB\n"

But it looks like{{ env.Getenv "DB_POSTGRES_EXTRA_CONFIG" "" }} escapes the string in the template and ignores the new line symbols. What do you think about adjusting the template so that the env variable is no longer escaped and actually written on separate lines?

pellepelster commented 1 month ago

Seems like a good idea, see also https://github.com/pellepelster/solidblocks/commit/f162aa5096bbbeaec66d496d3277d2a57c05708e could you give ghcr.io/pellepelster/solidblocks-rds-postgresql:15-2074fdd-rc a try

JonasHiltl commented 1 month ago

I tried the new image but the config still looks like this:

...
max_connections = 200\nshared_buffers = 2GB\neffective_cache_size = 6GB\nmaintenance_work_mem = 512MB\ncheckpoint_completion_target = 0.9\nwal_buffers = 16MB\ndefault_statistics_target = 100\nrandom_page_cost = 1.1\neffective_io_concurrency = 200\nwork_mem = 5242kB\nhuge_pages = off\nmin_wal_size = 1GB\nmax_wal_size = 4GB\n

So Postgres still complains about a wrong config format. I'm not sure why since the commit looks good to me.

This is the full Docker command to run postgres-rds

docker run --name ocloud-rds --detach --restart unless-stopped --log-opt max-size="10m" --publish 5432:5432 --env-file .kamal/env/accessories/ocloud-rds.env --env DB_INSTANCE_NAME="ocl" --env DB_BACKUP_S3="1" --env DB_BACKUP_S3_URI_STYLE="path" --env DB_BACKUP_S3_REGION="us-east-1" --env DB_BACKUP_S3_BUCKET="rds-backup" --env DB_DATABASE_db1="keycloak" --env DB_USERNAME_db1="keycloak" --env DB_DATABASE_db2="cloud" --env DB_USERNAME_db2="ocl" --env DB_POSTGRES_EXTRA_CONFIG="max_connections = 200\nshared_buffers = 2GB\neffective_cache_size = 6GB\nmaintenance_work_mem = 512MB\ncheckpoint_completion_target = 0.9\nwal_buffers = 16MB\ndefault_statistics_target = 100\nrandom_page_cost = 1.1\neffective_io_concurrency = 200\nwork_mem = 5242kB\nhuge_pages = off\nmin_wal_size = 1GB\nmax_wal_size = 4GB\n" --volume /rds/data:/storage/data --label service="ocloud-rds" ghcr.io/pellepelster/solidblocks-rds-postgresql:15-2074fdd-rc

Could you try adding my DB_POSTGRES_EXTRA_CONFIG string to the tests and see if it's still passing?

Locally even

export DB_POSTGRES_EXTRA_CONFIG="max_connections = 200\nshared_buffers = 2GB\neffective_cache_size = 6GB\n"
gomplate -i '{{ env.Getenv "DB_POSTGRES_EXTRA_CONFIG" | strings.ReplaceAll "\\\\n" "\\n" }}'

is working so I have no idea why it's not working when starting the docker container.

pellepelster commented 1 month ago

I tried it with your specific testcase and it worked

DB_POSTGRES_EXTRA_CONFIG="max_connections = 200\nshared_buffers = 2GB\neffective_cache_size = 6GB\nmaintenance_work_mem = 512MB\ncheckpoint_completion_target = 0.9\nwal_buffers = 16MB\ndefault_statistics_target = 100\nrandom_page_cost = 1.1\neffective_io_concurrency = 200\nwork_mem = 5242kB\nhuge_pages = off\nmin_wal_size = 1GB\nmax_wal_size = 4GB\n

could you provide your kamal config? (without any secrets of course)

JonasHiltl commented 1 month ago

This is my kamal config, you can ignore the Keycloak service. The secrets are loaded from a .env file.

service: ocloud

image: jonashiltl/ocloud-kamal

# Credentials for your image host.
registry:
  # Specify the registry server, if you're not using Docker Hub
  server: ghcr.io
  username: jonashiltl

  # Always use an access token rather than real password when possible.
  password:
    - KAMAL_REGISTRY_PASSWORD

servers:
  web: 
    hosts:
      - web
    options:
      network: kamal

# Use a different ssh user than root
ssh:
  user: kamal

# Configure builder setup.
builder:
  context: .

accessories:
  rds:
    image: ghcr.io/pellepelster/solidblocks-rds-postgresql:15-2074fdd-rc
    host: accessories
    port: 5432
    env:
      clear:
        DB_INSTANCE_NAME: ocl
        DB_BACKUP_S3: 1
        DB_BACKUP_S3_URI_STYLE: path
        DB_BACKUP_S3_REGION: us-east-1
        DB_BACKUP_S3_BUCKET: rds-backup
        DB_DATABASE_db1: keycloak
        DB_USERNAME_db1: keycloak
        DB_POSTGRES_EXTRA_CONFIG: |
          max_connections = 200
          shared_buffers = 2GB
          effective_cache_size = 6GB
          maintenance_work_mem = 512MB
          checkpoint_completion_target = 0.9
          wal_buffers = 16MB
          default_statistics_target = 100
          random_page_cost = 1.1
          effective_io_concurrency = 200
          work_mem = 5242kB
          huge_pages = off
          min_wal_size = 1GB
          max_wal_size = 4GB
      secret:
        - DB_BACKUP_S3_HOST
        - DB_BACKUP_S3_ACCESS_KEY
        - DB_BACKUP_S3_SECRET_KEY
        - DB_ADMIN_PASSWORD
        - DB_PASSWORD_db1
    directories:
      - /rds/data:/storage/data

  keycloak:
    image: ghcr.io/jonashiltl/keycloak:0.0.2
    host: web
    port: 8080
    env:
      clear: 
        PROXY_ADDRESS_FORWARDING: 'true'
        KEYCLOAK_PRODUCTION: true
        KC_HOSTNAME_URL: https://auth.openchangelog.com
        KC_HOSTNAME_ADMIN_URL: https://auth.openchangelog.com
        KC_DB: postgres
        KC_DB_URL_HOST: 10.0.0.3
        KC_DB_URL_DATABASE: keycloak
        KC_DB_SCHEMA: public
        KC_DB_USERNAME: keycloak
        KC_LOG_LEVEL: INFO
        KC_HOSTNAME_PATH: /
        KC_HOSTNAME_STRICT: 'false'
        KC_HTTP_ENABLED: 'true'
        KC_PROXY: 'edge'
        KC_PROXY_HEADERS: 'xforwarded'
      secret:
        - KEYCLOAK_ADMIN
        - KEYCLOAK_ADMIN_PASSWORD
        - KC_DB_PASSWORD
    options:
      network: kamal

# Configure custom arguments for Traefik. Be sure to reboot traefik when you modify it.
traefik:
  options:
    publish:
      - "443:443"
      - "8083:8083"
    volume:
      - "/certs/acme.json:/certs/acme.json"
    network: kamal
  args:
    #api.dashboard: true
    entryPoints.web.address: ":80"
    entryPoints.websecure.address: ":443"
    entryPoints.web.http.redirections.entryPoint.to: websecure
    entryPoints.web.http.redirections.entryPoint.scheme: https
    entryPoints.web.http.redirections.entrypoint.permanent: true
  env:
    secret:
      - CF_DNS_API_TOKEN

# Configure a custom healthcheck (default is /up on port 3000)
healthcheck:
  path: /health
  port: 6002
pellepelster commented 1 month ago

now I can reproduce the issue, it seems like Kamal escapes the newlines \\n instead of \n, on the deployed server I can see the following:

DB_POSTGRES_EXTRA_CONFIG=max_connections = 200\\nshared_buffers = 2GB\\neffective_cache_size = 6GB\\nmaintenance_work_mem = 512MB\\ncheckpoint_completion_target = 0.9\\nwal_buffers = 16MB\\ndefault_statistics_target = 100\\nrandom_page_cost = 1.1\\neffective_io_concurrency = 200\\nwork_mem = 5242kB\\nhuge_pages = off\\nmin_wal_size = 1GB\\nmax_wal_size = 4GB\\n

which feels a little bit like a Kamal bug. I will try to come up with a decent workaround

pellepelster commented 1 month ago

Ok that was a little bit silly, I messed up the escaping in my testcase leading to a fix fixing a non-existent scenario.

With https://github.com/pellepelster/solidblocks/commit/a9aa3b5fae6b81f95ad5dbf3a6988cfb353288d7 it should work, I was able to successfully deploy ghcr.io/pellepelster/solidblocks-rds-postgresql:15-c1eca32-rc with kamal

JonasHiltl commented 1 month ago

ghcr.io/pellepelster/solidblocks-rds-postgresql:15-c1eca32-rc is also working for me. Thanks for taking the time and looking into it.

pellepelster commented 1 month ago

great, there will be a maintenance release end of next week that will contain the fix