twentyhq / twenty

Building a modern alternative to Salesforce, powered by the community.
https://twenty.com
GNU Affero General Public License v3.0

Starting via Docker Compose fails with "twenty-db is unhealthy" #6140

Open prolix-oc opened 6 days ago

prolix-oc commented 6 days ago

Bug Description

Using the provided Docker Compose file, running Twenty for the first time is impossible because the db service always comes back as unhealthy. Since the other services require a healthy db before they start, the stack never comes up.

Example:

I can't really provide an example of the app when it's not launching, but here is my Compose file. I am using Portainer Stacks with the provided example docker-compose.yml, modified to fit our setup, and loading the environment variables from Portainer's stack.env file.

version: "3.8"
name: twenty

services:
  change-vol-ownership:
    image: ubuntu
    user: root
    volumes:
      - /mnt/App_Pool/config/twenty/local:/tmp/server-local-data
      - /mnt/App_Pool/config/twenty/data:/tmp/docker-data
    command: >
      bash -c "
      chown -R 1000:1000 /tmp/server-local-data
      && chown -R 1000:1000 /tmp/docker-data"

  server:
    image: twentycrm/twenty:${TAG}
    volumes:
      - /mnt/App_Pool/config/twenty/local:/app/packages/twenty-server/${STORAGE_LOCAL_PATH:-.local-storage}
      - /mnt/App_Pool/config/twenty/data:/app/docker-data
    networks:
      sub:
      inv-br:
    env_file: stack.env
    environment:
      PORT: 3000
      PG_DATABASE_URL: postgres://twenty:twenty@${PG_DATABASE_HOST}/default
      SERVER_URL: ${SERVER_URL}
      FRONT_BASE_URL: ${FRONT_BASE_URL:-$SERVER_URL}
      MESSAGE_QUEUE_TYPE: ${MESSAGE_QUEUE_TYPE}

      ENABLE_DB_MIGRATIONS: "true"

      SIGN_IN_PREFILLED: ${SIGN_IN_PREFILLED}
      STORAGE_TYPE: ${STORAGE_TYPE}
      STORAGE_S3_REGION: ${STORAGE_S3_REGION}
      STORAGE_S3_NAME: ${STORAGE_S3_NAME}
      STORAGE_S3_ENDPOINT: ${STORAGE_S3_ENDPOINT}
      ACCESS_TOKEN_SECRET: ${ACCESS_TOKEN_SECRET}
      LOGIN_TOKEN_SECRET: ${LOGIN_TOKEN_SECRET}
      REFRESH_TOKEN_SECRET: ${REFRESH_TOKEN_SECRET}
      FILE_TOKEN_SECRET: ${FILE_TOKEN_SECRET}
    depends_on:
      change-vol-ownership:
        condition: service_completed_successfully
      db:
        condition: service_healthy
    healthcheck:
      test: curl --fail http://localhost:3000/healthz
      interval: 5s
      timeout: 5s
      retries: 10
    restart: always

  worker:
    image: twentycrm/twenty:${TAG}
    command: ["yarn", "worker:prod"]
    networks:
      inv-br:
    env_file: stack.env
    environment:
      PG_DATABASE_URL: postgres://twenty:twenty@${PG_DATABASE_HOST}/default
      SERVER_URL: ${SERVER_URL}
      FRONT_BASE_URL: ${FRONT_BASE_URL:-$SERVER_URL}
      MESSAGE_QUEUE_TYPE: ${MESSAGE_QUEUE_TYPE}
      ENABLE_DB_MIGRATIONS: "false"
      STORAGE_TYPE: ${STORAGE_TYPE}
      STORAGE_S3_REGION: ${STORAGE_S3_REGION}
      STORAGE_S3_NAME: ${STORAGE_S3_NAME}
      STORAGE_S3_ENDPOINT: ${STORAGE_S3_ENDPOINT}
      ACCESS_TOKEN_SECRET: ${ACCESS_TOKEN_SECRET}
      LOGIN_TOKEN_SECRET: ${LOGIN_TOKEN_SECRET}
      REFRESH_TOKEN_SECRET: ${REFRESH_TOKEN_SECRET}
      FILE_TOKEN_SECRET: ${FILE_TOKEN_SECRET}
    depends_on:
      db:
        condition: service_healthy
      server:
        condition: service_healthy
    restart: always

  db:
    image: twentycrm/twenty-postgres:${TAG}
    volumes:
      - db-data:/bitnami/postgresql
    networks:
      inv-br:
    env_file: stack.env
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_ADMIN_PASSWORD}
    healthcheck:
      test: pg_isready -U twenty -d default
      interval: 5s
      timeout: 5s
      retries: 10
    restart: always

volumes:
  db-data:

networks:
  sub:
    name: subnet
    external: true
  inv-br:
    name: inv_br
    external: true

stack.env file:

TAG=latest
PG_DATABASE_HOST=db:5432
SERVER_URL=https://twenty.example.com
SIGN_IN_PREFILLED=true
STORAGE_TYPE=local
MESSAGE_QUEUE_TYPE=pg-boss
ACCESS_TOKEN_SECRET=UcejdastWxtrcu7...
LOGIN_TOKEN_SECRET=MN21tMqjxudZSr5KECR...
REFRESH_TOKEN_SECRET=gtpiEIAR+Inh/GVoKI...
FILE_TOKEN_SECRET=BADHHWVS5P0ZVEcN+...
POSTGRES_PASSWORD=...

Expected behavior

The stack should launch, and Twenty should be operational with healthy containers since there has been zero prior manipulation of such containers.

Example:

Again... I don't have a reference for a working copy of Twenty.

Technical inputs

This is not a resource issue, as this is being run on a production server with 36 cores, 192GB of RAM, and 100Gbit networking. I've got plenty of resources to run 3 simple containers.

If there's a hard-coded requirement for a health check, perhaps we address the health checks by hostname only, or run them without using networking. Some of us use macvlan networks to route container traffic, and those setups should be considered when offering business-class software, from a security standpoint alone. I find it hard to believe a lumbering beast like Authentik would cope with my interlinked macvlan/internal bridge setup, and not this.

If that's not the case, then I'm honestly stumped. I used a volume bind for the db's data directory to a path on our ZFS app pool as well, with no luck. I removed that mapping as a test.

greptile-apps[bot] commented 6 days ago

To resolve the issue with the twenty-db service being unhealthy, follow these steps:

  1. Check Healthcheck Command: Ensure the healthcheck command in the docker-compose.yml file is correct. It should be:

    healthcheck:
      test: pg_isready -U twenty -d default
      interval: 5s
      timeout: 5s
      retries: 10

  2. Database Initialization: Verify that the database is properly initialized using the init.sql script. Ensure the init.sql script is correctly copied and executed in the Dockerfile:

    COPY ./packages/twenty-docker/twenty-postgres/init.sql /docker-entrypoint-initdb.d/

  3. Environment Variables: Ensure all necessary environment variables are correctly set in the stack.env file, especially POSTGRES_PASSWORD and PG_DATABASE_HOST.

  4. Volume Bindings: Ensure the volume bindings for the database data directory are correctly set and accessible:

    volumes:
      - db-data:/bitnami/postgresql

  5. Network Configuration: Ensure the network configuration is correct and the db service can communicate with other services. Check the networks section in the docker-compose.yml file.
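
As an illustration of step 3, the posted files can be compared directly: the db service in the Compose file reads `${POSTGRES_ADMIN_PASSWORD}`, while the stack.env shown in the report defines `POSTGRES_PASSWORD` and no `POSTGRES_ADMIN_PASSWORD`. Assuming nothing else defines that variable, Compose would substitute an empty string, and the `environment:` entry would take precedence over the value loaded through `env_file`. Nothing in this thread confirms that this is the cause; the fragment below is only a sketch of making the two names agree, and the error text is made up for illustration.

```yaml
# Sketch only: reference a variable that stack.env actually defines.
# Not confirmed as the fix in this thread.
services:
  db:
    image: twentycrm/twenty-postgres:${TAG}
    env_file: stack.env
    environment:
      # The posted stack.env defines POSTGRES_PASSWORD, not POSTGRES_ADMIN_PASSWORD.
      # The ":?" form makes Compose stop with this message if the variable
      # is unset or empty, instead of silently passing an empty password.
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in stack.env}
```

The alternative is to leave the Compose file as it is and rename the entry in stack.env to `POSTGRES_ADMIN_PASSWORD`.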

If the issue persists, consider increasing the timeout and retries values in the healthcheck configuration to allow more time for the database to become healthy.
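
A minimal sketch of a more lenient healthcheck for the db service, assuming the probe itself is correct and the database only needs more time. The numbers are illustrative assumptions, not values recommended by the project.

```yaml
# Sketch only: same probe as the original file, with more generous timing.
services:
  db:
    healthcheck:
      # Same command as before, written in the explicit CMD-SHELL list form.
      test: ["CMD-SHELL", "pg_isready -U twenty -d default"]
      interval: 10s
      timeout: 10s
      retries: 20
      # Probe failures during start_period do not count toward retries.
      start_period: 30s
```

If the container is still reported as unhealthy with these values, slow startup is unlikely to be the problem and the db logs are the next place to look.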

References

/packages/twenty-docker/docker-compose.yml
/packages/twenty-postgres/docker/docker-compose.yml
/packages/twenty-docker/twenty-postgres/Dockerfile
/packages/twenty-docker
/packages/twenty-postgres

aficiomaquinas commented 1 day ago

I think it would be useful to get the logs from the db service to debug this further. It may be easier if you comment out the worker and server services, so that when you redeploy you have time to check the db logs before the stack tears itself down because of the failed health check.
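
The suggestion above could look roughly like this: the stack reduced to the db service, copied unchanged from the reported Compose file. The `restart: "no"` line and the commands in the comments are assumptions added here for debugging, not part of the original file.

```yaml
# Sketch only: temporary debugging stack with just the db service.
name: twenty

services:
  # server: ...   (commented out while debugging)
  # worker: ...   (commented out while debugging)

  db:
    image: twentycrm/twenty-postgres:${TAG}
    volumes:
      - db-data:/bitnami/postgresql
    networks:
      inv-br:
    env_file: stack.env
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_ADMIN_PASSWORD}
    healthcheck:
      test: pg_isready -U twenty -d default
      interval: 5s
      timeout: 5s
      retries: 10
    # Assumption: disable automatic restarts so a crashed container stays
    # stopped and its last log lines are easy to read.
    restart: "no"

volumes:
  db-data:

networks:
  inv-br:
    name: inv_br
    external: true

# With the stack deployed, the db output and health state can be read with:
#   docker compose logs db
#   docker inspect --format '{{json .State.Health}}' <db-container-name>
# or through the container log view in Portainer.
```

The `docker inspect` output includes the output of recent healthcheck runs, which shows why `pg_isready` is failing rather than only that it failed.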