langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

A race condition between `dify-api` and `dify-worker` #6741

Closed CXwudi closed 3 months ago

CXwudi commented 3 months ago

Dify version

0.6.15

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When launching a fresh instance of Dify with restart: none specified instead of restart: always, either dify-api or dify-worker crashes with an error:

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
Key (typname, typnamespace)=(alembic_version, 2200) already exists.

✔️ Expected Behavior

The Dify instance should launch successfully.

❌ Actual Behavior

Either dify-api or dify-worker will crash upon starting:

Unfold to check the log

```
2024-07-27 14:15:50 None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2024-07-27 14:15:57 INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
2024-07-27 14:15:57 INFO [alembic.runtime.migration] Will assume transactional DDL.
2024-07-27 14:15:58 Traceback (most recent call last):
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
2024-07-27 14:15:48 Running migrations
2024-07-27 14:15:58     self.dialect.do_execute(
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
2024-07-27 14:15:58     cursor.execute(statement, parameters)
2024-07-27 14:15:58 psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
2024-07-27 14:15:58 DETAIL: Key (typname, typnamespace)=(alembic_version, 2200) already exists.
2024-07-27 14:15:58
2024-07-27 14:15:58
2024-07-27 14:15:58 The above exception was the direct cause of the following exception:
2024-07-27 14:15:58
2024-07-27 14:15:58 Traceback (most recent call last):
2024-07-27 14:15:58   File "/usr/local/bin/flask", line 8, in
2024-07-27 14:15:58     sys.exit(main())
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 1105, in main
2024-07-27 14:15:58     cli.main()
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
2024-07-27 14:15:58     rv = self.invoke(ctx)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
2024-07-27 14:15:58     return _process_result(sub_ctx.command.invoke(sub_ctx))
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
2024-07-27 14:15:58     return _process_result(sub_ctx.command.invoke(sub_ctx))
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
2024-07-27 14:15:58     return ctx.invoke(self.callback, **ctx.params)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
2024-07-27 14:15:58     return __callback(*args, **kwargs)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
2024-07-27 14:15:58     return f(get_current_context(), *args, **kwargs)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 386, in decorator
2024-07-27 14:15:58     return ctx.invoke(f, *args, **kwargs)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
2024-07-27 14:15:58     return __callback(*args, **kwargs)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/flask_migrate/cli.py", line 154, in upgrade
2024-07-27 14:15:58     _upgrade(directory, revision, sql, tag, x_arg)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/flask_migrate/__init__.py", line 111, in wrapped
2024-07-27 14:15:58     f(*args, **kwargs)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/flask_migrate/__init__.py", line 200, in upgrade
2024-07-27 14:15:58     command.upgrade(config, revision, sql=sql, tag=tag)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/alembic/command.py", line 403, in upgrade
2024-07-27 14:15:58     script.run_env()
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/alembic/script/base.py", line 583, in run_env
2024-07-27 14:15:58     util.load_python_file(self.dir, "env.py")
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 95, in load_python_file
2024-07-27 14:15:58     module = load_module_py(module_id, path)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 113, in load_module_py
2024-07-27 14:15:58     spec.loader.exec_module(module) # type: ignore
2024-07-27 14:15:58   File "", line 883, in exec_module
2024-07-27 14:15:58   File "", line 241, in _call_with_frames_removed
2024-07-27 14:15:58   File "/app/api/migrations/env.py", line 112, in
2024-07-27 14:15:58     run_migrations_online()
2024-07-27 14:15:58   File "/app/api/migrations/env.py", line 106, in run_migrations_online
2024-07-27 14:15:58     context.run_migrations()
2024-07-27 14:15:58   File "", line 8, in run_migrations
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/alembic/runtime/environment.py", line 948, in run_migrations
2024-07-27 14:15:58     self.get_context().run_migrations(**kw)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/alembic/runtime/migration.py", line 610, in run_migrations
2024-07-27 14:15:58     self._ensure_version_table()
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/alembic/runtime/migration.py", line 548, in _ensure_version_table
2024-07-27 14:15:58     self._version.create(self.connection, checkfirst=True)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/schema.py", line 1288, in create
2024-07-27 14:15:58     bind._run_ddl_visitor(ddl.SchemaGenerator, self, checkfirst=checkfirst)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2457, in _run_ddl_visitor
2024-07-27 14:15:58     visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/visitors.py", line 664, in traverse_single
2024-07-27 14:15:58     return meth(obj, **kw)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/ddl.py", line 956, in visit_table
2024-07-27 14:15:58     )._invoke_with(self.connection)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/ddl.py", line 314, in _invoke_with
2024-07-27 14:15:58     return bind.execute(self)
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
2024-07-27 14:15:58     return meth(
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/ddl.py", line 180, in _execute_on_connection
2024-07-27 14:15:58     return connection._execute_ddl(
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1529, in _execute_ddl
2024-07-27 14:15:58     ret = self._execute_context(
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
2024-07-27 14:15:58     return self._exec_single_context(
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
2024-07-27 14:15:58     self._handle_dbapi_exception(
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2353, in _handle_dbapi_exception
2024-07-27 14:15:58     raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
2024-07-27 14:15:58     self.dialect.do_execute(
2024-07-27 14:15:58   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
2024-07-27 14:15:58     cursor.execute(statement, parameters)
2024-07-27 14:15:58 sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
2024-07-27 14:15:58 DETAIL: Key (typname, typnamespace)=(alembic_version, 2200) already exists.
2024-07-27 14:15:58
2024-07-27 14:15:58 [SQL:
2024-07-27 14:15:58 CREATE TABLE alembic_version (
2024-07-27 14:15:58     version_num VARCHAR(32) NOT NULL,
2024-07-27 14:15:58     CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
2024-07-27 14:15:58 )
2024-07-27 14:15:58
2024-07-27 14:15:58 ]
2024-07-27 14:15:58 (Background on this error at: https://sqlalche.me/e/20/gkpj)
```

The workaround is simply to set restart: always or restart: unless-stopped.

I realize this issue is not a big deal, but I'd like to keep it here for anyone who tries restart: none.
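
The traceback ends in Alembic's `_ensure_version_table`, which calls `create(..., checkfirst=True)`: check whether the table exists, then create it. If both containers run migrations at startup (an assumption about this setup), both can pass the check before either one creates the table. Below is a minimal sketch of that check-then-create race. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, so the exact error differs from the one in the log, and the connection names and the forced step ordering are illustrative only.

```python
import os
import sqlite3
import tempfile

# Assumption: SQLite stands in for PostgreSQL; only the ordering of steps matters.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
api = sqlite3.connect(db_path)      # plays the role of dify-api
worker = sqlite3.connect(db_path)   # plays the role of dify-worker


def table_exists(conn, name):
    """Return True if a table with this name is visible on the connection."""
    rows = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchall()
    return len(rows) > 0


def create_version_table(conn):
    """Unconditionally create the version table, as a stale check would."""
    conn.execute(
        "CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL PRIMARY KEY)"
    )
    conn.commit()


# Step 1: both services check before either has created the table.
api_sees = table_exists(api, "alembic_version")
worker_sees = table_exists(worker, "alembic_version")

# Step 2: both act on the now-stale answer; the second CREATE TABLE fails.
create_version_table(api)
try:
    create_version_table(worker)
    worker_crashed = False
except sqlite3.OperationalError as exc:
    worker_crashed = True
    error = str(exc)

# Step 3: a restart repeats the check, which now finds the table.
worker_sees_after_restart = table_exists(worker, "alembic_version")

print(api_sees, worker_sees, worker_crashed, worker_sees_after_restart)
```

The last step suggests why restart: always hides the problem: on the second attempt the table already exists, so the check passes and no CREATE TABLE is issued.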

dosubot[bot] commented 3 months ago

Hi @CXwudi, I'm here to help you with any bugs, questions, or contributions you have for the Dify project. Let me know how I can assist you while waiting for the human maintainer.

To resolve the race condition between dify-api and dify-worker when launching a fresh new instance of Dify version 0.6.15, you can ensure that the database migrations are completed before the dify-worker starts. This can be achieved by setting the MIGRATION_ENABLED environment variable to true and ensuring that the dify-api service completes its startup process before the dify-worker service begins.

Here is a modified version of your docker-compose.yaml file to include a dependency on the api service for the worker service, along with the necessary configurations for the database and Redis services:

version: '3'
services:
  # The postgres database.
  db:
    image: postgres:15-alpine
    restart: always
    environment:
      POSTGRES_PASSWORD: difyai123456
      POSTGRES_DB: dify
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      - ./volumes/db/data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  # The redis cache.
  redis:
    image: redis:6-alpine
    restart: always
    volumes:
      - ./volumes/redis/data:/data
    command: redis-server --requirepass difyai123456
    ports:
      - "6379:6379"

  # API service
  api:
    image: langgenius/dify-api:0.6.15
    restart: always
    environment:
      MODE: api
      LOG_LEVEL: INFO
      SECRET_KEY: sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U
      CONSOLE_WEB_URL: ''
      INIT_PASSWORD: ''
      CONSOLE_API_URL: ''
      SERVICE_API_URL: ''
      APP_WEB_URL: ''
      FILES_URL: ''
      FILES_ACCESS_TIMEOUT: 300
      MIGRATION_ENABLED: 'true'
      DB_USERNAME: postgres
      DB_PASSWORD: difyai123456
      DB_HOST: db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: difyai123456
      REDIS_USE_SSL: 'false'
      REDIS_DB: 0
      CELERY_BROKER_URL: redis://:difyai123456@redis:6379/1
      WEB_API_CORS_ALLOW_ORIGINS: '*'
      CONSOLE_CORS_ALLOW_ORIGINS: '*'

  # Worker service
  worker:
    image: langgenius/dify-api:0.6.15
    restart: always
    depends_on:
      - api
    environment:
      MODE: worker
      CONSOLE_WEB_URL: ''
      DB_USERNAME: postgres
      DB_PASSWORD: difyai123456
      DB_HOST: db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: difyai123456
      REDIS_USE_SSL: 'false'
      REDIS_DB: 0
      CELERY_BROKER_URL: redis://:difyai123456@redis:6379/1
      WEB_API_CORS_ALLOW_ORIGINS: '*'
      CONSOLE_CORS_ALLOW_ORIGINS: '*'

Adding depends_on: - api to the worker service makes Docker Compose start the api container before the worker container, which reduces the likelihood of the race that causes sqlalchemy.exc.IntegrityError. Note that the short form of depends_on only orders container start; it does not wait for the api service to finish its migrations. The database and Redis service configurations are included for completeness. With MIGRATION_ENABLED set to true, migrations run before the application starts, and the application starts only after they complete [1].
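
A readiness gate is one way to wait for migrations themselves rather than for container start. The following is a minimal sketch of the polling idea in Python. `wait_until` and `migrations_finished` are hypothetical names, not part of Dify or Docker Compose, and the readiness check is a stand-in counter rather than a real database query.

```python
import time


def wait_until(is_ready, timeout=30.0, interval=0.5):
    """Poll is_ready() until it returns True or the timeout elapses.

    Returns True if the dependency became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical readiness probe: reports ready on the third poll.
polls = {"count": 0}


def migrations_finished():
    polls["count"] += 1
    return polls["count"] >= 3


ready = wait_until(migrations_finished, timeout=5.0, interval=0.01)

# A dependency that never becomes ready makes the gate give up.
never_ready = wait_until(lambda: False, timeout=0.05, interval=0.01)

print(ready, polls["count"], never_ready)
```

In a real deployment the probe would query something that only becomes true after migrations complete, for example the presence of the expected revision in alembic_version.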

CXwudi commented 3 months ago

Actually, I just realized I am using an old version of Dify. I thought I had already pulled the latest image. Closing now.

CXwudi commented 3 months ago

Or you can also do:

  api:
    image: langgenius/dify-api:0.6.15
    restart: always
    depends_on:
      - worker # let api and worker run one-by-one

This may help in case you see this issue again.
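
Running the two services one by one amounts to serializing the check-and-create step. Here is a minimal in-process sketch of that idea, assuming SQLite as a stand-in for PostgreSQL and two threads as stand-ins for the two containers. `migration_lock` and `run_migrations` are hypothetical names; in a real deployment the serialization would come from container ordering or a database-level lock, not from a Python lock.

```python
import os
import sqlite3
import tempfile
import threading

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
migration_lock = threading.Lock()  # stand-in for "run one at a time"
results = {}


def run_migrations(service):
    """Check for the version table and create it, all under one lock."""
    conn = sqlite3.connect(db_path)
    try:
        with migration_lock:
            exists = conn.execute(
                "SELECT 1 FROM sqlite_master "
                "WHERE type = 'table' AND name = 'alembic_version'"
            ).fetchall()
            if exists:
                results[service] = "skipped"
            else:
                conn.execute(
                    "CREATE TABLE alembic_version "
                    "(version_num VARCHAR(32) NOT NULL PRIMARY KEY)"
                )
                conn.commit()
                results[service] = "created"
    finally:
        conn.close()


threads = [
    threading.Thread(target=run_migrations, args=(name,))
    for name in ("api", "worker")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(results.values()))
```

Whichever service takes the lock first creates the table; the other sees it and skips creation, so neither crashes.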