Closed · erigo closed this issue 2 months ago
What are the current settings of this dataset, e.g., the rerank model?
The issue with Celery not producing more logs and the dataset status being stuck on "Queuing" in Dify version 0.7.0 when self-hosted with Docker could be due to the log level configuration and the Celery worker settings.
- Log Level Configuration: The log level for Celery might be set to a higher threshold, such as INFO, which might not capture all the necessary logs. You can increase the verbosity by setting LOG_LEVEL to DEBUG.
- Celery Worker Configuration: Ensure that the Celery worker settings are properly configured, including the number of workers and auto-scaling settings.
Here is how you can adjust these settings in your docker-compose.yaml:
x-shared-env: &shared-api-worker-env
  LOG_LEVEL: ${LOG_LEVEL:-DEBUG} # Change INFO to DEBUG to capture more logs
  LOG_FILE: ${LOG_FILE:-}
  DEBUG: ${DEBUG:-true} # Enable debug mode
  FLASK_DEBUG: ${FLASK_DEBUG:-false}
  SECRET_KEY: ${SECRET_KEY:-sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U}
  INIT_PASSWORD: ${INIT_PASSWORD:-}
  CONSOLE_WEB_URL: ${CONSOLE_WEB_URL:-}
  CONSOLE_API_URL: ${CONSOLE_API_URL:-}
  SERVICE_API_URL: ${SERVICE_API_URL:-}
  APP_WEB_URL: ${APP_WEB_URL:-}
  CHECK_UPDATE_URL: ${CHECK_UPDATE_URL:-https://updates.dify.ai}
  OPENAI_API_BASE: ${OPENAI_API_BASE:-https://api.openai.com/v1}
  FILES_URL: ${FILES_URL:-}
  FILES_ACCESS_TIMEOUT: ${FILES_ACCESS_TIMEOUT:-300}
  APP_MAX_ACTIVE_REQUESTS: ${APP_MAX_ACTIVE_REQUESTS:-0}
  MIGRATION_ENABLED: ${MIGRATION_ENABLED:-true}
  DEPLOY_ENV: ${DEPLOY_ENV:-PRODUCTION}
  DIFY_BIND_ADDRESS: ${DIFY_BIND_ADDRESS:-0.0.0.0}
  DIFY_PORT: ${DIFY_PORT:-5001}
  SERVER_WORKER_AMOUNT: ${SERVER_WORKER_AMOUNT:-}
  SERVER_WORKER_CLASS: ${SERVER_WORKER_CLASS:-}
  CELERY_WORKER_CLASS: ${CELERY_WORKER_CLASS:-}
  GUNICORN_TIMEOUT: ${GUNICORN_TIMEOUT:-360}
  CELERY_WORKER_AMOUNT: ${CELERY_WORKER_AMOUNT:-1} # Ensure this is set appropriately
  CELERY_AUTO_SCALE: ${CELERY_AUTO_SCALE:-true} # Enable auto-scaling if needed
  CELERY_MAX_WORKERS: ${CELERY_MAX_WORKERS:-10} # Set max workers
  CELERY_MIN_WORKERS: ${CELERY_MIN_WORKERS:-1} # Set min workers
  API_TOOL_DEFAULT_CONNECT_TIMEOUT: ${API_TOOL_DEFAULT_CONNECT_TIMEOUT:-10}
  API_TOOL_DEFAULT_READ_TIMEOUT: ${API_TOOL_DEFAULT_READ_TIMEOUT:-60}
  DB_USERNAME: ${DB_USERNAME:-postgres}
  DB_PASSWORD: ${DB_PASSWORD:-difyai123456}
  DB_HOST: ${DB_HOST:-db}
  DB_PORT: ${DB_PORT:-5432}
  DB_DATABASE: ${DB_DATABASE:-dify}
  SQLALCHEMY_POOL_SIZE: ${SQLALCHEMY_POOL_SIZE:-30}
  SQLALCHEMY_POOL_RECYCLE: ${SQLALCHEMY_POOL_RECYCLE:-3600}
  SQLALCHEMY_ECHO: ${SQLALCHEMY_ECHO:-false}
  REDIS_HOST: ${REDIS_HOST:-redis}
  REDIS_PORT: ${REDIS_PORT:-6379}
  REDIS_USERNAME: ${REDIS_USERNAME:-}
  REDIS_PASSWORD: ${REDIS_PASSWORD:-difyai123456}
  REDIS_USE_SSL: ${REDIS_USE_SSL:-false}
  REDIS_DB: 0
  CELERY_BROKER_URL: ${CELERY_BROKER_URL:-redis://:difyai123456@redis:6379/1}
Additionally, ensure that the entrypoint.sh script is correctly setting the Celery worker options. Here is the relevant part of the entrypoint.sh script:
#!/bin/bash
set -e

if [[ "${MIGRATION_ENABLED}" == "true" ]]; then
  echo "Running migrations"
  flask upgrade-db
fi

if [[ "${MODE}" == "worker" ]]; then
  # Get the number of available CPU cores
  if [ "${CELERY_AUTO_SCALE,,}" = "true" ]; then
    # Set MAX_WORKERS to the number of available cores if not specified
    AVAILABLE_CORES=$(nproc)
    MAX_WORKERS=${CELERY_MAX_WORKERS:-$AVAILABLE_CORES}
    MIN_WORKERS=${CELERY_MIN_WORKERS:-1}
    CONCURRENCY_OPTION="--autoscale=${MAX_WORKERS},${MIN_WORKERS}"
  else
    CONCURRENCY_OPTION="-c ${CELERY_WORKER_AMOUNT:-1}"
  fi
  exec celery -A app.celery worker -P ${CELERY_WORKER_CLASS:-gevent} $CONCURRENCY_OPTION --loglevel INFO \
    -Q ${CELERY_QUEUES:-dataset,generation,mail,ops_trace,app_deletion}
elif [[ "${MODE}" == "beat" ]]; then
  exec celery -A app.celery beat --loglevel INFO
else
  if [[ "${DEBUG}" == "true" ]]; then
    exec flask run --host=${DIFY_BIND_ADDRESS:-0.0.0.0} --port=${DIFY_PORT:-5001} --debug
  else
    exec gunicorn \
      --bind "${DIFY_BIND_ADDRESS:-0.0.0.0}:${DIFY_PORT:-5001}" \
      --workers ${SERVER_WORKER_AMOUNT:-1} \
      --worker-class ${SERVER_WORKER_CLASS:-gevent} \
      --timeout ${GUNICORN_TIMEOUT:-200} \
      --preload \
      app:app
  fi
fi
By setting LOG_LEVEL to DEBUG and ensuring that the Celery worker settings are properly configured, you should be able to get more detailed logs and potentially resolve the issue with the dataset status being stuck on "Queuing". Additionally, make sure that CELERY_WORKER_AMOUNT is set appropriately, as the default is 1, and adjust other related settings as needed [1][2].
To continue talking to Dosu, mention @dosu.
I suspect the problem is between the embedding model and the worker: it seems the worker cannot connect to the embedding model. Please check the embedding model service again.
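For example, a quick TCP-level reachability check run from inside the worker container might look like the sketch below; the host and port are placeholders, not Dify defaults:

# Hypothetical connectivity check from inside the worker container.
# Replace HOST and PORT with the actual address of your embedding service.
import socket

HOST, PORT = "my-embedding-host", 8080  # placeholder values
try:
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f"TCP connection to {HOST}:{PORT} OK")
except OSError as exc:
    print(f"Cannot reach {HOST}:{PORT}: {exc}")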
I have checked, and the worker can connect to the embedding model.
I found out why the Celery worker log level stays at INFO: in the entrypoint.sh script, the celery command hardcodes --loglevel INFO instead of reading it from the environment (e.g., --loglevel ${LOG_LEVEL:-INFO}).
Using DEBUG mode, my log stops here:
[2024-08-23 11:37:33,038: INFO/MainProcess] Task tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task[d292c9c5-1a9a-4eaf-be96-f79ecf28aaeb] received
[2024-08-23 11:37:33,039: DEBUG/MainProcess] TaskPool: Apply <function fast_trace_task at 0x7f50f3adbf40> (args:('tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task', 'd292c9c5-1a9a-4eaf-be96-f79ecf28aaeb', {'lang': 'py', 'task': 'tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task', 'id': 'd292c9c5-1a9a-4eaf-be96-f79ecf28aaeb', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'd292c9c5-1a9a-4eaf-be96-f79ecf28aaeb', 'parent_id': None, 'argsrepr': "('43edd1f8-df98-4ec6-9960-b5680537e63f', [{'content': '问题 1', 'answer': '答案 1'}, {'content': '问题 2', 'answer': '答案 2'}], '6779c07c-9deb-4200-89e0-3f7a2b1a0519', 'a1a7c357-acfb-484d-a365-30262bafb03f', 'a7dc268e-6e62-4af9-9b2f-f32458108895', 'af32597e-a4f0-45fe-9f96-b924dd8ab1e5')", 'kwargsrepr': '{}', 'origin': 'gen164@46db059089ed', 'ignore_result': True, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}, 'properties': {'correlation_id': 'd292c9c5-1a9a-4eaf-be96-f79ecf28aaeb', 'reply_to':... kwargs:{})
I tried using prefork instead of gevent, and it works. My CLI is:
celery -A app.celery worker -P prefork --concurrency 1 --loglevel DEBUG -Q dataset,generation,mail,ops_trace,app_deletion --without-gossip --without-mingle
When using gevent, the worker stops at:
[2024-08-23 12:13:03,540: DEBUG/MainProcess] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2024-08-23 12:13:03,540: DEBUG/MainProcess] Loading model from cache /tmp/jieba.cache
Loading model cost 1.036 seconds.
[2024-08-23 12:13:04,576: DEBUG/MainProcess] Loading model cost 1.036 seconds.
Prefix dict has been built successfully.
[2024-08-23 12:13:04,576: DEBUG/MainProcess] Prefix dict has been built successfully.
For comparison, the correct process (with prefork) looks like this:
Loading model cost 1.144 seconds.
[2024-08-23 12:11:01,774: DEBUG/ForkPoolWorker-1] Loading model cost 1.144 seconds.
Prefix dict has been built successfully.
[2024-08-23 12:11:01,775: DEBUG/ForkPoolWorker-1] Prefix dict has been built successfully.
[2024-08-23 12:11:01,843: DEBUG/ForkPoolWorker-1] Created new connection using: 160e3511b3bc4b78b107b362d744816f
[2024-08-23 12:11:01,888: INFO/ForkPoolWorker-1] Processed dataset: c8575896-61e0-4222-add0-99b35db85f56 latency: 2.9650166537612677
[2024-08-23 12:11:01,888: INFO/ForkPoolWorker-1] Task tasks.document_indexing_task.document_indexing_task[3bedb474-26c0-4ba8-9674-c37e0086edfb] succeeded in 2.9661346543580294s: None
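For context, here is a minimal, generic sketch (not Dify code) of why an un-patched blocking call can stall every greenlet in a gevent pool, while a prefork worker is unaffected:

# Generic illustration only: with the stdlib (unpatched) socket, a blocking
# call in one greenlet stalls all other greenlets, because nothing yields
# control back to the gevent hub while the call is in progress.
import socket

import gevent
# from gevent import monkey; monkey.patch_all()  # with this line, the heartbeat keeps running

def blocking_io():
    try:
        # Plain, unpatched socket call: blocks the whole OS thread.
        socket.create_connection(("example.com", 80), timeout=5).close()
    except OSError:
        pass
    print("blocking_io finished")

def heartbeat():
    for i in range(5):
        print("heartbeat", i)
        gevent.sleep(0.2)

gevent.joinall([gevent.spawn(blocking_io), gevent.spawn(heartbeat)])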
I have another PR that needs to be merged, so I think the issue is missing gevent monkey patching. Using a monkey patch in /api/extensions/ext_celery.py fixes it:
# /api/extensions/ext_celery.py
from gevent import monkey
monkey.patch_all()
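A minimal sketch of the idea, assuming the patch runs before any other imports in whichever module starts the worker (the early placement, not the exact file, is the important part; the actual Dify change may differ):

# Sketch only: gevent monkey patching must run as early as possible,
# before the rest of the application is imported, so that socket, ssl,
# threading, etc. become cooperative under the gevent worker pool.
from gevent import monkey

monkey.patch_all()

# ...only after patching, import and initialize the rest of the app, e.g.:
# from flask import Flask
# from celery import Celery

With prefork this patching is not needed, which matches the earlier observation that -P prefork works.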
Self Checks
Dify version
0.7.0
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Here is the worker log:
No more logs are printed out, and the dataset is still Queuing.
✔️ Expected Behavior
More logs and debug info.
❌ Actual Behavior
No more logs; dataset status remains Queuing.