qkzsky opened 3 weeks ago
I found a similar discussion that might be relevant to your issue. In a previous discussion, a user faced a similar problem where deleting logs and related data was not sufficient, and they had to manually truncate several tables to resolve the issue. The tables they truncated were:
truncate public.workflow_runs;
truncate public.workflow_node_executions;
truncate public.conversations;
truncate public.messages;
truncate public.end_users;
truncate public.workflow_app_logs;
This approach successfully resolved their issue [1].
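If you want to script those statements rather than run them by hand, a minimal sketch using psycopg2 could look like the following; the connection parameters are placeholders for your own Dify Postgres instance, and TRUNCATE is irreversible, so take a backup first.

```python
# Sketch only: runs the same TRUNCATE statements from the discussion above.
# Connection values are placeholders; TRUNCATE permanently removes all rows,
# so back up the database before running this.
import psycopg2

TABLES = [
    "workflow_runs",
    "workflow_node_executions",
    "conversations",
    "messages",
    "end_users",
    "workflow_app_logs",
]

conn = psycopg2.connect(
    host="localhost", port=5432, dbname="dify", user="postgres", password="<password>"
)
try:
    # The connection context manager commits on success and rolls back on error.
    with conn, conn.cursor() as cur:
        for table in TABLES:
            # Table names come from the fixed list above, not from user input.
            cur.execute(f"TRUNCATE public.{table};")
finally:
    conn.close()
```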
For your specific problem with remove_app_and_related_data_task not executing, ensure that:
Celery is properly configured and running: Since this task is a Celery task, make sure your Celery worker is running and connected to the correct queue (app_deletion).
Check Celery logs: Look for any errors or issues in the Celery logs that might indicate why the task is not being executed.
Verify task scheduling: Ensure that the task is being scheduled correctly. You might want to manually trigger the task to see if it executes (see the sketch below).
Database connection: Confirm that the database connection is active and that there are no permission issues preventing data deletion.
If these steps do not resolve the issue, you might need to manually clear the data as described in the similar discussion.
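If you do want to trigger the task manually (the "Verify task scheduling" step above), a rough sketch from a Flask shell inside the api container is below; the argument names are assumptions and may differ between Dify versions, so check tasks/remove_app_and_related_data_task.py first.

```python
# Rough sketch: enqueue the deletion task manually from a shell inside the api
# container. The tenant_id/app_id argument names are assumptions and may differ
# between Dify versions.
from tasks.remove_app_and_related_data_task import remove_app_and_related_data_task

result = remove_app_and_related_data_task.delay(
    tenant_id="<tenant-uuid>",  # replace with the real tenant ID
    app_id="<app-uuid>",        # replace with the ID of the deleted app
)
# The result stays PENDING if no worker is consuming the app_deletion queue.
print(result.id, result.status)
```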
I'm not sure how to delete those things.
poetry run python -m celery -A app.celery worker -P gevent -c 1 --loglevel INFO -Q dataset,generation,mail,ops_trace,app_deletion
Those should be deleted by the Celery worker; please check the worker logs.
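You can also check from a Python shell inside the api container whether any worker is listening on the app_deletion queue. This is a sketch using Celery's inspection API; the import path is an assumption that mirrors the -A app.celery argument above.

```python
# Sketch: ask the running workers which queues and tasks they know about.
# `from app import celery` mirrors the `-A app.celery` argument; adjust if your
# entrypoint differs.
from app import celery

inspector = celery.control.inspect()
# Both calls return None if no worker responds at all.
print(inspector.active_queues())  # each worker should list the app_deletion queue
print(inspector.registered())     # remove_app_and_related_data_task should be listed
```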
The container is started by the default run script, and the MODE environment variable is set to api. Is that why the asynchronous deletion task is not executed?
Yes, but I saw you labeled this as Self Hosted (Docker), so this should be started with the default setup.
The process startup command I see from /proc/1/cmdline is /app/api/.venv/bin/python /app/api/.venv/bin/gunicorn --bind 0.0.0.0:5001 --workers 1 --worker-class gevent --timeout 200 --preload app:app
When deleting the application, I watched the data in the DB and the Docker logs, but did not see any output from the corresponding code. Only the app table data is deleted; remove_app_and_related_data_task is not executed.
So, is this caused by the Celery worker not being started in gunicorn mode? How do I adjust the startup command or environment variables to make it work?
If you run the application in Docker mode with docker compose up -d, this should not happen.
It only happens if you run it from the source code; in that case you need to make sure Redis is up and running and then run this command:
https://github.com/langgenius/dify/blob/a8134a49c4f9a3c4f825d1b79d622a7f4807aade/api/README.md#L71
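As a quick sanity check that the broker itself is reachable from the api/worker environment, here is a small redis-py sketch; the URL is only an example following the CELERY_BROKER_URL format.

```python
# Sketch: confirm the Celery broker (Redis DB 1) is reachable.
# The URL is a placeholder; substitute your own CELERY_BROKER_URL value.
import redis

r = redis.Redis.from_url("redis://default:<password>@<redis-host>:6379/1")
print(r.ping())  # True if the broker answers
```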
Yes, Redis is up and running. The current environment runs the services directly through Docker. Currently, only the MODE=api service is started; the MODE=worker service is not. When I tried to start the MODE=worker service, it did not keep running: it shut down right after startup and never picked up and executed the asynchronous deletion task. This is the startup log:
/app/api/.venv/lib/python3.10/site-packages/celery/platforms.py:829: SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0
warnings.warn(SecurityWarning(ROOT_DISCOURAGED.format(
-------------- celery@dify-unpebwkg-worker-0 v5.3.6 (emerald-rush)
--- ***** -----
-- ******* ---- Linux-5.15.0-112-generic-x86_64-with-glibc2.40 2024-11-05 15:22:49
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: app_factory:0x7f92d58b8790
- ** ---------- .> transport: redis://default:**@dify-unpebwkg-redis-redis.ns-mtsj6ogy.svc:6379/1
- ** ---------- .> results: postgresql://postgres:**@dify-unpebwkg-pg-postgresql:5432/dify
- *** --- * --- .> concurrency: 1 (gevent)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> app_deletion exchange=app_deletion(direct) key=app_deletion
.> dataset exchange=dataset(direct) key=dataset
.> generation exchange=generation(direct) key=generation
.> mail exchange=mail(direct) key=mail
.> ops_trace exchange=ops_trace(direct) key=ops_trace
[tasks]
. schedule.clean_embedding_cache_task.clean_embedding_cache_task
. schedule.clean_unused_datasets_task.clean_unused_datasets_task
. schedule.create_tidb_serverless_task.create_tidb_serverless_task
. schedule.update_tidb_serverless_status_task.update_tidb_serverless_status_task
. tasks.add_document_to_index_task.add_document_to_index_task
. tasks.annotation.add_annotation_to_index_task.add_annotation_to_index_task
. tasks.annotation.batch_import_annotations_task.batch_import_annotations_task
. tasks.annotation.delete_annotation_index_task.delete_annotation_index_task
. tasks.annotation.disable_annotation_reply_task.disable_annotation_reply_task
. tasks.annotation.enable_annotation_reply_task.enable_annotation_reply_task
. tasks.annotation.update_annotation_to_index_task.update_annotation_to_index_task
. tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task
. tasks.clean_dataset_task.clean_dataset_task
. tasks.clean_document_task.clean_document_task
. tasks.clean_notion_document_task.clean_notion_document_task
. tasks.deal_dataset_vector_index_task.deal_dataset_vector_index_task
. tasks.delete_segment_from_index_task.delete_segment_from_index_task
. tasks.disable_segment_from_index_task.disable_segment_from_index_task
. tasks.document_indexing_sync_task.document_indexing_sync_task
. tasks.document_indexing_task.document_indexing_task
. tasks.document_indexing_update_task.document_indexing_update_task
. tasks.duplicate_document_indexing_task.duplicate_document_indexing_task
. tasks.enable_segment_to_index_task.enable_segment_to_index_task
. tasks.mail_email_code_login.send_email_code_login_mail_task
. tasks.mail_invite_member_task.send_invite_member_mail_task
. tasks.mail_reset_password_task.send_reset_password_mail_task
. tasks.ops_trace_task.process_trace_tasks
. tasks.recover_document_indexing_task.recover_document_indexing_task
. tasks.remove_app_and_related_data_task.remove_app_and_related_data_task
. tasks.remove_document_from_index_task.remove_document_from_index_task
. tasks.retry_document_indexing_task.retry_document_indexing_task
. tasks.sync_website_document_indexing_task.sync_website_document_indexing_task
[2024-11-05 15:22:49,659: INFO/MainProcess] Connected to redis://default:**@dify-unpebwkg-redis-redis.ns-mtsj6ogy.svc:6379/1
[2024-11-05 15:22:49,664: INFO/MainProcess] mingle: searching for neighbors
[2024-11-05 15:22:50,688: INFO/MainProcess] mingle: sync with 2 nodes
[2024-11-05 15:22:50,688: INFO/MainProcess] mingle: sync complete
[2024-11-05 15:22:50,706: INFO/MainProcess] pidbox: Connected to redis://default:**@dify-unpebwkg-redis-redis.ns-mtsj6ogy.svc:6379/1.
[2024-11-05 15:22:50,717: INFO/MainProcess] celery@dify-unpebwkg-worker-0 ready.
worker: Warm shutdown (MainProcess)
This is weird. Do you have any other information that could help us resolve this issue, e.g. system info and CPU arch?
OK
Docker image: langgenius/dify-api:0.10.2
CPU: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Environment variables:
MODE=worker
LOG_LEVEL=INFO
SECRET_KEY=sk-xxxxxx
CONSOLE_WEB_URL=https://xxx.com
INIT_PASSWORD=xxxxxxxxx
CONSOLE_API_URL=https://xxx.com
SERVICE_API_URL=https://xxx.com
APP_WEB_URL=https://xxx.com
FILES_URL=
MIGRATION_ENABLED=true
DB_DATABASE=dify
REDIS_USE_SSL=false
REDIS_DB=0
CELERY_BROKER_URL=redis://$(REDIS_USERNAME):$(REDIS_PASSWORD)@$(REDIS_HOST).ns-mtsj6ogy.svc:$(REDIS_PORT)/1
WEB_API_CORS_ALLOW_ORIGINS=
CONSOLE_CORS_ALLOW_ORIGINS=
STORAGE_TYPE=*
STORAGE_LOCAL_PATH=/app/api/storage
VECTOR_STORE=weaviate
WEAVIATE_ENDPOINT=http://$(WEAVIATE_HOST):$(WEAVIATE_PORT)
WEAVIATE_API_KEY=
CODE_EXECUTION_ENDPOINT=http://dify-unpebwkg-sandbox.ns-mtsj6ogy.svc:8194
CODE_EXECUTION_API_KEY=dify-sandbox
CODE_MAX_NUMBER=9223372036854775807
CODE_MIN_NUMBER=-9223372036854775808
CODE_MAX_STRING_LENGTH=800000
TEMPLATE_TRANSFORM_MAX_LENGTH=800000
CODE_MAX_STRING_ARRAY_LENGTH=30
CODE_MAX_OBJECT_ARRAY_LENGTH=30
CODE_MAX_NUMBER_ARRAY_LENGTH=1000
TZ=Asia/Shanghai
In addition, I also observed that Redis does not contain any key with the word app_deletion.
Let me know if any additional information is needed to track this down.
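For reference, Celery's Redis broker only keeps a list key named after a queue while messages are waiting to be consumed, so the absence of an app_deletion key does not by itself show whether the task was ever published. One way to watch the broker right after deleting an app is the sketch below (redis-py, broker DB 1 as in CELERY_BROKER_URL; the URL is a placeholder).

```python
# Sketch: inspect the Celery broker right after deleting an app. If the task is
# published but no worker consumes it, the app_deletion list should appear and
# grow; if it never shows up, the api process is probably not publishing it.
import redis

r = redis.Redis.from_url("redis://default:<password>@<redis-host>:6379/1")
print(r.llen("app_deletion"))    # number of pending messages in the queue (0 if absent)
print(r.keys("*app_deletion*"))  # any related keys that currently exist
```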
@JohnJyong Could you take a look at this.
Dify version
0.10.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Background: I have some application logs that use a large amount of disk space, and I need to clear them. However, I found no way to clear the logs in application management, so I copied the application and then deleted the original one. After deleting it, I found that the application itself was removed, but the related data was not. Tracing showed that the remove_app_and_related_data_task method was not executed, and none of the deletion methods it contains were run either.
✔️ Expected Behavior
remove_app_and_related_data_task runs normally and clears information about the deleted application
❌ Actual Behavior
Nothing happened