langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

remove_app_and_related_data_task does not work #10266

Open qkzsky opened 1 day ago

qkzsky commented 1 day ago


Dify version

0.10.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Background: I have some application logs that take up a large amount of disk space, and I need to clear them. However, I found no way to clear logs in application management, so I copied the application and then deleted the original. After the deletion, the application itself was gone, but the related data was not removed. Tracing the code, I found that the remove_app_and_related_data_task method was never executed, so none of the deletion methods it contains ran.

✔️ Expected Behavior

remove_app_and_related_data_task runs normally and clears information about the deleted application

❌ Actual Behavior

Nothing happened

dosubot[bot] commented 1 day ago

I found a similar discussion that might be relevant to your issue. In a previous discussion, a user faced a similar problem where deleting logs and related data was not sufficient, and they had to manually truncate several tables to resolve the issue. The tables they truncated were:

truncate public.workflow_runs;
truncate public.workflow_node_executions;
truncate public.conversations;
truncate public.messages;
truncate public.end_users;
truncate public.workflow_app_logs;

This approach successfully resolved their issue [1].
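Note that TRUNCATE wipes those tables for every app in the instance. If only one app's data should go, app-scoped deletes are safer. A minimal sketch, assuming these tables carry an app_id column in the 0.10.x schema (container name and UUID are placeholders):

docker exec -it <db-container> psql -U postgres -d dify -c "DELETE FROM workflow_app_logs WHERE app_id = '<app-uuid>';"

The same pattern applies to each of the other tables above.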

For your specific problem with the remove_app_and_related_data_task not executing, ensure that:

  1. Celery is properly configured and running: Since this task is a Celery task, make sure your Celery worker is running and connected to the correct queue (app_deletion).

  2. Check Celery logs: Look for any errors or issues in the Celery logs that might indicate why the task is not being executed.

  3. Verify task scheduling: Ensure that the task is being scheduled correctly. You might want to manually trigger the task to see if it executes (see the sketch below).

  4. Database connection: Confirm that the database connection is active and that there are no permission issues preventing data deletion.

If these steps do not resolve the issue, you might need to manually clear the data as described in the similar discussion.
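For step 3, a minimal sketch of triggering the task by hand from inside the api container. The task name matches the worker's registered task list shown later in this thread; the (tenant_id, app_id) argument order is an assumption based on the 0.10.x source, and both UUIDs are placeholders:

docker exec -it <api-container> celery -A app.celery call tasks.remove_app_and_related_data_task.remove_app_and_related_data_task --queue=app_deletion --args='["<tenant-uuid>", "<app-uuid>"]'

If no worker is subscribed to app_deletion, the message just accumulates in the broker (with Redis, a list named after the queue) instead of being executed.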

To continue talking to Dosu, mention @dosu.

crazywoola commented 1 day ago

Not sure how you delete those things.

poetry run python -m celery -A app.celery worker -P gevent -c 1 --loglevel INFO -Q dataset,generation,mail,ops_trace,app_deletion

Those should be deleted by the celery worker, please check the worker logs.
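A quick way to confirm that a live worker is actually consuming the app_deletion queue, using the standard celery inspect CLI (container name is a placeholder):

docker exec -it <worker-container> celery -A app.celery inspect active_queues

Each responding worker should list app_deletion among its queues; if none does, app deletion tasks will never be picked up.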

qkzsky commented 1 day ago
[screenshot]

The container is started by the default run script, and the MODE environment variable is set to api. Is that why the asynchronous deletion task is not performed?

crazywoola commented 1 day ago

Yes, but I saw you labeled this as Self Hosted (Docker), so the worker should be started by default.
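For context, the two modes come from the image's entrypoint: the same langgenius/dify-api image starts either gunicorn or a Celery worker depending on MODE. A simplified paraphrase of that logic, reconstructed from the commands quoted in this thread rather than copied from the repo:

if [ "${MODE}" = "worker" ]; then
  exec celery -A app.celery worker -P gevent -c 1 --loglevel "${LOG_LEVEL:-INFO}" -Q dataset,generation,mail,ops_trace,app_deletion
else
  exec gunicorn --bind 0.0.0.0:5001 --workers 1 --worker-class gevent --timeout 200 --preload app:app
fi

So a deployment that only ever runs MODE=api has no process consuming the Celery queues.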

qkzsky commented 1 day ago

The process startup parameters I see from /proc/1/cmdline are /app/api/.venv/bin/python /app/api/.venv/bin/gunicorn --bind 0.0.0.0:5001 --workers 1 --worker-class gevent --timeout 200 --preload app:app

When deleting the application, I watched the data in the DB and the docker logs, but did not find any output from the corresponding code. Only the row in the app table is deleted; remove_app_and_related_data_task is not executed.

[screenshot]
qkzsky commented 1 day ago

So, is this caused by the Celery worker not being started in gunicorn mode? How do I adjust the startup command or environment variables to make it work?

crazywoola commented 1 day ago

If you run the application in docker mode via docker compose up -d, this should not happen; it only happens if you run it from source code. In that case you need to make sure Redis is up and running, then run this command:

https://github.com/langgenius/dify/blob/a8134a49c4f9a3c4f825d1b79d622a7f4807aade/api/README.md#L71
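With the repo's standard docker/docker-compose.yaml, api and worker run as separate services built from the same image, so docker compose up -d starts both (service names follow that compose file):

cd dify/docker
docker compose up -d
docker compose ps               # both api and worker should be running
docker compose logs -f worker   # deletion tasks are logged here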

qkzsky commented 1 day ago

Yes, Redis is up and running. The current environment runs the service directly through Docker. Until now only the MODE=api service was started; the MODE=worker service was not. When I tried to start the MODE=worker service, it did not keep running: it shut down right after startup and never picked up or executed the asynchronous deletion task. This is the startup log:

/app/api/.venv/lib/python3.10/site-packages/celery/platforms.py:829: SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

warnings.warn(SecurityWarning(ROOT_DISCOURAGED.format(

-------------- celery@dify-unpebwkg-worker-0 v5.3.6 (emerald-rush)
--- ***** -----
-- ******* ---- Linux-5.15.0-112-generic-x86_64-with-glibc2.40 2024-11-05 15:22:49
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         app_factory:0x7f92d58b8790
- ** ---------- .> transport:   redis://default:**@dify-unpebwkg-redis-redis.ns-mtsj6ogy.svc:6379/1
- ** ---------- .> results:     postgresql://postgres:**@dify-unpebwkg-pg-postgresql:5432/dify
- *** --- * --- .> concurrency: 1 (gevent)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> app_deletion     exchange=app_deletion(direct) key=app_deletion
.> dataset          exchange=dataset(direct) key=dataset
.> generation       exchange=generation(direct) key=generation
.> mail             exchange=mail(direct) key=mail
.> ops_trace        exchange=ops_trace(direct) key=ops_trace

[tasks]
. schedule.clean_embedding_cache_task.clean_embedding_cache_task
. schedule.clean_unused_datasets_task.clean_unused_datasets_task
. schedule.create_tidb_serverless_task.create_tidb_serverless_task
. schedule.update_tidb_serverless_status_task.update_tidb_serverless_status_task
. tasks.add_document_to_index_task.add_document_to_index_task
. tasks.annotation.add_annotation_to_index_task.add_annotation_to_index_task
. tasks.annotation.batch_import_annotations_task.batch_import_annotations_task
. tasks.annotation.delete_annotation_index_task.delete_annotation_index_task
. tasks.annotation.disable_annotation_reply_task.disable_annotation_reply_task
. tasks.annotation.enable_annotation_reply_task.enable_annotation_reply_task
. tasks.annotation.update_annotation_to_index_task.update_annotation_to_index_task
. tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task
. tasks.clean_dataset_task.clean_dataset_task
. tasks.clean_document_task.clean_document_task
. tasks.clean_notion_document_task.clean_notion_document_task
. tasks.deal_dataset_vector_index_task.deal_dataset_vector_index_task
. tasks.delete_segment_from_index_task.delete_segment_from_index_task
. tasks.disable_segment_from_index_task.disable_segment_from_index_task
. tasks.document_indexing_sync_task.document_indexing_sync_task
. tasks.document_indexing_task.document_indexing_task
. tasks.document_indexing_update_task.document_indexing_update_task
. tasks.duplicate_document_indexing_task.duplicate_document_indexing_task
. tasks.enable_segment_to_index_task.enable_segment_to_index_task
. tasks.mail_email_code_login.send_email_code_login_mail_task
. tasks.mail_invite_member_task.send_invite_member_mail_task
. tasks.mail_reset_password_task.send_reset_password_mail_task
. tasks.ops_trace_task.process_trace_tasks
. tasks.recover_document_indexing_task.recover_document_indexing_task
. tasks.remove_app_and_related_data_task.remove_app_and_related_data_task
. tasks.remove_document_from_index_task.remove_document_from_index_task
. tasks.retry_document_indexing_task.retry_document_indexing_task
. tasks.sync_website_document_indexing_task.sync_website_document_indexing_task

[2024-11-05 15:22:49,659: INFO/MainProcess] Connected to redis://default:**@dify-unpebwkg-redis-redis.ns-mtsj6ogy.svc:6379/1
[2024-11-05 15:22:49,664: INFO/MainProcess] mingle: searching for neighbors
[2024-11-05 15:22:50,688: INFO/MainProcess] mingle: sync with 2 nodes
[2024-11-05 15:22:50,688: INFO/MainProcess] mingle: sync complete
[2024-11-05 15:22:50,706: INFO/MainProcess] pidbox: Connected to redis://default:**@dify-unpebwkg-redis-redis.ns-mtsj6ogy.svc:6379/1.
[2024-11-05 15:22:50,717: INFO/MainProcess] celery@dify-unpebwkg-worker-0 ready.

worker: Warm shutdown (MainProcess)
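"worker: Warm shutdown (MainProcess)" means the worker started cleanly and then received SIGTERM. The .svc hostnames in the banner suggest a Kubernetes deployment, where a failing liveness/readiness probe or an eviction is the usual sender; these commands (pod name inferred from the banner) can show what terminated it:

kubectl describe pod dify-unpebwkg-worker-0
kubectl get events --sort-by=.lastTimestamp | grep worker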
crazywoola commented 20 hours ago

This is weird. Do you have any other information that could help us resolve this issue, e.g. system info and CPU arch?

qkzsky commented 18 hours ago

OK

docker image

langgenius/dify-api:0.10.2

CPU

Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz

Environment variable

MODE=worker
LOG_LEVEL=INFO
SECRET_KEY=sk-xxxxxx
CONSOLE_WEB_URL=https://xxx.com
INIT_PASSWORD=xxxxxxxxx
CONSOLE_API_URL=https://xxx.com
SERVICE_API_URL=https://xxx.com
APP_WEB_URL=https://xxx.com
FILES_URL=
MIGRATION_ENABLED=true
DB_DATABASE=dify
REDIS_USE_SSL=false
REDIS_DB=0
CELERY_BROKER_URL=redis://$(REDIS_USERNAME):$(REDIS_PASSWORD)@$(REDIS_HOST).ns-mtsj6ogy.svc:$(REDIS_PORT)/1
WEB_API_CORS_ALLOW_ORIGINS=
CONSOLE_CORS_ALLOW_ORIGINS=
STORAGE_TYPE=*
STORAGE_LOCAL_PATH=/app/api/storage
VECTOR_STORE=weaviate
WEAVIATE_ENDPOINT=http://$(WEAVIATE_HOST):$(WEAVIATE_PORT)
WEAVIATE_API_KEY=
CODE_EXECUTION_ENDPOINT=http://dify-unpebwkg-sandbox.ns-mtsj6ogy.svc:8194
CODE_EXECUTION_API_KEY=dify-sandbox
CODE_MAX_NUMBER=9223372036854775807
CODE_MIN_NUMBER=-9223372036854775808
CODE_MAX_STRING_LENGTH=800000
TEMPLATE_TRANSFORM_MAX_LENGTH=800000
CODE_MAX_STRING_ARRAY_LENGTH=30
CODE_MAX_OBJECT_ARRAY_LENGTH=30
CODE_MAX_NUMBER_ARRAY_LENGTH=1000
TZ=Asia/Shanghai

In addition, I observe that Redis does not have any key containing the word app_deletion.

[screenshot]

Let me know if any additional information is needed to track this down.
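One caveat on the Redis observation: with the Redis broker, Celery creates the app_deletion list only when a message is actually published to it, so its absence alone is not conclusive. To check directly (broker DB is /1 per the CELERY_BROKER_URL above; container name and credentials are placeholders):

docker exec -it <redis-container> redis-cli -n 1 keys '*app_deletion*'
docker exec -it <redis-container> redis-cli -n 1 llen app_deletion

A nonzero LLEN with no worker running would confirm that the api publishes the task but nothing consumes it.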