langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Knowledge base problem: Uploaded files are queued #8525

Closed create-my closed 1 day ago

create-my commented 1 day ago


Dify version

0.8.0

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

DES: I ran into this problem after upgrading from version 0.6.8 to 0.8.3. My upgrade method may have been wrong: I downloaded the zip and configured the environment by following the setup steps. Later, I removed the Docker container, recreated the Python environment, and reconfigured everything. The problem remains on v0.8.0. I suspect a Docker problem, so I am planning to uninstall Docker Desktop and reinstall it.

FIG: image

DOCKER LOGS:

    2024-09-18 08:24:11 sandbox-1 | [GIN] 2024/09/18 - 08:24:11 | 200 | 15.622µs | 127.0.0.1 | GET "/health"
    2024-09-18 08:24:41 sandbox-1 | [GIN] 2024/09/18 - 08:24:41 | 200 | 17.314µs | 127.0.0.1 | GET "/health"
    2024-09-18 08:25:02 db-1 | 2024-09-18 08:25:02.842 UTC [178] LOG: checkpoint starting: time
    2024-09-18 08:25:11 sandbox-1 | [GIN] 2024/09/18 - 08:25:11 | 200 | 17.241µs | 127.0.0.1 | GET "/health"
    2024-09-18 08:25:12 db-1 | 2024-09-18 08:25:12.645 UTC [178] LOG: checkpoint complete: wrote 86 buffers (0.5%); 0 WAL file(s) added, 0 removed, 0 recycled; write=8.605 s, sync=1.084 s, total=9.804 s; sync files=38, longest=0.079 s, average=0.029 s; distance=143 kB, estimate=2417 kB
    2024-09-18 08:25:41 sandbox-1 | [GIN] 2024/09/18 - 08:25:41 | 200 | 16.429µs | 127.0.0.1 | GET "/health"

POETRY LOGS:

    2024-09-18 08:25:40,530.530 INFO [Thread-220 (process_request_thread)] [_internal.py:97] - 127.0.0.1 - - [18/Sep/2024 08:25:40] "OPTIONS /console/api/datasets/e11a918b-1fb7-4692-9d6e-93b44a5fd1d5/documents?page=1&limit=15&keyword=&fetch= HTTP/1.1" 200 -
    2024-09-18 08:25:40,532.532 INFO [Thread-221 (process_request_thread)] [_internal.py:97] - 127.0.0.1 - - [18/Sep/2024 08:25:40] "OPTIONS /console/api/datasets/e11a918b-1fb7-4692-9d6e-93b44a5fd1d5/related-apps HTTP/1.1" 200 -
    2024-09-18 08:25:40,535.535 INFO [Thread-219 (process_request_thread)] [_internal.py:97] - 127.0.0.1 - - [18/Sep/2024 08:25:40] "OPTIONS /console/api/datasets/e11a918b-1fb7-4692-9d6e-93b44a5fd1d5/error-docs HTTP/1.1" 200 -
    2024-09-18 08:25:40,555.555 INFO [Thread-222 (process_request_thread)] [_internal.py:97] - 127.0.0.1 - - [18/Sep/2024 08:25:40] "GET /console/api/datasets/e11a918b-1fb7-4692-9d6e-93b44a5fd1d5 HTTP/1.1" 200 -
    2024-09-18 08:25:40,805.805 INFO [Thread-223 (process_request_thread)] [_internal.py:97] - 127.0.0.1 - - [18/Sep/2024 08:25:40] "GET /console/api/datasets/e11a918b-1fb7-4692-9d6e-93b44a5fd1d5/documents?page=1&limit=15&keyword=&fetch= HTTP/1.1" 200 -
    2024-09-18 08:25:40,863.863 INFO [Thread-224 (process_request_thread)] [_internal.py:97] - 127.0.0.1 - - [18/Sep/2024 08:25:40] "GET /console/api/datasets/e11a918b-1fb7-4692-9d6e-93b44a5fd1d5/related-apps HTTP/1.1" 200 -
    2024-09-18 08:25:40,869.869 INFO [Thread-225 (process_request_thread)] [_internal.py:97] - 127.0.0.1 - - [18/Sep/2024 08:25:40] "GET /console/api/datasets/e11a918b-1fb7-4692-9d6e-93b44a5fd1d5/error-docs HTTP/1.1" 200 -

✔️ Expected Behavior

Uploaded files are processed and converted.

❌ Actual Behavior

The file status is wrong: uploaded files stay queued.

dosubot[bot] commented 1 day ago

I found a similar issue where uploaded files were queued and not processed after upgrading from version 0.6.8 to 0.8.3 in a self-hosted Dify environment. The problem is likely due to changes in the environment variable configuration.

To resolve this issue, you need to sync the environment variable configuration by updating your local .env file to match any new variables or changes in the .env.example file. Here are the steps to follow:

  1. Navigate to the docker directory of the Dify source code:

    cd dify/docker

  2. Bring down the current Docker containers:

    docker compose down

  3. Pull the latest changes from the repository:

    git pull origin main

  4. Pull the latest Docker images:

    docker compose pull

  5. Start the Docker containers:

    docker compose up -d

  6. Sync the environment variable configuration:

    • Compare your local .env file with the updated .env.example file.
    • Add any new variables from .env.example to your .env file.
    • Update any values in your .env file that have changed in .env.example.
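
The comparison in step 6 can be scripted. The following is a minimal sketch, assuming both files use plain KEY=VALUE lines with # comments; the helper names parse_env_keys and missing_keys are illustrative and not part of Dify. It only reports keys that are absent from the local file and does not detect changed default values.

```python
from pathlib import Path


def parse_env_keys(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(example_text: str, local_text: str) -> list[str]:
    """Return keys defined in the example file but absent locally, sorted."""
    example = parse_env_keys(example_text)
    local = parse_env_keys(local_text)
    return sorted(set(example) - set(local))


if __name__ == "__main__":
    # Assumes the script is run from the directory holding both files.
    example_path = Path(".env.example")
    local_path = Path(".env")
    if example_path.exists() and local_path.exists():
        example_text = example_path.read_text(encoding="utf-8")
        local_text = local_path.read_text(encoding="utf-8")
        for key in missing_keys(example_text, local_text):
            print(f"missing in .env: {key}")
    else:
        print("run this from the directory that holds .env and .env.example")
```

Each reported key can then be copied from .env.example into .env and adjusted by hand.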

Additionally, ensure that the encryption keys are correctly set up, as missing or incorrect keys can cause issues with file processing. If you encounter a "File not found" error in the logs, it may be due to missing encryption keys. You can reset the encryption public and private keys with the following command:

By ensuring that your environment variables are correctly configured and encryption keys are properly set, you should resolve the issue of uploaded files being queued and not processed [1][2][3].


create-my commented 1 day ago

I tried running the Celery worker with this debug command:

    poetry run python -m celery -A app.celery worker -P gevent -c 1 --loglevel INFO -Q dataset,generation,mail,ops_trace,app_deletion

LOGS:

    [2024-09-18 08:47:50,171: INFO/MainProcess] Connected to redis://:@localhost:6379/1
    [2024-09-18 08:47:50,178: INFO/MainProcess] mingle: searching for neighbors
    [2024-09-18 08:47:51,217: INFO/MainProcess] mingle: all alone
    [2024-09-18 08:47:51,237: INFO/MainProcess] pidbox: Connected to redis://:@localhost:6379/1.
    [2024-09-18 08:47:51,241: INFO/MainProcess] celery@DESKTOP-YUAN ready.
    [2024-09-18 08:47:51,243: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[38dca332-e65d-43eb-9fc1-9664ce4cd355] received
    [2024-09-18 08:47:51,431: ERROR/MainProcess] Task tasks.document_indexing_task.document_indexing_task[38dca332-e65d-43eb-9fc1-9664ce4cd355] raised unexpected: AttributeError("'NoneType' object has no attribute 'tenant_id'")
    Traceback (most recent call last):
      File "E:\anaconda\anzhuang\envs\dify80\lib\site-packages\celery\app\trace.py", line 477, in trace_task
        R = retval = fun(*args, **kwargs)
      File "D:\06大语言模型\dify80\dify-0.8.0\api\extensions\ext_celery.py", line 11, in __call__
        return self.run(*args, **kwargs)
      File "D:\06大语言模型\dify80\dify-0.8.0\api\tasks\document_indexing_task.py", line 30, in document_indexing_task
        features = FeatureService.get_features(dataset.tenant_id)
    AttributeError: 'NoneType' object has no attribute 'tenant_id'
    [2024-09-18 08:47:51,433: INFO/MainProcess] Task tasks.remove_app_and_related_data_task.remove_app_and_related_data_task[c2d7f074-174e-4682-b6be-85d582334229] received
    [2024-09-18 08:47:51,434: INFO/MainProcess] Start deleting app and related data: 83e33f11-8240-438e-b3b8-53dfff09223f:cc3dfd1d-6a82-4b0f-9f53-35d198664f01
    [2024-09-18 08:47:51,479: INFO/MainProcess] Deleted app model config 7151b20c-4dbf-45fe-b182-eb999bbdf01b
    [2024-09-18 08:47:51,499: INFO/MainProcess] Deleted site b3cbfacb-b0f1-4210-8553-b8e20ed5df38
    [2024-09-18 08:47:51,580: INFO/MainProcess] Deleted installed app 952e1e10-5ba9-47b8-ba90-ed1f7a4c1e4d
    [2024-09-18 08:47:51,681: INFO/MainProcess] Deleted conversation variables for app cc3dfd1d-6a82-4b0f-9f53-35d198664f01
    [2024-09-18 08:47:51,681: INFO/MainProcess] App and related data deleted: cc3dfd1d-6a82-4b0f-9f53-35d198664f01 latency: 0.2474448000002667
    [2024-09-18 08:47:51,698: INFO/MainProcess] Task tasks.remove_app_and_related_data_task.remove_app_and_related_data_task[c2d7f074-174e-4682-b6be-85d582334229] succeeded in 0.2660000000000764s: None
    [2024-09-18 08:47:51,700: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[2a96a0f6-7c4f-4896-a816-7917b263f402] received
    [2024-09-18 08:47:51,720: ERROR/MainProcess] Task tasks.document_indexing_task.document_indexing_task[2a96a0f6-7c4f-4896-a816-7917b263f402] raised unexpected: AttributeError("'NoneType' object has no attribute 'tenant_id'")
    Traceback (most recent call last):
      File "E:\anaconda\anzhuang\envs\dify80\lib\site-packages\celery\app\trace.py", line 477, in trace_task
        R = retval = fun(*args, **kwargs)
      File "D:\06大语言模型\dify80\dify-0.8.0\api\extensions\ext_celery.py", line 11, in __call__
        return self.run(*args, **kwargs)
      File "D:\06大语言模型\dify80\dify-0.8.0\api\tasks\document_indexing_task.py", line 30, in document_indexing_task
        features = FeatureService.get_features(dataset.tenant_id)
    AttributeError: 'NoneType' object has no attribute 'tenant_id'
    [2024-09-18 08:47:51,721: INFO/MainProcess] Task tasks.clean_dataset_task.clean_dataset_task[17a001ca-6cec-4e26-a3d2-bbda2fb5d1ed] received
    [2024-09-18 08:47:51,722: INFO/MainProcess] Start clean dataset when dataset deleted: e11a918b-1fb7-4692-9d6e-93b44a5fd1d5
    [2024-09-18 08:47:51,731: INFO/MainProcess] Cleaning documents for dataset: e11a918b-1fb7-4692-9d6e-93b44a5fd1d5
    [2024-09-18 08:47:51,734: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[7cd8124c-59c5-4d5f-a116-3a031ef2a95d] received
    [2024-09-18 08:47:51,779: INFO/MainProcess] Cleaned dataset when dataset deleted: e11a918b-1fb7-4692-9d6e-93b44a5fd1d5 latency: 0.05673220000016954
    [2024-09-18 08:47:51,796: INFO/MainProcess] Task tasks.clean_dataset_task.clean_dataset_task[17a001ca-6cec-4e26-a3d2-bbda2fb5d1ed] succeeded in 0.07799999999997453s: None
    [2024-09-18 08:47:51,798: INFO/MainProcess] Start process document: 0ace9cc2-6af8-430a-82cb-a24e15fb4d45
    Building prefix dict from the default dictionary ...
    [2024-09-18 08:47:52,633: DEBUG/MainProcess] Building prefix dict from the default dictionary ...
    Dumping model to file cache C:\Users\user\AppData\Local\Temp\jieba.cache
    [2024-09-18 08:47:53,017: DEBUG/MainProcess] Dumping model to file cache C:\Users\user\AppData\Local\Temp\jieba.cache
    Loading model cost 0.427 seconds.
    [2024-09-18 08:47:53,060: DEBUG/MainProcess] Loading model cost 0.427 seconds.
    Prefix dict has been built successfully.
    [2024-09-18 08:47:53,060: DEBUG/MainProcess] Prefix dict has been built successfully.
    [2024-09-18 08:47:53,533: INFO/MainProcess] Processed dataset: 922ebbec-2d04-4ab3-8795-9e6fb09e5ee9 latency: 1.7366667999999663
    [2024-09-18 08:47:53,559: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[7cd8124c-59c5-4d5f-a116-3a031ef2a95d] succeeded in 1.7660000000000764s: None
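
A worker log like this is easier to read once it is reduced to one outcome per task. The sketch below assumes the default Celery message wording seen in the log ("received", "succeeded", "raised unexpected") and that each log entry sits on its own line; the function name task_outcomes is illustrative.

```python
import re

# Matches e.g. "Task tasks.foo.bar[<uuid>] received" in a Celery worker log.
TASK_EVENT = re.compile(
    r"Task (?P<name>[\w.]+)\[(?P<id>[0-9a-f-]+)\] "
    r"(?P<event>received|succeeded|raised unexpected)"
)


def task_outcomes(log_text: str) -> dict[str, tuple[str, str]]:
    """Map each task id to (task name, last event logged for that id)."""
    outcomes: dict[str, tuple[str, str]] = {}
    for match in TASK_EVENT.finditer(log_text):
        # Later events overwrite earlier ones, so "received" is replaced
        # by the final "succeeded" or "raised unexpected".
        outcomes[match["id"]] = (match["name"], match["event"])
    return outcomes
```

Run against the log above, this should show two document_indexing_task runs ending in "raised unexpected" and the remaining three tasks ending in "succeeded".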

create-my commented 1 day ago

It looks like the background worker cleaned up the document processing task that was just queued.

FIG: image
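
One plausible reading of the traceback is that the queued indexing task looked up a dataset that had already been deleted, got None back, and then failed on dataset.tenant_id. The sketch below illustrates a guard for that case. It is not Dify's code: Dataset, DATASETS, find_dataset, and index_documents are hypothetical stand-ins for the real model and database query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    id: str
    tenant_id: str


# Hypothetical in-memory stand-in for the datasets table.
DATASETS: dict[str, Dataset] = {}


def find_dataset(dataset_id: str) -> Optional[Dataset]:
    """Return the dataset, or None if it no longer exists."""
    return DATASETS.get(dataset_id)


def index_documents(dataset_id: str) -> str:
    dataset = find_dataset(dataset_id)
    if dataset is None:
        # The dataset was deleted after the task was queued: skip the task
        # instead of raising AttributeError on dataset.tenant_id.
        return "skipped: dataset not found"
    return f"indexing for tenant {dataset.tenant_id}"
```

With a guard like this, a stale task for a deleted dataset would be logged and skipped rather than reported as an unexpected error.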

create-my commented 1 day ago

I restarted and found that the document processing had completed. The command I ran:

    poetry run python -m flask run --host 0.0.0.0 --port=5001 --debug

FIG: image