langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

The created knowledge base is not visible #6275

Open leeyis opened 1 month ago

leeyis commented 1 month ago

Dify version

0.6.13

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

✔️ Expected Behavior

A knowledge base with over ten million words should remain visible and accessible as normal.

❌ Actual Behavior

A knowledge base containing an excessively large number of characters is not visible.

leeyis commented 1 month ago

1. The knowledge base has been successfully created, but it is not visible. (screenshot attached)
2. When creating an application, the invisible knowledge base can still be selected. (screenshot attached)

dosubot[bot] commented 1 month ago

@leeyis Hello! I'm Dosu, a bot that can help you solve issues, answer questions, and become a contributor. While you wait for a human maintainer, I'll do my best to help you resolve this problem.

To resolve the issue of the knowledge base not being visible on the knowledge base page in a self-hosted (Docker) deployment of Dify 0.6.13, follow these steps:

  1. Ensure All Containers Are Running: Verify that all necessary containers are up and running by executing:

    docker compose ps

    Ensure that the api, worker, web, weaviate, db, redis, and nginx services are all listed and in the Up status.

  2. Check Environment Variables: Ensure that the environment variables in your .env file are correctly configured. You can find the example environment variables in dify/docker/.env.example. If you made any changes, restart the services:

    docker compose down
    docker compose up -d
  3. Update Dify: Make sure you are using the latest version of Dify. Navigate to the docker directory of the dify source code and execute the following commands to pull the latest updates and restart the services:

    cd dify/docker
    git pull origin main
    docker compose down
    docker compose pull
    docker compose up -d
  4. Access Dify: Finally, access the Dify installation page to ensure everything is set up correctly:

    http://localhost/install

By following these steps, you should be able to resolve the issue of the knowledge base not being visible on the knowledge base page [1].
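The container check in step 1 can be sketched as a small shell helper that flags any service not reported as running. The status strings (`Up`, `running`) are assumptions based on typical `docker compose ps` output and may differ between Compose versions.

```shell
#!/bin/sh
# Sketch: flag docker compose services that are not running.
# Assumes `docker compose ps` output with the service name in column 1
# and a healthy status containing "Up" or "running" -- adjust for your
# Compose version if the wording differs.
flag_down_services() {
    awk 'NR > 1 && $0 !~ /Up|running/ { print $1 }'
}

# Typical usage (requires a running Compose project):
#   docker compose ps | flag_down_services
```

If the helper prints any of api, worker, web, weaviate, db, redis, or nginx, inspect that container's logs before anything else.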

Additionally, there are known limitations with handling large knowledge bases in Dify version 0.6.13. Specifically, the maximum size for a single document upload is 15MB, and there is a limit of 100 total documents. The maximum number of files for a single batch upload is 20. These limits can be adjusted if you are using a local deployment [2][3].
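For a local deployment, the upload limits mentioned above are typically adjusted through environment variables in `dify/docker/.env`. The variable names below match recent `.env.example` files but should be verified against your version; the values are illustrative, not recommendations.

```shell
# Illustrative fragment for dify/docker/.env -- verify the variable names
# against your version's .env.example before applying.

# Maximum single-document upload size, in MB (default 15).
UPLOAD_FILE_SIZE_LIMIT=50

# Maximum number of files in a single batch upload.
UPLOAD_FILE_BATCH_LIMIT=20
```

After editing `.env`, restart the stack (`docker compose down && docker compose up -d`) for the changes to take effect.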

To continue talking to Dosu, mention @dosu.

crazywoola commented 1 month ago

Can you provide the logs of the api and worker? I can not reproduce this problem.

leeyis commented 1 month ago

> Can you provide the logs of the api and worker? I can not reproduce this problem.

Of course, when entering the KNOWLEDGE page, the api container logs are as follows:

api-1  | 2024-07-16 08:33:44,882.882 INFO [Thread-243 (_generate_worker)] [_client.py:1026] - HTTP Request: POST http://192.168.10.253:9997/v1/chat/completions "HTTP/1.1 200 OK"
api-1  | 2024-07-16 08:35:36,920.920 INFO [Dummy-246] [_client.py:1026] - HTTP Request: GET http://sandbox:8194/v1/sandbox/dependencies?language=python3 "HTTP/1.1 200 OK"
api-1  | 2024-07-16 08:40:23,868.868 INFO [Thread-247 (_generate_worker)] [_client.py:1026] - HTTP Request: POST http://192.168.10.253:9997/v1/chat/completions "HTTP/1.1 200 OK"
api-1  | 2024-07-16 08:40:43,894.894 INFO [Dummy-250] [_client.py:1026] - HTTP Request: POST http://192.168.10.253:9997/v1/chat/completions "HTTP/1.1 200 OK"

the worker container logs are as follows:

worker-1  | Running migrations
worker-1  | None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
worker-1  | Preparing database migration...
worker-1  | Database migration skipped
worker-1  | None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
worker-1  | /app/api/.venv/lib/python3.10/site-packages/celery/platforms.py:829: SecurityWarning: You're running the worker with superuser privileges: this is
worker-1  | absolutely not recommended!
worker-1  |
worker-1  | Please specify a different user using the --uid option.
worker-1  |
worker-1  | User information: uid=0 euid=0 gid=0 egid=0
worker-1  |
worker-1  |   warnings.warn(SecurityWarning(ROOT_DISCOURAGED.format(
worker-1  |
worker-1  |  -------------- celery@6ca16c92b9ea v5.3.6 (emerald-rush)
worker-1  | --- ***** -----
worker-1  | -- ******* ---- Linux-5.10.120-x86_64-with-glibc2.36 2024-07-14 07:54:15
worker-1  | - *** --- * ---
worker-1  | - ** ---------- [config]
worker-1  | - ** ---------- .> app:         app:0x7f814798c970
worker-1  | - ** ---------- .> transport:   redis://:**@redis:6379/1
worker-1  | - ** ---------- .> results:     postgresql://postgres:**@db:5432/dify
worker-1  | - *** --- * --- .> concurrency: 1 (gevent)
worker-1  | -- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
worker-1  | --- ***** -----
worker-1  |  -------------- [queues]
worker-1  |                 .> app_deletion     exchange=app_deletion(direct) key=app_deletion
worker-1  |                 .> dataset          exchange=dataset(direct) key=dataset
worker-1  |                 .> generation       exchange=generation(direct) key=generation
worker-1  |                 .> mail             exchange=mail(direct) key=mail
worker-1  |                 .> ops_trace        exchange=ops_trace(direct) key=ops_trace
worker-1  |
worker-1  | [tasks]
worker-1  |   . schedule.clean_embedding_cache_task.clean_embedding_cache_task
worker-1  |   . schedule.clean_unused_datasets_task.clean_unused_datasets_task
worker-1  |   . tasks.add_document_to_index_task.add_document_to_index_task
worker-1  |   . tasks.annotation.add_annotation_to_index_task.add_annotation_to_index_task
worker-1  |   . tasks.annotation.batch_import_annotations_task.batch_import_annotations_task
worker-1  |   . tasks.annotation.delete_annotation_index_task.delete_annotation_index_task
worker-1  |   . tasks.annotation.disable_annotation_reply_task.disable_annotation_reply_task
worker-1  |   . tasks.annotation.enable_annotation_reply_task.enable_annotation_reply_task
worker-1  |   . tasks.annotation.update_annotation_to_index_task.update_annotation_to_index_task
worker-1  |   . tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task
worker-1  |   . tasks.clean_dataset_task.clean_dataset_task
worker-1  |   . tasks.clean_document_task.clean_document_task
worker-1  |   . tasks.clean_notion_document_task.clean_notion_document_task
worker-1  |   . tasks.deal_dataset_vector_index_task.deal_dataset_vector_index_task
worker-1  |   . tasks.delete_segment_from_index_task.delete_segment_from_index_task
worker-1  |   . tasks.disable_segment_from_index_task.disable_segment_from_index_task
worker-1  |   . tasks.document_indexing_sync_task.document_indexing_sync_task
worker-1  |   . tasks.document_indexing_task.document_indexing_task
worker-1  |   . tasks.document_indexing_update_task.document_indexing_update_task
worker-1  |   . tasks.duplicate_document_indexing_task.duplicate_document_indexing_task
worker-1  |   . tasks.enable_segment_to_index_task.enable_segment_to_index_task
worker-1  |   . tasks.mail_invite_member_task.send_invite_member_mail_task
worker-1  |   . tasks.mail_reset_password_task.send_reset_password_mail_task
worker-1  |   . tasks.ops_trace_task.process_trace_tasks
worker-1  |   . tasks.recover_document_indexing_task.recover_document_indexing_task
worker-1  |   . tasks.remove_app_and_related_data_task.remove_app_and_related_data_task
worker-1  |   . tasks.remove_document_from_index_task.remove_document_from_index_task
worker-1  |   . tasks.retry_document_indexing_task.retry_document_indexing_task
worker-1  |   . tasks.sync_website_document_indexing_task.sync_website_document_indexing_task
worker-1  |
worker-1  | [2024-07-14 07:54:16,102: INFO/MainProcess] Connected to redis://:**@redis:6379/1
worker-1  | [2024-07-14 07:54:16,108: INFO/MainProcess] mingle: searching for neighbors
worker-1  | [2024-07-14 07:54:17,122: INFO/MainProcess] mingle: all alone
worker-1  | [2024-07-14 07:54:17,140: INFO/MainProcess] pidbox: Connected to redis://:**@redis:6379/1.
worker-1  | [2024-07-14 07:54:17,147: INFO/MainProcess] celery@6ca16c92b9ea ready.

It seems there's nothing wrong.
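Since the excerpts above contain only INFO lines, one way to double-check is to filter the full container logs for error markers instead of reading them line by line. A minimal sketch; the pattern list is an assumption about common Python log keywords, so extend it as needed:

```shell
#!/bin/sh
# Sketch: surface only error-like lines from container logs.
# The pattern list is a guess at common Python/Flask error markers.
filter_errors() {
    grep -iE 'error|exception|traceback|critical'
}

# Typical usage (requires a running Compose project):
#   docker compose logs api | filter_errors | tail -n 50
```

An empty result from both the api and worker logs would support the suspicion that the failure is client-side or in the database query, not in the indexing pipeline.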

leeyis commented 1 month ago

Here is a screenshot of the current issue (attached). Additionally, an important piece of information: my vector database is pgvector.

crazywoola commented 2 weeks ago

Have you tried the latest version?