Full-Text Search problem

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

https://dify.ai


Full-Text Search problem #6508

Closed: gz-simon closed this issue 1 month ago

gz-simon commented 1 month ago

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.14

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I deployed Dify on Docker, embedded a document, and then ran the recall test, full-text search returned empty results. The same operation works normally in the cloud version.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 1 month ago

The Full-Text Search feature in Dify version 0.6.14 may be returning empty results when self-hosted using Docker due to a potential issue with the Weaviate container. The deployment logs show that the Weaviate container is running, but there might be a configuration or connectivity issue specific to the self-hosted environment.

To troubleshoot, ensure that the Weaviate container is properly configured and accessible. You can check the logs of the Weaviate container for any errors:

docker logs docker-weaviate-1

Additionally, verify that the Weaviate service is correctly integrated with the Dify application and that it has access to the necessary data for indexing and searching ^[1].
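The log check suggested above can be sketched in a few lines of Python. This is a minimal sketch, assuming the container writes one JSON object per line with a `level` field, as the Weaviate output posted in this thread does. The function name `summarize_log_levels` is hypothetical and not part of Dify or Weaviate.

```python
import json
from collections import Counter


def summarize_log_levels(lines):
    """Count log levels and collect every entry that is not info/debug.

    Assumes one JSON object per line with a "level" field (an assumption
    based on the log output pasted in this thread).
    """
    levels = Counter()
    problems = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Not JSON (for example a shell prompt line): skip it.
            continue
        if not isinstance(entry, dict):
            continue
        level = entry.get("level", "unknown")
        levels[level] += 1
        if level not in ("info", "debug"):
            problems.append(entry)
    return levels, problems
```

One way to use it is to save the output of `docker logs docker-weaviate-1` to a file and pass the open file object to the function. An empty `problems` list only means that the container itself reported no warnings or errors; it does not prove that search is configured correctly.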

To continue talking to Dosu, mention @dosu.

gz-simon commented 1 month ago

@dosu

PS E:\dify2\dify\docker> docker logs docker-weaviate-1
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-07-19T10:12:41Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-07-19T10:12:41Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-07-19T10:12:42Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-07-19T10:12:42Z"}
{"action":"hnsw_vector_cache_prefill","count":25000,"index_id":"vector_index_64401738_1db4_4ee0_8aac_9f156568f022_node_QgLe6uhrio1J","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-07-19T10:18:02Z","took":960441}
{"action":"restapi_management","level":"info","msg":"Shutting down... ","time":"2024-07-22T01:33:46Z"}
{"action":"restapi_management","level":"info","msg":"Stopped serving weaviate at http://[::]:8080","time":"2024-07-22T01:33:46Z"}
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-07-22T01:36:47Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-07-22T01:36:47Z"}
{"action":"hnsw_vector_cache_prefill","count":50000,"index_id":"vector_index_64401738_1db4_4ee0_8aac_9f156568f022_node_QgLe6uhrio1J","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-07-22T01:36:49Z","took":2111166}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-07-22T01:36:49Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-07-22T01:36:49Z"}
PS E:\dify2\dify\docker>

dosubot[bot] commented 1 month ago

The issue with the Full-Text Search feature in Dify version 0.6.14 when self-hosted using Docker is likely due to the default vectorizer module being set to "none" in the Weaviate container. This means that new schema classes without an explicit vectorizer setting will not use any vectorizer, which is essential for full-text search functionality.

To resolve this issue, you need to configure Weaviate to use an appropriate vectorizer module. Here is how you can do it:

Edit the Weaviate Configuration: Update the Weaviate configuration to use a vectorizer module. You can do this by modifying the docker-compose.yml file or the Weaviate configuration file.
Set the Vectorizer Module: Ensure that the vectorizer module is set to a suitable value, such as text2vec-contextionary.

Here is an example of how you might update the docker-compose.yml file to include the vectorizer module:

version: '3.7'
services:
  weaviate:
    image: semitechnologies/weaviate:1.18.4
    environment:
      - QUERY_DEFAULTS_LIMIT=20
      - AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true
      - PERSISTENCE_DATA_PATH=/var/lib/weaviate
      - DEFAULT_VECTORIZER_MODULE=text2vec-contextionary
      - ENABLE_MODULES=text2vec-contextionary
    ports:
      - "8080:8080"
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:

Restart the Docker Containers: After making the changes, restart the Docker containers to apply the new configuration.

docker compose down
docker compose up -d

By setting the DEFAULT_VECTORIZER_MODULE to text2vec-contextionary and enabling the module, Weaviate will use the specified vectorizer for new schema classes, which should resolve the issue with the full-text search returning empty results ^[1].
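Note that the startup log message quoted earlier in the thread says the default vectorizer applies to new schema classes, so classes created before the change would keep whatever vectorizer they were created with. A minimal sketch for inspecting this, assuming Weaviate's `GET /v1/schema` REST endpoint and bearer-token API key authentication: the base URL, the key, and the helper names `fetch_schema` and `class_vectorizers` are placeholders chosen for illustration.

```python
import json
import urllib.request


def fetch_schema(base_url, api_key=None, timeout=5.0):
    """Fetch the schema from a Weaviate instance via GET /v1/schema.

    base_url and api_key are placeholders; use the values from your own
    deployment (for example http://localhost:8080).
    """
    request = urllib.request.Request(base_url.rstrip("/") + "/v1/schema")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def class_vectorizers(schema):
    """Map each class name in a schema document to its vectorizer setting."""
    return {
        cls.get("class"): cls.get("vectorizer")
        for cls in schema.get("classes", [])
    }
```

Calling `class_vectorizers(fetch_schema("http://localhost:8080", api_key="..."))` would list the vectorizer recorded for every existing class, which shows whether a changed default has had any effect on data that was already indexed.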


JohnJyong commented 1 month ago

Please confirm that you are using Qdrant as your vector DB, with the image tag langgenius/qdrant. @gz-simon

gz-simon commented 1 month ago

@dosu

weaviate:
  image: semitechnologies/weaviate:1.25.8
  ports:
    - "8080:8080"
    - "6060:6060"
    - "2112:2112"
    - "7101:7101"
    - "50051:50051"
  profiles:
    - ''
    - weaviate
  restart: always
  volumes:
    # Mount the Weaviate data directory to the container.
    - ./volumes/weaviate:/var/lib/weaviate
  environment:
    # The Weaviate configurations
    # You can refer to the Weaviate documentation for more information.
    PERSISTENCE_DATA_PATH: ${WEAVIATE_PERSISTENCE_DATA_PATH:-/var/lib/weaviate}
    QUERY_DEFAULTS_LIMIT: ${WEAVIATE_QUERY_DEFAULTS_LIMIT:-25}
    CONTEXTIONARY_URL: contextionary:9999
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
    ENABLE_MODULES: 'text2vec-contextionary'
    DEFAULT_VECTORIZER_MODULE: 'text2vec-contextionary'
    CLUSTER_HOSTNAME: ${WEAVIATE_CLUSTER_HOSTNAME:-node1}
    CLUSTER_GOSSIP_BIND_PORT: '7100'
    CLUSTER_DATA_BIND_PORT: '7101'
    AUTHENTICATION_APIKEY_ENABLED: ${WEAVIATE_AUTHENTICATION_APIKEY_ENABLED:-true}
    AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_AUTHENTICATION_APIKEY_ALLOWED_KEYS:-WVF5YThaHlkYwhGUSmCRgsX3tD5ngdN8pkih}
    AUTHENTICATION_APIKEY_USERS: ${WEAVIATE_AUTHENTICATION_APIKEY_USERS:-hello@dify.ai}
    AUTHORIZATION_ADMINLIST_ENABLED: ${WEAVIATE_AUTHORIZATION_ADMINLIST_ENABLED:-true}
    AUTHORIZATION_ADMINLIST_USERS: ${WEAVIATE_AUTHORIZATION_ADMINLIST_USERS:-hello@dify.ai}
contextionary:
  image: semitechnologies/contextionary:en0.16.0-v1.2.1
  ports:
    - "9999:9999"
  environment:
    OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
    EXTENSIONS_STORAGE_MODE: weaviate
    EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080

I still have the problem.

dosubot[bot] commented 1 month ago

The issue with the Full-Text Search feature in Dify version 0.6.14 when self-hosted using Docker is likely due to the DEFAULT_VECTORIZER_MODULE being set to "none" in the Weaviate container. However, based on your provided configuration, it seems you have already set the DEFAULT_VECTORIZER_MODULE to text2vec-contextionary.

To further troubleshoot, please ensure the following:

Check Weaviate Logs: Verify that Weaviate is correctly using the text2vec-contextionary module by checking the logs for any errors or warnings related to the vectorizer module.

docker logs docker-weaviate-1

Schema Configuration: Ensure that the schema classes in Weaviate are explicitly set to use the text2vec-contextionary vectorizer. You can do this by checking the schema configuration in Weaviate.
Data Indexing: Confirm that the data has been correctly indexed using the text2vec-contextionary vectorizer. You can re-index the data if necessary.
Network Configuration: Ensure that the contextionary service is accessible from the Weaviate container. You can test this by running a network check from within the Weaviate container.

docker exec -it docker-weaviate-1 curl http://contextionary:9999

If the issue persists, please provide the logs from the Weaviate container after restarting it with the updated configuration. This will help identify any specific errors or misconfigurations.
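The network check above relies on `curl` being available inside the Weaviate container and on the contextionary answering plain HTTP, neither of which is confirmed in this thread. A plain TCP connection test is a smaller assumption. The sketch below uses the service names `contextionary` and `weaviate` from the compose file posted earlier; the helper name `can_connect` is hypothetical, and the script would have to run from a container on the same Docker network that has Python available (for instance Dify's Python-based API container).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that the port is reachable; it says nothing about
    whether the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Host names and ports taken from the compose file in this thread.
    for host, port in (("contextionary", 9999), ("weaviate", 8080)):
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

A `NOT reachable` result for `contextionary:9999` would point to a Docker networking or service startup problem rather than to the search configuration.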


JohnJyong commented 1 month ago

Weaviate doesn't support full-text search; please use Qdrant instead. Thanks, @gz-simon
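Before switching, it may help to confirm which vector store the deployment is actually configured for. The sketch below assumes that the Docker deployment selects the store through a `VECTOR_STORE` variable in `docker/.env`, which the `profiles:` entries in the compose file posted above suggest but which this thread does not state. The helper name `read_env_value` is hypothetical.

```python
def read_env_value(env_text, key, default=None):
    """Return the last value assigned to `key` in dotenv-style text.

    Handles only simple KEY=value lines; comments and blank lines are
    skipped and surrounding quotes are stripped.
    """
    value = default
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, val = line.partition("=")
        if name.strip() == key:
            value = val.strip().strip("'\"")
    return value
```

For example, `read_env_value(open(".env").read(), "VECTOR_STORE", "weaviate")` run from the `docker` directory would return the configured store under that assumption. If the store is changed to Qdrant, documents embedded earlier would presumably need to be re-indexed, since the existing data lives in Weaviate.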

leoterry-ulrica commented 3 weeks ago

Weaviate doesn't support full-text search; please use Qdrant instead. Thanks, @gz-simon

Qdrant does not support full-text search for numerical data.
