langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

bge-reranker-v2-minicpm-layerwise from Xinference mount error #4373

Closed: lk1983823 closed this issue 1 month ago

lk1983823 commented 3 months ago

Self Checks

Dify version

0.6.6

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I want to mount bge-reranker-v2-minicpm-layerwise from Xinference into Dify as follows:

[screenshot: rerank_error]

✔️ Expected Behavior

No response

❌ Actual Behavior

It failed; the Docker log shows:

Exception in thread Thread-1 (embedding_search):
Traceback (most recent call last):
  File "/app/api/core/model_runtime/model_providers/__base/rerank_model.py", line 35, in invoke
    return self._invoke(model, credentials, query, docs, score_threshold, top_n, user)
  File "/app/api/core/model_runtime/model_providers/xinference/rerank/rerank.py", line 51, in _invoke
    response = handle.rerank(
  File "/usr/local/lib/python3.10/site-packages/xinference_client/client/restful/restful_client.py", line 161, in rerank
    raise RuntimeError(
RuntimeError: Failed to rerank documents, detail: [address=0.0.0.0:56051, pid=1240943] Model not found in the model list, uid: bge-reranker-v2-minicpm-layerwise

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/app/api/core/rag/datasource/retrieval_service.py", line 134, in embedding_search
    all_documents.extend(data_post_processor.invoke(
  File "/app/api/core/rag/data_post_processor/data_post_processor.py", line 22, in invoke
    documents = self.rerank_runner.run(query, documents, score_threshold, top_n, user)
  File "/app/api/core/rerank/rerank.py", line 33, in run
    rerank_result = self.rerank_model_instance.invoke_rerank(
  File "/app/api/core/model_manager.py", line 120, in invoke_rerank
    return self.model_type_instance.invoke(
  File "/app/api/core/model_runtime/model_providers/__base/rerank_model.py", line 37, in invoke
    raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeError: [xinference] Error: Failed to rerank documents, detail: [address=0.0.0.0:56051, pid=1240943] Model not found in the model list, uid: bge-reranker-v2-minicpm-layerwise
Building prefix dict from the default dictionary ...
2024-05-14 06:55:52.610 DEBUG [Thread-3 (keyword_search)] [__init__.py:113] - Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
2024-05-14 06:55:53.307 DEBUG [Thread-3 (keyword_search)] [__init__.py:146] - Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.784 seconds.
2024-05-14 06:55:53.394 DEBUG [Thread-3 (keyword_search)] [__init__.py:164] - Loading model cost 0.784 seconds.
Prefix dict has been built successfully.
2024-05-14 06:55:53.394 DEBUG [Thread-3 (keyword_search)] [__init__.py:166] - Prefix dict has been built successfully.
/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/model.py:120: SAWarning: This declarative base already contains a class with the same class name and module name as services.app_service.ModifiedApp, and will be replaced in the string-lookup table.
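For context: xinference_client raises this RuntimeError when the model UID sent by Dify does not match any model currently running on the Xinference server. Below is a minimal sketch for checking the running models and launching the rerank model; the import path comes from the traceback above, while the endpoint http://127.0.0.1:9997 is an assumed Xinference default, so adjust it to your deployment.

```python
from xinference_client.client.restful.restful_client import RESTfulClient

# Assumed default Xinference endpoint; replace with your server's address.
client = RESTfulClient("http://127.0.0.1:9997")

# The model UID entered in Dify must match one of the models listed here;
# the "Model not found in the model list" error means it does not.
print(client.list_models())

# Launch the rerank model so it appears in that list. model_type="rerank"
# loads it with Xinference's rerank engine instead of as an LLM.
uid = client.launch_model(
    model_name="bge-reranker-v2-minicpm-layerwise",
    model_type="rerank",
)
print(f"rerank model running with uid: {uid}")
```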
dosubot[bot] commented 3 months ago

The error you're encountering, where the bge-reranker-v2-minicpm-layerwise model is not found, suggests a configuration or compatibility issue. Here's a focused approach to address this:

  1. Check Model Compatibility: Ensure the bge-reranker-v2-minicpm-layerwise model is compatible with the ggml format, as required by LocalAI. Compatibility details can be found in the LocalAI documentation.

  2. Model Configuration: If compatible, download and place the model in the correct directory within your LocalAI setup. Then, update the LocalAI configuration to include this model, specifying its external access name and any required settings in a YAML file.

  3. Restart and Update Settings: After configuring the model in LocalAI, restart the instance. In Dify, navigate to Settings > Model Providers > LocalAI and add the bge-reranker-v2-minicpm-layerwise model with its details.

  4. Test the Model: Test the integration by making a request to the LocalAI endpoint, ensuring the model is recognized and usable for inference; see the sketch after this list.
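As a concrete version of step 4, a request against the rerank route can confirm the model is live. This is only a sketch: it assumes a LocalAI instance on localhost:8080 exposing a Jina-style /v1/rerank endpoint, and none of these values come from this thread, so verify both against your deployment.

```python
import requests

# Assumed LocalAI address and Jina-style rerank route; adjust both to your setup.
resp = requests.post(
    "http://localhost:8080/v1/rerank",
    json={
        "model": "bge-reranker-v2-minicpm-layerwise",
        "query": "What is Dify?",
        "documents": [
            "Dify is an open-source LLM app development platform.",
            "Xinference serves embedding and rerank models.",
        ],
        "top_n": 2,
    },
    timeout=30,
)
resp.raise_for_status()
# A successful response contains per-document relevance scores, which confirms
# the model is registered and usable for inference.
print(resp.json())
```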

If the model is not ggml compatible or requires specific configurations not covered here, further documentation on LocalAI or the model may be necessary.

#### Details

**Sources**

- [en/tutorials/model-configuration/localai.md](https://github.com/langgenius/dify-docs/blob/main/en/tutorials/model-configuration/localai.md)
- [api/core/model_runtime/model_providers/localai/localai.yaml](https://github.com/langgenius/dify/blob/main/api/core/model_runtime/model_providers/localai/localai.yaml)
- [api/tests/integration_tests/model_runtime/localai/test_rerank.py](https://github.com/langgenius/dify/blob/main/api/tests/integration_tests/model_runtime/localai/test_rerank.py)

**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

lk1983823 commented 3 months ago

> Check Model Compatibility: Ensure the bge-reranker-v2-minicpm-layerwise model is compatible with the ggml format, as required by LocalAI. Compatibility details can be found in the LocalAI documentation.

How do I check this compatibility? I downloaded the model from Hugging Face: https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise

dosubot[bot] commented 3 months ago

To check the compatibility of the bge-reranker-v2-minicpm-layerwise model from Hugging Face with the ggml format required by LocalAI, follow these steps:

  1. Model Format and Conversion: Initially, verify whether the model is available in a ggml-compatible format on Hugging Face. If not, you may need to look for conversion tools or scripts provided by the model's authors or the community. Sometimes, model repositories include utility scripts for converting models to different formats; a quick way to inspect the repository's files is sketched after this list.

  2. Integration with LocalAI: After securing a compatible version or converting the model, proceed to integrate it with LocalAI. This involves placing the model in the designated directory as per LocalAI's documentation and configuring it correctly within LocalAI's settings. Ensure you follow the deployment instructions closely, including setting up any necessary environment variables and adjusting the configuration files as needed.

  3. Testing for Compatibility: With the model deployed in LocalAI, test its functionality through LocalAI's REST API. A successful test indicates that the model is compatible and correctly set up. This step is crucial to ensure that the model not only exists within LocalAI but is also operational and can be accessed as expected.
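Following up on step 1, one quick way to see which weight formats the repository actually ships is to list its files with huggingface_hub. This is a sketch; the library is an assumption on my part, not something the thread or Dify requires.

```python
from huggingface_hub import list_repo_files

# List every file in the model repository. The presence of *.ggml/*.gguf files
# would indicate a format that llama.cpp-based backends can load directly;
# safetensors or PyTorch checkpoints generally need conversion first.
for filename in list_repo_files("BAAI/bge-reranker-v2-minicpm-layerwise"):
    print(filename)
```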

If you encounter issues during these steps, such as errors in model conversion or integration, it's advisable to consult the documentation of both the model and LocalAI. Additionally, community forums or the support channels of Hugging Face and LocalAI can be valuable resources for troubleshooting specific problems or getting advice on model conversion and deployment strategies.

Remember, the key to successful integration lies in ensuring the model is in the correct format, properly configured within LocalAI, and tested to confirm its operational status.

#### Details

**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

BradKML commented 3 months ago

Seconding this. Also, @lk1983823, LocalAI has its own Web UX downloader, so could you note down possible ways to work around this? Maybe get FlashRank in on the action? Related threads: https://github.com/langgenius/dify/issues/4377 and https://github.com/langgenius/dify/discussions/3779