xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG: Failed to run model "bge-reranker-v2-minicpm-layerwise" with Xinference v0.11.0 (Docker image) #1515

Closed. majestichou closed this issue 5 months ago.

majestichou commented 5 months ago

Describe the bug

I downloaded the "bge-reranker-v2-minicpm-layerwise" model weights to the server and registered the model under the name "bge-reranker-v2-minicpm-layerwise-self" with Xinference v0.11.0 (Docker image). Then I launched the model, but it crashed. The error information is as follows:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 697, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 836, in launch_builtin_model
    await _launch_model()
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 800, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 781, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 659, in launch_builtin_model
    await model_ref.load()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
    result = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/rerank/core.py", line 134, in load
    self._model = CrossEncoder(
  File "/opt/conda/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 66, in __init__
    self.config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code, revision=revision)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "/opt/conda/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
    raise ValueError(
ValueError: [address=0.0.0.0:40830, pid=186] Loading /root/.xinference/cache/bge-reranker-v2-minicpm-layerwise-self requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
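For context, the same failure can be reproduced with sentence-transformers alone: the model repo ships custom configuration code, and transformers refuses to execute it unless trust_remote_code=True is passed through. A minimal sketch, using the cache path from the traceback above:

```python
from sentence_transformers import CrossEncoder

model_path = "/root/.xinference/cache/bge-reranker-v2-minicpm-layerwise-self"

# Raises the same ValueError as in the traceback: the repo contains a
# custom configuration class that transformers will not execute by default.
# model = CrossEncoder(model_path)

# CrossEncoder forwards trust_remote_code to AutoConfig.from_pretrained
# (see the traceback), which suppresses the error.
model = CrossEncoder(model_path, trust_remote_code=True)
```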

To Reproduce

  1. Pull the Xinference v0.11.0 Docker image.
  2. Download the "bge-reranker-v2-minicpm-layerwise" model weights to the server and register the model under the name "bge-reranker-v2-minicpm-layerwise-self". Then launch the model (a scripted equivalent is sketched below).
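For reference, the register-and-launch flow from step 2 can also be scripted with the Xinference Python client. This is only a sketch, assuming a local endpoint at http://localhost:9997 and a hypothetical spec file named custom-rerank.json:

```python
from xinference.client import Client

client = Client("http://localhost:9997")

# Register the custom rerank model from a local spec file
# ("custom-rerank.json" is a hypothetical name).
with open("custom-rerank.json") as fd:
    client.register_model(model_type="rerank", model=fd.read(), persist=True)

# Launching the registered model is the step that crashes.
client.launch_model(
    model_name="bge-reranker-v2-minicpm-layerwise-self",
    model_type="rerank",
)
```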

Expected behavior

No crash. According to the Xinference docs (https://inference.readthedocs.io/en/latest/models/builtin/rerank/bge-reranker-v2-minicpm-layerwise.html), the bge-reranker-v2-minicpm-layerwise model is supported.

qinxuye commented 5 months ago

@codingl2k1 can you help?

codingl2k1 commented 5 months ago

@codingl2k1 can you help?

I am looking into this issue.

codingl2k1 commented 5 months ago

This model works well on my Mac, and I can run the rerank benchmark on it, though it's very slow.

Name: FlagEmbedding Version: 1.2.8

Name: transformers Version: 4.39.1

majestichou commented 5 months ago

This model works well on my Mac, and I can run the rerank benchmark on it, though it's very slow.

Name: FlagEmbedding Version: 1.2.8

Name: transformers Version: 4.39.1

Which version of the Xinference Docker image did you use?

majestichou commented 5 months ago

This model works well on my Mac, and I can run the rerank benchmark on it, though it's very slow.

Name: FlagEmbedding Version: 1.2.8

Name: transformers Version: 4.39.1

Can you try to repeat my steps? I downloaded the "bge-reranker-v2-minicpm-layerwise" model weights to the server and registered the model under the name "bge-reranker-v2-minicpm-layerwise-self" with Xinference v0.11.1 (Docker image). Then I launched the model, but it crashed with the same error, ending in: "Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error."

codingl2k1 commented 5 months ago

This model works well on my Mac, and I can run the rerank benchmark on it, though it's very slow. Name: FlagEmbedding Version: 1.2.8 Name: transformers Version: 4.39.1

Can you try to repeat my steps? I downloaded the "bge-reranker-v2-minicpm-layerwise" model weights to the server and registered the model under the name "bge-reranker-v2-minicpm-layerwise-self" with Xinference v0.11.1 (Docker image). Then I launched the model, but it crashed with the same error, ending in: "Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error."

I will try your steps.

codingl2k1 commented 5 months ago

How did you register the model bge-reranker-v2-minicpm-layerwise-self? The model spec's type should be "LLM-based layerwise". From your traceback, the call stack went through the type == "normal" branch.

[screenshot]
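A custom rerank spec that selects the layerwise code path might look like the sketch below; the field values are illustrative, and registration is shown via the Python client rather than the web UI:

```python
import json

from xinference.client import Client

# Sketch of a custom rerank model spec; field values are illustrative.
# The "type" field is the key point: "LLM-based layerwise" routes loading
# away from the CrossEncoder branch used for "normal" rerank models.
spec = {
    "model_name": "bge-reranker-v2-minicpm-layerwise-self",
    "type": "LLM-based layerwise",
    "language": ["en", "zh"],
    "model_uri": "/path/in/container/bge-reranker-v2-minicpm-layerwise",
}

client = Client("http://localhost:9997")
client.register_model(model_type="rerank", model=json.dumps(spec), persist=True)
```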

majestichou commented 5 months ago

How did you register the model bge-reranker-v2-minicpm-layerwise-self? The model spec's type should be "LLM-based layerwise". From your traceback, the call stack went through the type == "normal" branch.

[screenshot]

I chose the Register Model tab, selected "RERANK MODEL", filled in the parameters (the model name and the model path inside the Docker container), selected English and Chinese as the languages, and finally clicked Register.
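Assuming the model is re-registered with the layerwise type and launches successfully, a quick sanity check might look like this; note that using the model name as the UID is an assumption, since the actual UID is returned by launch_model:

```python
from xinference.client import Client

client = Client("http://localhost:9997")

# get_model expects the model UID returned by launch_model; using the
# model name here is an assumption that may not match your deployment.
model = client.get_model("bge-reranker-v2-minicpm-layerwise-self")
print(model.rerank(
    documents=["Paris is the capital of France.", "Berlin is in Germany."],
    query="What is the capital of France?",
))
```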