xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
5.03k stars 401 forks

BUG: cannot run model 'bge-reranker-v2-minicpm-layerwise' in the latest version of xinference #1541

Open yuanzhiwei opened 4 months ago

yuanzhiwei commented 4 months ago

Describe the bug

xinference-worker-2-1 | 2024-05-24 02:10:34,107 xinference.core.worker 1 ERROR Failed to load model bge-reranker-v2-minicpm-layerwise-1-0
xinference-worker-2-1 | Traceback (most recent call last):
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 659, in launch_builtin_model
xinference-worker-2-1 |     await model_ref.load()
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
xinference-worker-2-1 |     return self._process_result_message(result)
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-worker-2-1 |     raise message.as_instanceof_cause()
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
xinference-worker-2-1 |     result = await self._run_coro(message.message_id, coro)
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference-worker-2-1 |     return await coro
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
xinference-worker-2-1 |     return await super().on_receive(message) # type: ignore
xinference-worker-2-1 |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference-worker-2-1 |     raise ex
xinference-worker-2-1 |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference-worker-2-1 |     async with self._lock:
xinference-worker-2-1 |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference-worker-2-1 |     with debug_async_timeout('actor_lock_timeout',
xinference-worker-2-1 |   File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
xinference-worker-2-1 |     result = func(*args, **kwargs)
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 239, in load
xinference-worker-2-1 |     self._model.load()
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/model/rerank/core.py", line 157, in load
xinference-worker-2-1 |     self._model = FlagReranker(self._model_path, use_fp16=self._use_fp16)
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/FlagEmbedding/flag_reranker.py", line 400, in __init__
xinference-worker-2-1 |     self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 819, in from_pretrained
xinference-worker-2-1 |     config = AutoConfig.from_pretrained(
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 928, in from_pretrained
xinference-worker-2-1 |     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 631, in get_config_dict
xinference-worker-2-1 |     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 686, in _get_config_dict
xinference-worker-2-1 |     resolved_config_file = cached_file(
xinference-worker-2-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 369, in cached_file
xinference-worker-2-1 |     raise EnvironmentError(
xinference-worker-2-1 | OSError: [address=xinference-worker-2:33175, pid=78] /root/.xinference/cache/bge-reranker-v2-minicpm-layerwise does not appear to have a file named config.json. Checkout 'https://huggingface.co//root/.xinference/cache/bge-reranker-v2-minicpm-layerwise/tree/None' for available files.
xinference-supervisor-1 | 2024-05-24 02:10:34,160 xinference.api.restful_api 1 ERROR [address=xinference-worker-2:33175, pid=78] /root/.xinference/cache/bge-reranker-v2-minicpm-layerwise does not appear to have a file named config.json. Checkout 'https://huggingface.co//root/.xinference/cache/bge-reranker-v2-minicpm-layerwise/tree/None' for available files.
xinference-supervisor-1 | Traceback (most recent call last):
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 697, in launch_model
xinference-supervisor-1 |     model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
xinference-supervisor-1 |     return self._process_result_message(result)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-supervisor-1 |     raise message.as_instanceof_cause()
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
xinference-supervisor-1 |     result = await self._run_coro(message.message_id, coro)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference-supervisor-1 |     return await coro
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
xinference-supervisor-1 |     return await super().on_receive(message) # type: ignore
xinference-supervisor-1 |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference-supervisor-1 |     raise ex
xinference-supervisor-1 |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     async with self._lock:
xinference-supervisor-1 |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     with debug_async_timeout('actor_lock_timeout',
xinference-supervisor-1 |   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     result = await result
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 836, in launch_builtin_model
xinference-supervisor-1 |     await _launch_model()
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 800, in _launch_model
xinference-supervisor-1 |     await _launch_one_model(rep_model_uid)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 781, in _launch_one_model
xinference-supervisor-1 |     await worker_ref.launch_builtin_model(
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
xinference-supervisor-1 |     return self._process_result_message(result)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-supervisor-1 |     raise message.as_instanceof_cause()
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
xinference-supervisor-1 |     result = await self._run_coro(message.message_id, coro)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference-supervisor-1 |     return await coro
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
xinference-supervisor-1 |     return await super().on_receive(message) # type: ignore
xinference-supervisor-1 |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference-supervisor-1 |     raise ex
xinference-supervisor-1 |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     async with self._lock:
xinference-supervisor-1 |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     with debug_async_timeout('actor_lock_timeout',
xinference-supervisor-1 |   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     result = await result
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
xinference-supervisor-1 |     ret = await func(*args, **kwargs)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 659, in launch_builtin_model
xinference-supervisor-1 |     await model_ref.load()
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
xinference-supervisor-1 |     return self._process_result_message(result)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-supervisor-1 |     raise message.as_instanceof_cause()
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
xinference-supervisor-1 |     result = await self._run_coro(message.message_id, coro)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference-supervisor-1 |     return await coro
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
xinference-supervisor-1 |     return await super().on_receive(message) # type: ignore
xinference-supervisor-1 |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference-supervisor-1 |     raise ex
xinference-supervisor-1 |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     async with self._lock:
xinference-supervisor-1 |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     with debug_async_timeout('actor_lock_timeout',
xinference-supervisor-1 |   File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
xinference-supervisor-1 |     result = func(*args, **kwargs)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 239, in load
xinference-supervisor-1 |     self._model.load()
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/xinference/model/rerank/core.py", line 157, in load
xinference-supervisor-1 |     self._model = FlagReranker(self._model_path, use_fp16=self._use_fp16)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/FlagEmbedding/flag_reranker.py", line 400, in __init__
xinference-supervisor-1 |     self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 819, in from_pretrained
xinference-supervisor-1 |     config = AutoConfig.from_pretrained(
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 928, in from_pretrained
xinference-supervisor-1 |     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 631, in get_config_dict
xinference-supervisor-1 |     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 686, in _get_config_dict
xinference-supervisor-1 |     resolved_config_file = cached_file(
xinference-supervisor-1 |   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 369, in cached_file
xinference-supervisor-1 |     raise EnvironmentError(
xinference-supervisor-1 | OSError: [address=xinference-worker-2:33175, pid=78] /root/.xinference/cache/bge-reranker-v2-minicpm-layerwise does not appear to have a file named config.json. Checkout 'https://huggingface.co//root/.xinference/cache/bge-reranker-v2-minicpm-layerwise/tree/None' for available files.
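The final `OSError` comes from `transformers`' `cached_file`; its wording (including the `huggingface.co//root/...` hint and `tree/None`) matches what `transformers` raises when a local model directory exists but does not contain the requested file. A quick way to confirm that is to inspect the directory inside the worker container (the one logging as `xinference-worker-2-1`). The sketch below assumes only that the path from the log is correct; `check_model_dir` and `EXPECTED_FILES` are hypothetical names, not part of Xinference or `transformers`:

```python
"""Check whether a local model directory contains the files transformers needs.

Sketch only: check_model_dir and EXPECTED_FILES are hypothetical helpers,
not part of Xinference or transformers.
"""
import os
import sys

# config.json is the file named in the OSError; AutoConfig needs it first.
EXPECTED_FILES = ("config.json",)


def check_model_dir(model_dir, expected=EXPECTED_FILES):
    """Return (exists, entries, missing) for model_dir.

    os.path.isfile follows symlinks, so a dangling symlink to a cache file
    is reported as missing rather than present.
    """
    if not os.path.isdir(model_dir):
        return False, [], list(expected)
    entries = sorted(os.listdir(model_dir))
    missing = [
        name for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
    return True, entries, missing


if __name__ == "__main__":
    # Default to the cache path reported in the traceback.
    path = (sys.argv[1] if len(sys.argv) > 1
            else "/root/.xinference/cache/bge-reranker-v2-minicpm-layerwise")
    exists, entries, missing = check_model_dir(path)
    print(f"directory exists: {exists}")
    print(f"entries: {entries}")
    print(f"missing: {missing}")
```

If `missing` contains `config.json` while `entries` is non-empty, the download from the model source likely produced a different layout than `FlagReranker` expects; an empty `entries` list points to an incomplete download instead.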

To Reproduce

In the Xinference web UI: Launch Model -> Rerank Models -> bge-reranker-v2-minicpm-layerwise -> launch with default parameters -> error

  1. Deployed with Docker Compose
  2. Running on GPU
  3. Model source: ModelScope
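The same launch can presumably be triggered without the web UI through Xinference's Python client, which makes the failure easier to reproduce from a script. This is a hedged sketch, assuming the supervisor's REST endpoint is reachable at `http://127.0.0.1:9997` (Xinference's default port; adjust it for the compose setup) and that the worker was started with `XINFERENCE_MODEL_SRC=modelscope` to match step 3. `launch_kwargs` and `reproduce` are hypothetical helper names:

```python
"""Reproduce the failing launch through the Xinference Python client.

Assumptions: the endpoint URL and the helper names (launch_kwargs,
reproduce) are illustrative; only xinference.client.Client and its
launch_model method come from Xinference itself.
"""
import sys

MODEL_NAME = "bge-reranker-v2-minicpm-layerwise"


def launch_kwargs():
    # Default parameters, as in the web-UI steps: only name and type are set.
    return {"model_name": MODEL_NAME, "model_type": "rerank"}


def reproduce(endpoint="http://127.0.0.1:9997"):
    # Imported lazily so the module can be loaded without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    try:
        model_uid = client.launch_model(**launch_kwargs())
    except Exception as exc:  # the server-side load error is re-raised here
        print(f"launch failed: {exc}")
        return None
    print(f"launched: {model_uid}")
    return model_uid


if __name__ == "__main__":
    # Optional first CLI argument overrides the endpoint.
    reproduce(*sys.argv[1:2])
```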
github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.