xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG: The embedding model has been used for a while and is no longer available. #1345

Closed JasonFlyBeauty closed 6 months ago

JasonFlyBeauty commented 6 months ago

Describe the bug

The embedding model has been used for a while and is no longer available.

What actually happens: after I have been using the embedding model for a while, it disappears from the web UI and the API stops working, yet its uid still shows up in the terminal. If I launch the model again from the web UI, it no longer gets the uid bge-m3-1-0. To recover, I have to delete the old model from the terminal and then create it again in the UI.


To Reproduce

docker:xprobe/xinference:latest

2024-04-22 03:16:25,220 xinference.core.worker 95 DEBUG    Enter terminate_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>, 'bge-m3-1-0'), kwargs: {}
2024-04-22 03:16:25,221 xinference.core.worker 95 DEBUG    Destroy model actor failed, model uid: bge-m3-1-0, error: [Errno 111] Connection refused
2024-04-22 03:16:25,221 xinference.core.worker 95 DEBUG    Remove sub pool failed, model uid: bge-m3-1-0, error: '0.0.0.0:33245'
2024-04-22 03:16:25,221 xinference.core.worker 95 DEBUG    Leave terminate_model, elapsed time: 0 s
2024-04-22 03:16:25,222 xinference.core.worker 95 WARNING  Recreating model actor bge-m3-1-0 ...
2024-04-22 03:16:25,222 xinference.core.worker 95 DEBUG    Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0', 'model_name': 'bge-m3', 'model_size_in_billions': None, 'model_format': None, 'quantization': None, 'model_type': 'embedding', 'n_gpu': 'auto', 'peft_model_path': None, 'image_lora_load_kwargs': None, 'image_lora_fuse_kwargs': None, 'request_limits': None, 'gpu_idx': None, 'event_model_uid': 'bge-m3', '_': 1, '__': 0}
2024-04-22 03:16:25,223 xinference.core.worker 95 DEBUG    GPU selected: [0] for model bge-m3-1-0
2024-04-22 03:16:28,479 xinference.model.embedding.core 95 DEBUG    Embedding model bge-m3 found in ModelScope.
2024-04-22 03:16:28,480 xinference.model.utils 95 INFO     Use model cache from a different hub.
2024-04-22 03:16:28,481 xinference.core.worker 95 ERROR    Failed to load model bge-m3-1-0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 662, in launch_builtin_model
    model, model_description = await asyncio.to_thread(
  File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/core.py", line 83, in create_model_instance
    return create_embedding_model_instance(
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/embedding/core.py", line 329, in create_embedding_model_instance
    model = EmbeddingModel(model_uid, model_path, **kwargs)
TypeError: EmbeddingModel.__init__() got an unexpected keyword argument 'event_model_uid'
2024-04-22 03:20:29,113 xinference.core.worker 95 ERROR    Report status got error.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 781, in report_status
    status = await asyncio.to_thread(gather_node_info)
  File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
asyncio.exceptions.CancelledError
ChengjieLi28 commented 6 months ago

@JasonFlyBeauty Could you please show me the complete debug logs from the start of xinference? Also, have you ever updated your Docker image? You can check the xinference version inside your Docker container with:

pip show xinference
JasonFlyBeauty commented 6 months ago
/workspace# pip show xinference
Name: xinference
Version: 0.10.1
Summary: Model Serving Made Easy
Home-page: https://github.com/xorbitsai/inference
Author: Qin Xuye
Author-email: qinxuye@xprobe.io
License: Apache License 2.0
Location: /opt/conda/lib/python3.10/site-packages
Requires: aioprometheus, async-timeout, click, fastapi, fsspec, gradio, huggingface-hub, modelscope, openai, opencv-contrib-python, passlib, peft, pillow, pydantic, pynvml, python-jose, requests, s3fs, sse-starlette, tabulate, timm, torch, tqdm, typer, typing-extensions, uvicorn, xoscar
Required-by: 
2024-04-20 03:36:51,189 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-20 03:42:58,968 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-21 05:01:13,559 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-21 18:55:57,152 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-21 22:58:54,750 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-21 22:58:55,913 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-21 22:58:58,615 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-21 22:59:05,399 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-21 23:51:15,401 uvicorn.error 1 WARNING  Invalid HTTP request received.
2024-04-22 01:36:16,911 xinference.core.supervisor 95 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7d94b63bf9c0>, 'bge-m3'), kwargs: {}
2024-04-22 01:36:16,911 xinference.core.worker 95 DEBUG    Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0'}
2024-04-22 01:36:16,912 xinference.core.worker 95 DEBUG    Leave get_model, elapsed time: 0 s
2024-04-22 01:36:16,912 xinference.core.supervisor 95 DEBUG    Leave get_model, elapsed time: 0 s
2024-04-22 01:36:16,916 xinference.core.model 113 DEBUG    Enter wrapped_func, args: (<xinference.core.model.ModelActor object at 0x79cf160f31f0>, ['第四十二条  人民检察院认为确有必要的,可以勘验物证或者现场。后续内容是什么']), kwargs: {}
2024-04-22 01:36:16,917 xinference.core.model 113 DEBUG    Request create_embedding, current serve request count: 0, request limit: None for the model bge-m3-1-0
2024-04-22 01:36:16,922 sentence_transformers.SentenceTransformer 113 WARNING  `SentenceTransformer._target_device` has been removed, please use `SentenceTransformer.device` instead.
2024-04-22 01:36:16,981 xinference.core.model 113 DEBUG    After request create_embedding, current serve request count: 0 for the model bge-m3-1-0
2024-04-22 01:36:16,981 xinference.core.model 113 DEBUG    Leave wrapped_func, elapsed time: 0 s
2024-04-22 03:16:25,213 xinference.core.worker 95 WARNING  Process 0.0.0.0:33245 is down.
2024-04-22 03:16:25,220 xinference.core.worker 95 DEBUG    Enter terminate_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>, 'bge-m3-1-0'), kwargs: {}
2024-04-22 03:16:25,221 xinference.core.worker 95 DEBUG    Destroy model actor failed, model uid: bge-m3-1-0, error: [Errno 111] Connection refused
2024-04-22 03:16:25,221 xinference.core.worker 95 DEBUG    Remove sub pool failed, model uid: bge-m3-1-0, error: '0.0.0.0:33245'
2024-04-22 03:16:25,221 xinference.core.worker 95 DEBUG    Leave terminate_model, elapsed time: 0 s
2024-04-22 03:16:25,222 xinference.core.worker 95 WARNING  Recreating model actor bge-m3-1-0 ...
2024-04-22 03:16:25,222 xinference.core.worker 95 DEBUG    Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0', 'model_name': 'bge-m3', 'model_size_in_billions': None, 'model_format': None, 'quantization': None, 'model_type': 'embedding', 'n_gpu': 'auto', 'peft_model_path': None, 'image_lora_load_kwargs': None, 'image_lora_fuse_kwargs': None, 'request_limits': None, 'gpu_idx': None, 'event_model_uid': 'bge-m3', '_': 1, '__': 0}
2024-04-22 03:16:25,223 xinference.core.worker 95 DEBUG    GPU selected: [0] for model bge-m3-1-0
2024-04-22 03:16:28,479 xinference.model.embedding.core 95 DEBUG    Embedding model bge-m3 found in ModelScope.
2024-04-22 03:16:28,480 xinference.model.utils 95 INFO     Use model cache from a different hub.
2024-04-22 03:16:28,481 xinference.core.worker 95 ERROR    Failed to load model bge-m3-1-0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 662, in launch_builtin_model
    model, model_description = await asyncio.to_thread(
  File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/core.py", line 83, in create_model_instance
    return create_embedding_model_instance(
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/embedding/core.py", line 329, in create_embedding_model_instance
    model = EmbeddingModel(model_uid, model_path, **kwargs)
TypeError: EmbeddingModel.__init__() got an unexpected keyword argument 'event_model_uid'
2024-04-22 03:16:28,522 xoscar.backends.pool 95 ERROR    Monitor sub pool 0.0.0.0:33245 failed
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 1438, in monitor_sub_pools
    await self.recover_sub_pool(address)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 155, in recover_sub_pool
    await self.launch_builtin_model(**launch_args)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 662, in launch_builtin_model
    model, model_description = await asyncio.to_thread(
  File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/core.py", line 83, in create_model_instance
    return create_embedding_model_instance(
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/embedding/core.py", line 329, in create_embedding_model_instance
    model = EmbeddingModel(model_uid, model_path, **kwargs)
TypeError: EmbeddingModel.__init__() got an unexpected keyword argument 'event_model_uid'
2024-04-22 03:20:29,113 xinference.core.worker 95 ERROR    Report status got error.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 781, in report_status
    status = await asyncio.to_thread(gather_node_info)
  File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 780, in report_status
    async with timeout(2):
  File "/opt/conda/lib/python3.10/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/opt/conda/lib/python3.10/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
2024-04-22 05:16:35,159 xinference.core.supervisor 95 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7d94b63bf9c0>, 'bge-m3'), kwargs: {}
2024-04-22 05:16:35,160 xinference.core.worker 95 DEBUG    Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0'}
2024-04-22 05:16:35,166 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1022, in create_embedding
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 935, in get_model
    return await worker_ref.get_model(model_uid=replica_model_uid)
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 285, in xoscar.core.__pyx_actor_method_wrapper
    result = method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 65, in wrapped
    ret = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 766, in get_model
    raise ValueError(f"Model not found, uid: {model_uid}")
ValueError: [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
2024-04-22 05:18:03,781 xinference.core.supervisor 95 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7d94b63bf9c0>, 'bge-m3'), kwargs: {}
2024-04-22 05:18:03,782 xinference.core.worker 95 DEBUG    Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0'}
2024-04-22 05:18:03,788 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1022, in create_embedding
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 935, in get_model
    return await worker_ref.get_model(model_uid=replica_model_uid)
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 285, in xoscar.core.__pyx_actor_method_wrapper
    result = method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 65, in wrapped
    ret = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 766, in get_model
    raise ValueError(f"Model not found, uid: {model_uid}")
ValueError: [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
2024-04-22 05:18:33,297 xinference.core.supervisor 95 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7d94b63bf9c0>, 'bge-m3'), kwargs: {}
2024-04-22 05:18:33,297 xinference.core.worker 95 DEBUG    Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0'}
2024-04-22 05:18:33,301 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1022, in create_embedding
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 935, in get_model
    return await worker_ref.get_model(model_uid=replica_model_uid)
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 285, in xoscar.core.__pyx_actor_method_wrapper
    result = method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 65, in wrapped
    ret = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 766, in get_model
    raise ValueError(f"Model not found, uid: {model_uid}")
ValueError: [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
2024-04-22 05:19:38,011 xinference.core.supervisor 95 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7d94b63bf9c0>, 'bge-m3'), kwargs: {}
2024-04-22 05:19:38,012 xinference.core.worker 95 DEBUG    Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0'}
2024-04-22 05:19:38,018 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1022, in create_embedding
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 935, in get_model
    return await worker_ref.get_model(model_uid=replica_model_uid)
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 285, in xoscar.core.__pyx_actor_method_wrapper
    result = method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 65, in wrapped
    ret = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 766, in get_model
    raise ValueError(f"Model not found, uid: {model_uid}")
ValueError: [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
2024-04-22 05:20:12,837 uvicorn.error 1 ERROR    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/aioprometheus/asgi/middleware.py", line 184, in __call__
    await self.asgi_callable(scope, receive, wrapped_send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1011, in create_embedding
    body = CreateEmbeddingRequest.parse_obj(payload)
  File "/opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py", line 526, in parse_obj
    return cls(**obj)
  File "/opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CreateEmbeddingRequest
model
  field required (type=value_error.missing)
2024-04-22 05:20:25,108 xinference.core.supervisor 95 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7d94b63bf9c0>, 'bge-m3'), kwargs: {}
2024-04-22 05:20:25,108 xinference.core.worker 95 DEBUG    Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7d960864b380>,), kwargs: {'model_uid': 'bge-m3-1-0'}
2024-04-22 05:20:25,114 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1022, in create_embedding
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 935, in get_model
    return await worker_ref.get_model(model_uid=replica_model_uid)
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 285, in xoscar.core.__pyx_actor_method_wrapper
    result = method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 65, in wrapped
    ret = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 766, in get_model
    raise ValueError(f"Model not found, uid: {model_uid}")
ValueError: [address=0.0.0.0:15978, pid=95] Model not found, uid: bge-m3-1-0

ChengjieLi28 commented 6 months ago

@JasonFlyBeauty Thanks for reporting. This is a bug; we will fix it ASAP. By the way, the reason you hit this is that you triggered a CUDA OOM while using the embedding model, and the error in your logs occurred during the automatic recovery that followed. Please keep an eye on your GPU memory usage while the embedding model is running.
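The TypeError in the traceback comes from the recovery path replaying launch kwargs that the constructor does not accept. The snippet below is a minimal, hypothetical illustration of that failure mode (the stub class and helper names are not xinference's real code) and one defensive way around it:

```python
import inspect

# Hypothetical stand-in for xinference's EmbeddingModel: it accepts
# model_uid and model_path but, like the real class here, has no
# 'event_model_uid' parameter.
class EmbeddingModelStub:
    def __init__(self, model_uid, model_path, device=None):
        self.model_uid = model_uid
        self.model_path = model_path
        self.device = device

def accepted_kwargs(cls, kwargs):
    """Keep only the keyword arguments that cls.__init__ actually declares."""
    params = set(inspect.signature(cls.__init__).parameters) - {"self"}
    return {k: v for k, v in kwargs.items() if k in params}

# The recovery path replays the original launch kwargs, which now carry
# the extra 'event_model_uid' key and crash the constructor:
launch_kwargs = {"device": "cuda:0", "event_model_uid": "bge-m3"}
try:
    EmbeddingModelStub("bge-m3-1-0", "/models/bge-m3", **launch_kwargs)
except TypeError as exc:
    print(exc)  # unexpected keyword argument 'event_model_uid'

# Filtering the kwargs first avoids the TypeError:
model = EmbeddingModelStub(
    "bge-m3-1-0", "/models/bge-m3",
    **accepted_kwargs(EmbeddingModelStub, launch_kwargs),
)
```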

JasonFlyBeauty commented 6 months ago

Thank you very much for your help.


JasonFlyBeauty commented 6 months ago

Also, I would like to know how to specify which GPU the embedding model runs on, and whether it supports running on multiple cards.


ChengjieLi28 commented 6 months ago

Also, I would like to know how to specify the GPU card for the embedding model and if it supports multi-card running


Use the --gpu-idx option when launching models. See https://inference.readthedocs.io/en/latest/reference/generated/xinference.client.Client.launch_model.html

xinference launch <other_options> --gpu-idx <your_specific_gpu_index>
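For programmatic use, the linked docs expose the same option as the gpu_idx parameter of Client.launch_model. A minimal sketch (the endpoint URL and helper names are illustrative; the client call itself needs a running xinference server and is untested here):

```python
from typing import Sequence

def build_launch_kwargs(model_name: str = "bge-m3",
                        gpu_idx: Sequence[int] = (0,)) -> dict:
    """Assemble arguments for Client.launch_model; a list of indices
    requests more than one card."""
    return {
        "model_name": model_name,
        "model_type": "embedding",
        "gpu_idx": list(gpu_idx),
    }

def launch_on_gpus(endpoint: str = "http://localhost:9997", **kwargs) -> str:
    # Imported here so the module stays importable without a server running.
    from xinference.client import Client
    client = Client(endpoint)
    return client.launch_model(**build_launch_kwargs(**kwargs))
```

For example, launch_on_gpus(gpu_idx=[0, 1]) asks for two cards; whether a given embedding backend actually shards across them was not confirmed in this thread.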
JasonFlyBeauty commented 6 months ago

Thanks again for your help!
