xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
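The "single line of code" means pointing an OpenAI client at Xinference's OpenAI-compatible endpoint. Here is a minimal sketch, assuming a local server at the default address from this issue's log and a model already launched under the UID `baichuan-2-chat` (both illustrative assumptions):

```python
# Sketch only: the endpoint address and model UID are assumptions for
# illustration; Xinference serves an OpenAI-compatible API under /v1.
import openai

client = openai.OpenAI(
    base_url="http://127.0.0.1:9997/v1",  # the changed line: local Xinference instead of api.openai.com
    api_key="not-needed",                 # no real key is required for a local deployment
)
response = client.chat.completions.create(
    model="baichuan-2-chat",  # hypothetical UID of an already-launched model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```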

BUG: `xinference list` hangs #502

Closed tianbinraindrop closed 10 months ago

tianbinraindrop commented 11 months ago

Describe the bug

Although the model was able to load, it did not respond for a long time, and I can't use `xinference list` to list any running model.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: Python 3.10, in an Anaconda environment set up for xinference.
  2. The version of xinference you use: cloned with `git clone https://github.com/xorbitsai/inference` and installed with `pip install -e ".[all]"`.
  3. Versions of crucial packages: none provided.
  4. Full stack of the error: none.
  5. Minimized code to reproduce the error: run `xinference --log-level DEBUG`.

Expected behavior

I expected to load the baichuan-2-chat model and chat with it, but I can't interact with it.
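For reference, the expected flow through the Python client would look roughly like the sketch below. The parameters mirror the launch recorded in the debug log (baichuan-2-chat, pytorch, 7B, no quantization); the `chat` signature follows the 0.x-era client and may differ in other versions.

```python
# A sketch of the intended interaction, using the launch parameters
# that appear in the debug log; signatures may vary by version.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
model_uid = client.launch_model(
    model_name="baichuan-2-chat",
    model_format="pytorch",
    model_size_in_billions=7,
    quantization="none",
)
model = client.get_model(model_uid)
print(model.chat("Hello!", chat_history=[], generate_config={"max_tokens": 256}))
```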

Additional context

The debug log is below.

(xinference) PS D:\inference> xinference --log-level DEBUG
2023-09-29 09:56:26,127 xinference   23332 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-09-29 09:56:26,130 asyncio      23332 DEBUG    Using proactor: IocpProactor
2023-09-29 09:56:26,148 xinference.core.worker 23332 DEBUG    Worker actor initialized with main pool: 127.0.0.1:61403
2023-09-29 09:56:26,148 xinference.core.supervisor 23332 DEBUG    Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, '127.0.0.1:61403'), kwargs: {}
2023-09-29 09:56:26,149 xinference.core.supervisor 23332 INFO     Worker 127.0.0.1:61403 has been added successfully
2023-09-29 09:56:26,149 xinference.core.supervisor 23332 DEBUG    Leave add_worker, elapsed time: 0 ms
2023-09-29 09:56:26,153 xinference.deploy.worker 23332 INFO     Xinference worker successfully started.
2023-09-29 09:56:26,500 xinference.core.supervisor 23332 DEBUG    Enter list_model_registrations, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM'), kwargs: {}
2023-09-29 09:56:26,501 xinference.core.supervisor 23332 DEBUG    Leave list_model_registrations, elapsed time: 0 ms
2023-09-29 09:56:26,625 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'baichuan'), kwargs: {}
2023-09-29 09:56:26,625 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,627 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'baichuan-2'), kwargs: {}
2023-09-29 09:56:26,627 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,629 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'baichuan-chat'), kwargs: {}
2023-09-29 09:56:26,629 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,630 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'chatglm2-32k'), kwargs: {}
2023-09-29 09:56:26,631 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,633 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'baichuan-2-chat'), kwargs: {}
2023-09-29 09:56:26,633 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,635 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'chatglm2'), kwargs: {}
2023-09-29 09:56:26,635 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,636 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'code-llama'), kwargs: {}
2023-09-29 09:56:26,636 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,639 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'chatglm'), kwargs: {}
2023-09-29 09:56:26,639 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,640 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'code-llama-instruct'), kwargs: {}
2023-09-29 09:56:26,641 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,642 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'code-llama-python'), kwargs: {}
2023-09-29 09:56:26,642 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,644 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'falcon'), kwargs: {}
2023-09-29 09:56:26,644 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,645 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'falcon-instruct'), kwargs: {}
2023-09-29 09:56:26,645 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,647 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'glaive-coder'), kwargs: {}
2023-09-29 09:56:26,647 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,650 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'gpt-2'), kwargs: {}
2023-09-29 09:56:26,651 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,652 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'internlm-20b'), kwargs: {}
2023-09-29 09:56:26,652 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,653 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'internlm-7b'), kwargs: {}
2023-09-29 09:56:26,653 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,656 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'internlm-chat-20b'), kwargs: {}
2023-09-29 09:56:26,656 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,658 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'internlm-chat-7b'), kwargs: {}
2023-09-29 09:56:26,658 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,659 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'llama-2'), kwargs: {}
2023-09-29 09:56:26,659 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,661 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'llama-2-chat'), kwargs: {}
2023-09-29 09:56:26,661 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,663 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'OpenBuddy'), kwargs: {}
2023-09-29 09:56:26,663 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,665 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'opt'), kwargs: {}
2023-09-29 09:56:26,665 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,667 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'orca'), kwargs: {}
2023-09-29 09:56:26,667 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,669 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'qwen-chat'), kwargs: {}
2023-09-29 09:56:26,669 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,670 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'starchat-beta'), kwargs: {}
2023-09-29 09:56:26,671 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,672 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'starcoder'), kwargs: {}
2023-09-29 09:56:26,672 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,673 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'starcoderplus'), kwargs: {}
2023-09-29 09:56:26,674 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,675 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'tiny-llama'), kwargs: {}
2023-09-29 09:56:26,675 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,677 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'vicuna-v1.3'), kwargs: {}
2023-09-29 09:56:26,677 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,678 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'vicuna-v1.5'), kwargs: {}
2023-09-29 09:56:26,679 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,680 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'vicuna-v1.5-16k'), kwargs: {}
2023-09-29 09:56:26,681 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,683 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'wizardlm-v1.0'), kwargs: {}
2023-09-29 09:56:26,683 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:26,684 xinference.core.supervisor 23332 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>, 'LLM', 'wizardmath-v1.0'), kwargs: {}
2023-09-29 09:56:26,685 xinference.core.supervisor 23332 DEBUG    Leave get_model_registration, elapsed time: 0 ms
2023-09-29 09:56:27,130 urllib3.connectionpool 23332 DEBUG    https://api.gradio.app:443 "GET /gradio-messaging/en HTTP/1.1" 200 3
2023-09-29 09:57:01,234 xinference.core.supervisor 23332 DEBUG    Enter launch_builtin_model, model_uid: 7af2d3f0-5e6b-11ee-8db8-f1f73330b766, model_name: baichuan-2-chat, model_size: 7, model_format: pytorch, quantization: none, replica: 1
2023-09-29 09:57:01,234 xinference.core.worker 23332 DEBUG    Enter get_model_count, args: (<xinference.core.worker.WorkerActor object at 0x000001F60FC24EF0>,), kwargs: {}
2023-09-29 09:57:01,234 xinference.core.worker 23332 DEBUG    Leave get_model_count, elapsed time: 0 ms
2023-09-29 09:57:01,235 xinference.core.worker 23332 DEBUG    Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x000001F60FC24EF0>,), kwargs: {'model_uid': '7af2d3f0-5e6b-11ee-8db8-f1f73330b766-1-0', 'model_name': 'baichuan-2-chat', 'model_size_in_billions': 7, 'model_format': 'pytorch', 'quantization': 'none', 'model_type': 'LLM', 'n_gpu': 'auto'}
2023-09-29 09:57:01,235 xinference.core.supervisor 23332 DEBUG    Enter is_local_deployment, args: (<xinference.core.supervisor.SupervisorActor object at 0x000001F60F4513F0>,), kwargs: {}
2023-09-29 09:57:01,235 xinference.core.supervisor 23332 DEBUG    Leave is_local_deployment, elapsed time: 0 ms
2023-09-29 09:57:01,238 xinference.model.llm.llm_family 23332 INFO     Caching from Modelscope: baichuan-inc/Baichuan2-7B-Chat
2023-09-29 09:57:03,010 - modelscope - INFO - PyTorch version 2.0.1+cu117 Found.
2023-09-29 09:57:03,010 - modelscope - INFO - Loading ast index from C:\Users\tianb\.cache\modelscope\ast_indexer
2023-09-29 09:57:03,097 - modelscope - INFO - Loading done! Current index file version is 1.9.1, with md5 e3cab237014e3e77e0724eb4822f6854 and a total number of 924 components indexed
2023-09-29 09:57:03,449 - modelscope - INFO - Use user-specified model revision: v1.0.1

So the web UI shows the running model as suspended, and `xinference list` returns no response. My CPU usage is normal and my NVIDIA 4090 Ti is lightly loaded. What's wrong with xinference?
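For reference, the programmatic equivalent of `xinference list` is roughly the sketch below (endpoint taken from the log above); it presumably blocks the same way while the model is loading.

```python
# Equivalent of `xinference list` via the Python client; the endpoint
# comes from the log above.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
print(client.list_models())
```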

UranusSeven commented 11 months ago

Hi! Thanks for reporting this issue.

Listing running models can hang while your model is still loading, which can take 3-5 minutes for baichuan. This is a known problem and will be fixed soon.
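Until the fix lands, one workaround is to query the listing route with an explicit client-side timeout instead of blocking indefinitely. A minimal sketch, assuming the default endpoint from the log and the OpenAI-style `/v1/models` listing route:

```python
# Workaround sketch: poll the listing route with a timeout while the
# model loads. The /v1/models route and 5-minute budget are assumptions.
import time
import requests

ENDPOINT = "http://127.0.0.1:9997"  # from the log above

deadline = time.monotonic() + 300  # baichuan can take 3-5 minutes to load
while time.monotonic() < deadline:
    try:
        resp = requests.get(f"{ENDPOINT}/v1/models", timeout=10)
        resp.raise_for_status()
        print(resp.json())
        break
    except requests.RequestException:
        time.sleep(10)  # supervisor still busy loading; retry shortly
else:
    print("model did not become listable within 5 minutes")
```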

tianbinraindrop commented 11 months ago

I can't load any models, including chatglm, etc.

aresnow1 commented 10 months ago

Closed by #546