Running Xinference with Docker?
[ ] docker
[X] pip install
[ ] installation from source
Version info
0.13.3
The command used to start Xinference
xinference-local -H 0.0.0.0
Reproduction
Server error: 400 - [address=0.0.0.0:43423, pid=200222] Couldn't instantiate the backend tokenizer from one of: (1) a 'tokenizers' library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
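Note that the pip list below shows `sentencepiece 0.2.0` already installed, so one possible cause is that the Xinference server is running under a different Python environment than the one inspected. A minimal sketch to confirm which packages the serving interpreter can actually see (the `check_packages` helper is illustrative, not part of Xinference):

```python
import importlib.util

def check_packages(names):
    # Map each package name to whether the current interpreter can locate it.
    return {n: importlib.util.find_spec(n) is not None for n in names}

# Run this inside the same environment that launches xinference-local.
print(check_packages(["sentencepiece", "tokenizers"]))
```

If `sentencepiece` reports missing here, installing it into that environment should let transformers convert the slow tokenizer to a fast one.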
System Info
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Python 3.11.8
Cuda compilation tools, release 12.1, V12.1.105 Build cuda_12.1.r12.1/compiler.32688072_0
transformers 4.43.3
Package Version
absl-py 2.1.0 accelerate 0.31.0 addict 2.4.0 aiobotocore 2.7.0 aiofiles 23.2.1 aiohttp 3.9.5 aioitertools 0.11.0 aioprometheus 23.12.0 aiosignal 1.3.1 alembic 1.13.2 aliyun-python-sdk-core 2.15.1 aliyun-python-sdk-kms 2.16.3 altair 5.3.0 annotated-types 0.7.0 anthropic 0.28.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 asttokens 2.4.1 async-lru 2.0.4 async-timeout 4.0.3 attrdict 2.0.1 attrs 23.2.0 audioread 3.0.1 auto_gptq 0.7.1 autoawq 0.2.5 autoawq_kernels 0.0.6 autopage 0.5.2 Babel 2.15.0 bcrypt 4.1.3 beautifulsoup4 4.12.3 bibtexparser 2.0.0b7 bitsandbytes 0.43.1 bleach 6.1.0 boto3 1.28.64 botocore 1.31.64 cdifflib 1.2.6 certifi 2024.6.2 cffi 1.16.0 cfgv 3.4.0 charset-normalizer 3.3.2 chatglm-cpp 0.3.2 chattts 0.1.1 click 8.1.7 cliff 4.7.0 clldutils 3.22.2 cloudpickle 3.0.0 cmaes 0.10.0 cmake 3.29.5 cmd2 2.4.3 colorama 0.4.6 coloredlogs 15.0.1 colorlog 6.8.2 comm 0.2.2 conformer 0.3.2 contourpy 1.2.1 controlnet-aux 0.0.7 crcmod 1.7 cryptography 42.0.8 csvw 3.3.0 cycler 0.12.1 Cython 3.0.10 datasets 2.18.0 debugpy 1.8.2 decorator 5.1.1 defusedxml 0.7.1 diffusers 0.25.0 dill 0.3.8 diskcache 5.6.3 distlib 0.3.8 distro 1.9.0 dlinfo 1.2.1 dnspython 2.6.1 ecdsa 0.19.0 editdistance 0.8.1 einops 0.8.0 einx 0.2.2 email_validator 2.1.1 encodec 0.1.1 executing 2.0.1 fastapi 0.110.3 fastapi-cli 0.0.4 fastjsonschema 2.20.0 ffmpy 0.3.2 filelock 3.14.0 FlagEmbedding 1.2.10 flatbuffers 24.3.25 fonttools 4.53.0 fqdn 1.5.1 frozendict 2.4.4 frozenlist 1.4.1 fsspec 2023.10.0 gast 0.5.4 gdown 5.2.0 gekko 1.1.1 gradio 4.26.0 gradio_client 0.15.1 greenlet 3.0.3 grpcio 1.65.1 h11 0.14.0 hf_transfer 0.1.6 hiredis 2.3.2 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 huggingface-hub 0.23.3 humanfriendly 10.0 hydra-colorlog 1.2.0 hydra-core 1.3.2 hydra-optuna-sweeper 1.2.0 HyperPyYAML 1.2.2 identify 2.6.0 idna 3.7 imageio 2.34.1 importlib_metadata 7.1.0 importlib_resources 6.4.0 inflect 7.2.1 iniconfig 2.0.0 interegular 0.3.3 ipykernel 
6.29.5 ipython 8.26.0 ipywidgets 8.1.3 isodate 0.6.1 isoduration 20.11.0 jedi 0.19.1 Jinja2 3.1.4 jiter 0.4.1 jmespath 0.10.0 joblib 1.4.2 json5 0.9.25 jsonpointer 3.0.0 jsonschema 4.22.0 jsonschema-specifications 2023.12.1 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.2 jupyter_server_terminals 0.5.3 jupyterlab 4.2.4 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.3 jupyterlab_widgets 3.0.11 kiwisolver 1.4.5 language-tags 1.2.0 lark 1.1.9 lazy_loader 0.4 libnacl 2.1.0 librosa 0.10.2.post1 lightning 2.3.3 lightning-utilities 0.11.6 litellm 1.40.15 llama_cpp_python 0.2.77 llvmlite 0.42.0 lm-format-enforcer 0.10.1 lxml 5.2.2 Mako 1.3.5 Markdown 3.6 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matcha-tts 0.0.5.1 matplotlib 3.9.0 matplotlib-inline 0.1.7 mdurl 0.1.2 mistune 3.0.2 modelscope 1.15.0 more-itertools 10.2.0 mpmath 1.3.0 msgpack 1.0.8 multidict 6.0.5 multiprocess 0.70.16 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 nemo_text_processing 1.0.2 nest-asyncio 1.6.0 networkx 3.3 ninja 1.11.1.1 nodeenv 1.9.1 notebook 7.2.1 notebook_shim 0.2.4 numba 0.59.1 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-ml-py 12.555.43 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.40 nvidia-nvtx-cu12 12.1.105 omegaconf 2.3.0 onnxruntime 1.16.0 openai 1.33.0 openai-whisper 20231117 opencv-contrib-python 4.10.0.82 opencv-python 4.10.0.82 opencv-python-headless 4.10.0.82 optimum 1.21.2 optuna 2.10.1 orjson 3.10.3 oss2 2.18.5 outlines 0.0.34 overrides 7.7.0 packaging 24.1 pandas 2.2.2 pandocfilters 1.5.1 parso 0.8.4 passlib 1.7.4 pbr 6.0.0 peft 0.11.1 pexpect 4.9.0 phonemizer 3.2.1 pillow 10.3.0 pip 24.2 pip-review 1.3.0 piper-phonemize 1.1.0 platformdirs 4.2.2 pluggy 1.5.0 plumbum 1.8.3 
pooch 1.8.2 pre-commit 3.7.1 prettytable 3.10.2 prometheus_client 0.20.0 prometheus-fastapi-instrumentator 7.0.0 prompt_toolkit 3.0.47 protobuf 4.25.4 psutil 5.9.8 ptyprocess 0.7.0 pure_eval 0.2.3 py-cpuinfo 9.0.0 pyarrow 16.1.0 pyarrow-hotfix 0.6 pyasn1 0.6.0 pybase16384 0.3.7 pycparser 2.22 pycryptodome 3.20.0 pydantic 2.7.3 pydantic_core 2.18.4 pydub 0.25.1 Pygments 2.18.0 pylatexenc 2.10 pynini 2.1.5 pynvml 11.5.0 pyparsing 3.1.2 pyperclip 1.9.0 PySocks 1.7.1 pytest 8.3.2 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-jose 3.3.0 python-json-logger 2.0.7 python-multipart 0.0.9 pytorch-lightning 2.3.3 pytz 2024.1 PyYAML 6.0.1 pyzmq 26.0.3 quantile-python 1.1 ray 2.24.0 rdflib 7.0.0 redis 5.0.7 referencing 0.35.1 regex 2024.5.15 requests 2.32.3 rfc3339-validator 0.1.4 rfc3986 1.5.0 rfc3986-validator 0.1.1 rich 13.7.1 rootutils 1.0.7 rouge 1.0.1 rpds-py 0.18.1 rpyc 6.0.0 rsa 4.9 ruamel.yaml 0.18.6 ruamel.yaml.clib 0.2.8 ruff 0.4.8 s3fs 2023.10.0 s3transfer 0.7.0 sacremoses 0.1.1 safetensors 0.4.3 scikit-image 0.23.2 scikit-learn 1.5.0 scipy 1.13.1 seaborn 0.13.2 segments 2.2.1 semantic-version 2.10.0 Send2Trash 1.8.3 sentence-transformers 3.0.1 sentencepiece 0.2.0 setuptools 70.0.0 sglang 0.1.17 shellingham 1.5.4 simplejson 3.19.2 six 1.16.0 sniffio 1.3.1 socksio 1.0.0 sortedcontainers 2.4.0 soundfile 0.12.1 soupsieve 2.5 soxr 0.3.7 SQLAlchemy 2.0.31 sse-starlette 2.1.0 stack-data 0.6.3 starlette 0.37.2 stevedore 5.2.0 sympy 1.12.1 tabulate 0.9.0 tblib 3.0.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 tensorizer 2.9.0 terminado 0.18.1 threadpoolctl 3.5.0 tifffile 2024.5.22 tiktoken 0.7.0 timm 1.0.3 tinycss2 1.3.0 tokenizers 0.19.1 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.1 torch 2.3.0 torchaudio 2.3.0 torchmetrics 1.4.0.post0 torchvision 0.18.0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 transformers 4.43.3 transformers-stream-generator 0.0.5 triton 2.3.0 typeguard 4.3.0 typer 0.11.1 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 
2024.1 ujson 5.10.0 Unidecode 1.3.8 uri-template 1.3.0 uritemplate 4.1.1 urllib3 2.0.7 uvicorn 0.30.1 uvloop 0.19.0 vector-quantize-pytorch 1.14.24 virtualenv 20.26.3 vllm 0.4.3 vllm-flash-attn 2.5.8.post2 vocos 0.1.0 watchfiles 0.22.0 wcwidth 0.2.13 webcolors 24.6.0 webencodings 0.5.1 websocket-client 1.8.0 websockets 11.0.3 Werkzeug 3.0.3 WeTextProcessing 1.0.1 wget 3.2 wheel 0.43.0 widgetsnbextension 4.0.11 wrapt 1.16.0 xformers 0.0.26.post1 xinference 0.13.3 xoscar 0.3.0 xxhash 3.4.1 yapf 0.40.2 yarl 1.9.4 zipp 3.19.2 zmq 0.0.0 zstandard 0.22.0
Expected behavior
The llama-3.1-instruct model starts normally.