xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

不同的模型需要不同的transformer版本 / Different models require different transformers versions #1984

Closed · Matrixxxxxxxx closed this 2 months ago

Matrixxxxxxxx commented 2 months ago

System Info / 系統信息

(xinference) ub@ub-OMEN-by-HP-Laptop-17-ck2xxx:~$ pip show torch
Name: torch
Version: 2.3.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/ub/miniconda3/envs/xinference/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, auto_gptq, autoawq, autoawq_kernels, bitsandbytes, optimum, peft, sentence-transformers, timm, torchaudio, torchvision, xinference

(xinference) ub@ub-OMEN-by-HP-Laptop-17-ck2xxx:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

(xinference) ub@ub-OMEN-by-HP-Laptop-17-ck2xxx:~$ python --version
Python 3.10.9

(xinference) ub@ub-OMEN-by-HP-Laptop-17-ck2xxx:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

Version info / 版本信息

(xinference) ub@ub-OMEN-by-HP-Laptop-17-ck2xxx:~$ pip show xinference
Name: xinference
Version: 0.13.3

The command used to start Xinference / 用以启动 xinference 的命令

xinference-local --host 0.0.0.0 --port 9997

Reproduction / 复现过程

When deploying Llama 3.1-Instruct, inference requires the latest transformers release, version 4.43.3.

However, other large models, such as ChatGLM-2, need a different transformers version (testing shows that transformers==4.41.2 works for that family of models). If the installed version does not match, xinference raises a series of errors during inference.

Therefore, every time I want to switch to a different large model, I must shut down the project, reinstall the matching transformers version, and then restart the project.
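
For reference, the switching cycle described above looks like this (a sketch using the versions reported in this issue; the running xinference-local process must be stopped before each reinstall):

# switch to Llama 3.1-Instruct: repin transformers, then restart
pip install "transformers==4.43.3"
xinference-local --host 0.0.0.0 --port 9997

# switch to ChatGLM-2: downgrade, then restart again
pip install "transformers==4.41.2"
xinference-local --host 0.0.0.0 --port 9997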

Expected behavior / 期待表现

How can this issue be solved?

Matrixxxxxxxx commented 2 months ago

Have the developers considered the above issue, or is it caused by a mistake in my configuration? I followed the official README step by step for every installation, so it seems unlikely to be an environment-configuration problem.

qinxuye commented 2 months ago

Transformers moves too fast. Once the next generation of a model is released, the previous one is usually no longer maintained, so the transformers version it requires stays frozen at an old release. This is hard to solve.

qinxuye commented 2 months ago

I think this can be solved with Xinference's distributed deployment: start different workers in environments with different transformers versions, and load a model onto a specific worker by specifying its worker_ip.
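
A sketch of that setup, following the maintainer's suggestion (the conda environment names and IP addresses here are hypothetical, and the exact way to pass worker_ip should be checked against your Xinference version):

# one environment per transformers pin
conda create -n xinf-tf443 python=3.10 -y
conda run -n xinf-tf443 pip install xinference "transformers==4.43.3"
conda create -n xinf-tf441 python=3.10 -y
conda run -n xinf-tf441 pip install xinference "transformers==4.41.2"

# start the supervisor, then one worker from each environment
xinference-supervisor -H 192.168.1.2
conda run -n xinf-tf443 xinference-worker -e "http://192.168.1.2:9997" -H 192.168.1.10
conda run -n xinf-tf441 xinference-worker -e "http://192.168.1.2:9997" -H 192.168.1.11

# when launching a model, target the worker whose environment has the
# matching transformers pin, e.g. via the Python client:
#   client.launch_model(model_name="llama-3.1-instruct", model_type="LLM", worker_ip="192.168.1.10")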

qinxuye commented 2 months ago

Closing for now; I don't think there is a solution on our side.