xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

How to configure InternVL2-Llama3-76B-AWQ #2170

Closed: 401557122 closed this issue 5 days ago

401557122 commented 2 weeks ago

System Info

{ "version": 1, "context_length": 32000, "model_name": "InternVL2-Llama3-76B-AWQ", "model_lang": [ "en", "zh" ], "model_ability": [ "generate", "vision", "chat" ], "model_description": "", "model_family": "other", "model_specs": [ { "model_format": "awq", "model_size_in_billions": 76, "quantizations": [ "4-bit" ], "model_id": null, "model_hub": "huggingface", "model_uri": "/root/.xinference/InternVL2-Llama3-76B-AWQ", "model_revision": null } ], "prompt_style": null, "is_builtin": false }

Running Xinference with Docker?

Yes (see the launch command below).

Version info / 版本信息

latest

The command used to start Xinference

docker run -dit -v /data/ez/llms:/root/.xinference -e XINFERENCE_HOME=/root/.xinference -p 9999:9997 --gpus all --shm-size 20g --ipc=host aicenter/xinference:latest xinference-local -H 0.0.0.0 --log-level debug
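The mapping -p 9999:9997 exposes Xinference's default port 9997 on host port 9999; a quick sanity check against the OpenAI-compatible API, assuming that host and port:

```bash
# List the models the server knows about via the OpenAI-compatible endpoint.
curl http://localhost:9999/v1/models
```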

Reproduction

File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/pytorch/core.py", line 771, in _get_full_prompt assert self.model_family.prompt_style is not None

Expected behavior

How to configure InternVL2-Llama3-76B-AWQ.

qinxuye commented 2 weeks ago

@amumu96 Could you take a look at this issue?

amumu96 commented 2 weeks ago

The Transformers engine does not support the AWQ format, so an AWQ model should not be launched with the Transformers engine. Could you tell me how you launched it? If you want to launch internvl2 in AWQ format, please use the LMDEPLOY engine.
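For readers, recent Xinference releases can report which engines are able to serve a given model; a sketch, assuming the xinference engine subcommand is available in your version:

```bash
# Query which inference engines can run this model
# (subcommand availability depends on your Xinference version).
xinference engine --model-name internvl2
```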

danialcheung commented 5 days ago

> The Transformers engine does not support the AWQ format, so an AWQ model should not be launched with the Transformers engine. Could you tell me how you launched it? If you want to launch internvl2 in AWQ format, please use the LMDEPLOY engine.

Just tried this with no success; it returns the error: RuntimeError: Failed to launch model, detail: [address=0.0.0.0:58248, pid=3691355] Model internvl2 cannot be run on engine LMDEPLOY.

I used this command to launch it: xinference launch --model-engine LMDEPLOY --model-name internvl2 --size-in-billions 76 --model-format awq --quantization Int4

qinxuye commented 5 days ago

Did you install lmdeploy?

danialcheung commented 5 days ago

> Did you install lmdeploy?

Confirmed it's running after installing lmdeploy, thanks!
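For anyone hitting the same RuntimeError, the resolution was to install lmdeploy in the serving environment and relaunch; a minimal sketch assuming the Docker setup above (the container name is hypothetical):

```bash
# Install the LMDeploy backend inside the running container
# (container name is hypothetical; find yours with `docker ps`).
docker exec -it <xinference-container> pip install lmdeploy

# Relaunch with the LMDEPLOY engine (command from this thread).
xinference launch --model-engine LMDEPLOY --model-name internvl2 \
  --size-in-billions 76 --model-format awq --quantization Int4
```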

qinxuye commented 5 days ago

OK, closing this issue then.