xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

How to configure InternVL2-Llama3-76B-AWQ #2170

Closed: 401557122 closed this issue 5 days ago

401557122 commented 2 weeks ago

System Info

{ "version": 1, "context_length": 32000, "model_name": "InternVL2-Llama3-76B-AWQ", "model_lang": [ "en", "zh" ], "model_ability": [ "generate", "vision", "chat" ], "model_description": "", "model_family": "other", "model_specs": [ { "model_format": "awq", "model_size_in_billions": 76, "quantizations": [ "4-bit" ], "model_id": null, "model_hub": "huggingface", "model_uri": "/root/.xinference/InternVL2-Llama3-76B-AWQ", "model_revision": null } ], "prompt_style": null, "is_builtin": false }

Running Xinference with Docker?

Yes (see the launch command below).

Version info / 版本信息

latest

The command used to start Xinference

docker run -dit -v /data/ez/llms:/root/.xinference -e XINFERENCE_HOME=/root/.xinference -p 9999:9997 --gpus all --shm-size 20g --ipc=host aicenter/xinference:latest xinference-local -H 0.0.0.0 --log-level debug
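The mapping -p 9999:9997 exposes Xinference's default port 9997 on host port 9999; a quick sanity check against the OpenAI-compatible API, assuming that host and port:

```bash
# List the models the server knows about via the OpenAI-compatible endpoint.
curl http://localhost:9999/v1/models
```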

Reproduction

File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/pytorch/core.py", line 771, in _get_full_prompt assert self.model_family.prompt_style is not None

Expected behavior

How to configure InternVL2-Llama3-76B-AWQ.

qinxuye commented 2 weeks ago

@amumu96 Could you take a look at this issue?

amumu96 commented 2 weeks ago

The Transformers engine does not support the AWQ format, so an AWQ model should not be launched with the Transformers engine. Could you tell me how you launched it? If you want to launch internvl2 in AWQ format, please use the LMDEPLOY engine.
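For readers, recent Xinference releases can report which engines are able to serve a given model; a sketch, assuming the xinference engine subcommand is available in your version:

```bash
# Query which inference engines can run this model
# (subcommand availability depends on your Xinference version).
xinference engine --model-name internvl2
```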

danialcheung commented 5 days ago

> The Transformers engine does not support the AWQ format, so an AWQ model should not be launched with the Transformers engine. Could you tell me how you launched it? If you want to launch internvl2 in AWQ format, please use the LMDEPLOY engine.

Just tried this with no success; it returns the error: RuntimeError: Failed to launch model, detail: [address=0.0.0.0:58248, pid=3691355] Model internvl2 cannot be run on engine LMDEPLOY.

I used this command to launch it: xinference launch --model-engine LMDEPLOY --model-name internvl2 --size-in-billions 76 --model-format awq --quantization Int4

qinxuye commented 5 days ago

Did you install lmdeploy?

danialcheung commented 5 days ago

> Did you install lmdeploy?

Confirmed it's running after installing lmdeploy, thanks!
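For anyone hitting the same RuntimeError, the resolution was to install lmdeploy in the serving environment and relaunch; a minimal sketch assuming the Docker setup above (the container name is hypothetical):

```bash
# Install the LMDeploy backend inside the running container
# (container name is hypothetical; find yours with `docker ps`).
docker exec -it <xinference-container> pip install lmdeploy

# Relaunch with the LMDEPLOY engine (command from this thread).
xinference launch --model-engine LMDEPLOY --model-name internvl2 \
  --size-in-billions 76 --model-format awq --quantization Int4
```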

qinxuye commented 5 days ago

OK, closing this issue then.