xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
5.4k stars 438 forks

Qwen2.5 7B GPU memory usage is too high #2368

Closed mengxianglong123 closed 2 weeks ago

mengxianglong123 commented 1 month ago

Why does launching the Qwen2.5 7B Instruct model with Xinference (started directly from the web UI via the qwen2.5-instruct option) use more than 40 GB of GPU memory? The GPU is an A6000 with 48 GB. Is this caused by the 32k context length? The engine selected is vLLM.

Valdanitooooo commented 1 month ago

gpu_memory_utilization defaults to 0.9; you can adjust the parameter yourself.

For example: max_model_len: 32768, gpu_memory_utilization: 0.8

mengxianglong123 commented 1 month ago

gpu_memory_utilization defaults to 0.9; you can adjust the parameter yourself.

For example: max_model_len: 32768, gpu_memory_utilization: 0.8

Hello, when launching a model with Xinference, is there a way to specify this vLLM parameter? I couldn't find it in the documentation.

Valdanitooooo commented 1 month ago

Hello, when launching a model with Xinference, is there a way to specify this vLLM parameter? I couldn't find it in the documentation.

Yes, you can — the documentation may just be incomplete. For example: --gpu_memory_utilization 0.8
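For reference, a hypothetical full launch command that forwards the extra vLLM arguments through the Xinference CLI. The model name and size flags are illustrative, and exact pass-through behavior may vary by Xinference version:

```shell
# Launch Qwen2.5 7B Instruct on the vLLM engine, capping vLLM's GPU memory
# reservation at 80% and the context window at 32k tokens. Unrecognized flags
# such as --gpu_memory_utilization are forwarded to the vLLM engine.
xinference launch \
  --model-name qwen2.5-instruct \
  --size-in-billions 7 \
  --model-engine vllm \
  --max_model_len 32768 \
  --gpu_memory_utilization 0.8
```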

bigbrother666sh commented 1 month ago

Hello, when launching a model with Xinference, is there a way to specify this vLLM parameter? I couldn't find it in the documentation.

Yes, you can — the documentation may just be incomplete. For example: --gpu_memory_utilization 0.8

What exactly does the gpu_memory_utilization parameter mean?

Valdanitooooo commented 1 month ago

What exactly does the gpu_memory_utilization parameter mean?

It's a parameter from vLLM: https://github.com/vllm-project/vllm/blob/8eeb85708428b7735bbd1156c81692431fd5ff34/vllm/entrypoints/llm.py#L105
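In short, gpu_memory_utilization is the fraction of each GPU's total memory that vLLM pre-allocates for model weights, activations, and the KV cache. A back-of-the-envelope sketch (the helper function here is made up for illustration, not part of vLLM's API):

```python
def vllm_memory_budget(total_vram_gb: float, gpu_memory_utilization: float = 0.9) -> float:
    """Approximate GPU memory (GB) that vLLM will reserve up front."""
    return total_vram_gb * gpu_memory_utilization

# On a 48 GB A6000, the default 0.9 lets vLLM claim ~43.2 GB, which matches
# the "40+ GB" usage reported above. Lowering it to 0.8 caps this at ~38.4 GB.
print(vllm_memory_budget(48, 0.9))  # ~43.2
print(vllm_memory_budget(48, 0.8))  # ~38.4
```

So high memory usage with vLLM is usually the pre-allocation working as designed, not a leak: vLLM grabs the budget eagerly and fills the remainder (after weights) with KV-cache blocks.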

bigbrother666sh commented 1 month ago

thx

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

SDAIer commented 1 month ago

Are all the parameters covered in that document? I don't see max_model_len.


ipc-robot commented 4 weeks ago

How did you manage to deploy the Qwen2.5 model successfully? When I launch through the UI, I always hit CUDA OUT OF MEMORY no matter what I do — even with Qwen2.5-0.5B-Instruct on a machine with a 3090. When I launch from the shell, I immediately get: requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

bigbrother666sh commented 3 weeks ago

Is there perhaps something else already occupying the GPU memory? Check with nvtop or nvitop.
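If nvtop/nvitop aren't installed, nvidia-smi (shipped with the NVIDIA driver) can show the same information; output will of course depend on your machine:

```shell
# Per-GPU memory usage vs. total capacity.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv

# Processes currently holding GPU memory, to spot leftover allocations.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```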

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 5 days since being marked as stale.