xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
4.99k stars · 396 forks
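Since Xinference exposes an OpenAI-compatible API, "changing a single line of code" usually means pointing the request's base URL at the local Xinference server. A minimal stdlib sketch (the port 9997 is Xinference's default; the model name `qwen2-instruct` is illustrative):

```python
import json
from urllib.request import Request

# Hypothetical local Xinference endpoint; 9997 is Xinference's default port.
BASE_URL = "http://localhost:9997/v1"

def chat_request(model: str, prompt: str) -> Request:
    """Build an OpenAI-style chat-completions request against Xinference."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# The request is built but not sent here, since sending it needs a running server.
req = chat_request("qwen2-instruct", "Hello")
print(req.full_url)
```

The same payload shape works against OpenAI's hosted API, which is why only the base URL needs to change.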

BUG: GPU memory is not released after closing a model from the UI #1682

Closed leoterry-ulrica closed 1 month ago

leoterry-ulrica commented 3 months ago

Describe the bug

Closing a model from the UI does not release GPU memory, or does not release it completely.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: Python 3.10
  2. The version of xinference you use: 0.12.0
  3. vllm version: 0.4.3

Expected behavior

GPU memory should be fully released after the model is terminated.

Additional context

LLM: qwen2-72b-4bit/8bit

(three screenshots attached)

ChengjieLi28 commented 3 months ago

@leoterry-ulrica Did you launch qwen2-72b-4bit/8bit across 4 GPUs? Please post a screenshot of the Running Models page after the model starts successfully, and also the logs after you click the Terminate button.

leoterry-ulrica commented 3 months ago

> @leoterry-ulrica Did you launch qwen2-72b-4bit/8bit across 4 GPUs? Please post a screenshot of the Running Models page after the model starts successfully, and also the logs after you click the Terminate button.

  1. Running Models page after a successful launch: (screenshot)

  2. Logs after clicking Terminate: (screenshot)

Resource usage (judging by the numbers, only one of the GPUs was freed): (screenshot)

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.

leoterry-ulrica commented 1 month ago

@ChengjieLi28 I can currently reproduce this issue on both 4x L20 and 8x H800 GPUs.

ChengjieLi28 commented 1 month ago

> @ChengjieLi28 I can currently reproduce this issue on both 4x L20 and 8x H800 GPUs.

First of all, I cannot reproduce the multi-GPU memory-release problem on my side. I suggest upgrading xinference and vllm first, and then testing as follows:

  1. Leave xinference out of it: load your model across 4 GPUs through the vllm API directly, run some inference, then kill it, and check whether vllm itself has the problem.
  2. "Kill" here means terminating the process directly.
  3. Try a few different vllm versions, if you can.

xinference does no extra GPU memory management; once the model is loaded, everything is handed over to vllm.
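The suggested isolation test could be sketched as follows (a command sketch requiring GPU hardware, so it cannot be run as-is; the model path is a placeholder, while `vllm.entrypoints.openai.api_server` and the `nvidia-smi` query flags are real interfaces in vLLM and the NVIDIA driver tools):

```shell
# Load the model with vLLM alone across 4 GPUs, with no xinference involved
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/qwen2-72b-int4 \
    --tensor-parallel-size 4 &
VLLM_PID=$!

# ...send a few test requests against http://localhost:8000/v1, then
# kill the process directly, as suggested above
kill -9 "$VLLM_PID"

# Check whether memory was actually freed on every GPU
nvidia-smi --query-gpu=index,memory.used --format=csv
```

If memory stays allocated on some GPUs after this, the leak is in vLLM itself rather than in xinference's model management.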

leoterry-ulrica commented 1 month ago

Tested with the new version v0.14.1 and the problem is resolved. Great work! @qinxuye @ChengjieLi28