vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[RFC]: proper resource cleanup for LLM class with file-like usage #5716

Open youkaichao opened 2 months ago

youkaichao commented 2 months ago

Motivation.

There have been quite a lot of requests for releasing the resources taken by the LLM class, e.g. in https://github.com/vllm-project/vllm/issues/3874 and https://github.com/vllm-project/vllm/issues/4919#issuecomment-2181018405 . Currently we only have this kind of cleanup in the CI, introduced in https://github.com/vllm-project/vllm/pull/5357 and https://github.com/vllm-project/vllm/issues/5337 .

Proposed Change.

We can mimic the file API in Python, e.g.:

with open("a.txt") as f:
    data = f.read()

LLM can support the same usage:

with LLM("facebook/opt-125m") as llm:
    output = llm.generate(prompts)

When we exit the context manager, the resources will be released (the with statement guarantees that __exit__ runs even if generate raises).

We can also support the legacy usage with a new LLM.release method:

llm = LLM("facebook/opt-125m")
output = llm.generate(prompts)
llm.release()
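
One caveat with the explicit API: if generate raises, release is never reached, so exception-safe callers would need try/finally themselves, which is exactly what the context manager automates:

llm = LLM("facebook/opt-125m")
try:
    output = llm.generate(prompts)
finally:
    llm.release()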

These two usages can be combined into the same implementation:

class LLM:
    ...
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()

    def release(self):
        # cleanup work here, shared by both the context-manager
        # usage and the explicit llm.release() usage
        del self.llm_engine
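
The body of release is deliberately left open above. As a rough sketch only (generic torch/gc calls, not necessarily what vLLM's actual teardown will do), the cleanup might look like:

import contextlib
import gc

import torch

def release(self):
    # Drop the engine so its GPU tensors become unreachable.
    if hasattr(self, "llm_engine"):
        del self.llm_engine
    # Break reference cycles that keep CUDA tensors alive.
    gc.collect()
    # Return cached GPU memory to the driver.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    # Tear down distributed state (e.g. NCCL) if it was initialized.
    if torch.distributed.is_initialized():
        with contextlib.suppress(Exception):
            torch.distributed.destroy_process_group()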

Feedback Period.

1 week

CC List.

No response

Any Other Things.

Please see https://github.com/vllm-project/vllm/issues/5337 on why del llm is not enough (in short, del llm only removes a name binding and does not guarantee that the object is destructed immediately).

youkaichao commented 2 months ago

Note: the main use case is users who want to create multiple LLM instances sequentially:

with LLM("model1") as llm:
    output = llm.generate(prompts)
with LLM("model2") as llm:
    output = llm.generate(prompts)

simon-mo commented 2 months ago

Can you add a section on why del llm is not sufficient?

youkaichao commented 2 months ago

Can you add a section on why del llm is not sufficient?

added

robertgshaw2-neuralmagic commented 2 months ago

Love

simon-mo commented 2 months ago

I prefer changing the explicit API from llm.release() to del llm for a more Pythonic API

youkaichao commented 2 months ago

I prefer changing the explicit API from llm.release() to del llm for a more Pythonic API

The thing is that Python's del llm is notorious; it does not do what people expect it to do :(

njhill commented 2 months ago

Couldn't a __del__ method be included with the same body as __exit__?

youkaichao commented 2 months ago

The point is not __del__; the problem is that del x does not guarantee that x is destructed immediately.
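
A minimal illustration (hypothetical Engine class, standard CPython behavior): del only removes a name binding, so the object, and whatever GPU state it holds, stays alive as long as any other reference exists:

class Engine:
    def __del__(self):
        print("engine destroyed")

llm = Engine()
alias = llm    # e.g. a reference held by a logger, a traceback, or a background thread
del llm        # removes the name `llm`; the object survives via `alias`
print("after del llm")   # prints first: nothing has been destroyed yet
del alias                # only now does the refcount reach zero and __del__ run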

WoosukKwon commented 2 months ago

Thanks for the proposal! I'm good with adding this new API as long as we keep the current API as well.

I'm not sure whether we'd like to promote this over the current usage though, since the with statement is a bit restrictive from the user's perspective.

youkaichao commented 2 months ago

I'm not sure whether we'd like to promote this over the current usage

We don't need to promote it; just document it and let whoever is interested use it if they want :)