vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Recovery from OOM #3066

Open · Ja1Zhou opened this issue 8 months ago

Ja1Zhou commented 8 months ago

I am instantiating an LLM class for local inference. I noticed that when an OOM error is raised inside vllm.LLM.llm_engine.step() and I catch it, the in-flight requests are not aborted and interfere with my next call to LLM.generate. What is the proper way to recover from OOM errors during inference?
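For context, a minimal sketch of what I mean. The model name and the prompt batch are placeholders, and the cleanup in the except block is only a guess that pokes at engine internals (scheduler queues and abort_request); whether something like this actually leaves the engine in a usable state is exactly what I am asking.

```python
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")   # placeholder model
params = SamplingParams(max_tokens=512)

# Stand-in for a workload large enough to trigger an OOM at runtime.
prompts = ["some very long prompt ..."] * 1024

try:
    outputs = llm.generate(prompts, params)
except torch.cuda.OutOfMemoryError:
    # The OOM is raised inside llm.llm_engine.step(). The requests that were
    # in flight are still tracked by the engine, so a later generate() call
    # would pick them up again and mix their outputs into the new results.
    engine = llm.llm_engine
    # Tentative cleanup (relies on engine internals; attribute names may
    # differ across vLLM versions): abort every unfinished sequence group.
    scheduler = engine.scheduler
    unfinished = list(scheduler.waiting) + list(scheduler.running) + list(scheduler.swapped)
    for seq_group in unfinished:
        engine.abort_request(seq_group.request_id)
    torch.cuda.empty_cache()
```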

hmellor commented 2 months ago

@Ja1Zhou did you find a solution for this?

Ja1Zhou commented 2 months ago

> @Ja1Zhou did you find a solution for this?

I didn't. I had to make sure that no OOMs occurred in the first place.
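For anyone else landing here, a minimal sketch of what "making sure no OOMs occur" looked like in practice: sizing the engine conservatively up front. The values below are illustrative placeholders to tune per model and GPU, not recommendations.

```python
from vllm import LLM

# Conservative engine settings that leave more GPU headroom and bound the
# amount of work scheduled per step.
llm = LLM(
    model="facebook/opt-125m",     # placeholder model
    gpu_memory_utilization=0.85,   # reserve more headroom than the 0.90 default
    max_model_len=4096,            # cap the context length the KV cache must cover
    max_num_seqs=64,               # limit how many sequences run in one batch
    swap_space=8,                  # GiB of CPU swap space for preempted sequences
)
```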