Open osafaimal opened 5 months ago
Try calling torch.cuda.empty_cache() after you delete the LLM object.
You can also call gc.collect() to reclaim garbage objects immediately after you delete them.
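The two suggestions above can be sketched together; a minimal, self-contained version that also runs on a machine without torch or a GPU (the dictionary is just a stand-in for the real LLM object):

```python
import gc

# torch is optional here so the sketch runs anywhere; on a CUDA machine,
# torch.cuda.empty_cache() returns cached blocks to the driver after `del`.
try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    has_cuda = False

obj = {"weights": list(range(1000))}  # stand-in for the loaded model
del obj               # drop the last reference to the object
freed = gc.collect()  # force a full collection pass immediately
if has_cuda:
    torch.cuda.empty_cache()
```

Note that empty_cache() only releases memory whose tensors are already garbage-collected; if any live reference to the model remains, nothing is freed.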
Both of these don't work.
You should also clear the notebook output: https://stackoverflow.com/questions/24816237/ipython-notebook-clear-cell-output-in-code
I always do that (in the GUI, not in my cells).
This seems mostly solved by #1908 with:
import gc
import torch
from vllm import LLM, SamplingParams
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel
# Load the model via vLLM
llm = LLM(model=model_name, download_dir=saver_dir, tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)
# Delete the llm object and free the memory
destroy_model_parallel()
del llm.llm_engine.driver_worker
del llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully deleted the LLM pipeline and freed the GPU memory!")
I had already read that. My problem stays unsolved when I use Vllm from LlamaIndex; otherwise it almost works. A little memory stays used (~1 GB), but at least I can load and unload the models. The problem is that I can't find how to access the llm_engine member of Vllm.LLM.
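When a wrapper hides the underlying vllm.LLM instance, one way to locate it is to inspect the wrapper's instance attributes. Below is a minimal, self-contained sketch using a stand-in wrapper class; the attribute name _client is a hypothetical placeholder, and the real LlamaIndex Vllm wrapper may use a different name, so inspect your actual object the same way:

```python
class FakeEngine:
    """Stand-in for vllm.LLM."""

class FakeVllmWrapper:
    """Stand-in for the LlamaIndex Vllm wrapper."""
    def __init__(self):
        self._client = FakeEngine()  # hypothetical attribute name

def find_wrapped(obj, target_type):
    """Return (attribute_name, value) of the first instance
    attribute matching target_type, or None if absent."""
    for name, value in vars(obj).items():
        if isinstance(value, target_type):
            return name, value
    return None

wrapper = FakeVllmWrapper()
name, engine = find_wrapped(wrapper, FakeEngine)
print(name)  # -> _client
```

Once you have the wrapped engine, you can apply the same cleanup sequence (del the driver worker, gc.collect(), torch.cuda.empty_cache()) to it directly.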
Hi, I'm sorry, I can't find how to unload a model. I load a model, delete the object, and call the garbage collector, but it does nothing. How are we supposed to unload a model? I want to load a model, run a batch, then load another and run a batch, and so on for multiple models in order to compare them. But for now I must restart Python each time.
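The load/batch/unload loop described here can be sketched as follows; load_model and unload are placeholders standing in for the real vLLM calls and the cleanup sequence from the earlier comment, since actually loading models needs vLLM and a GPU:

```python
import gc

def load_model(name):
    # Placeholder for LLM(model=name, ...); returns a stand-in object.
    return {"name": name}

def unload(model):
    # Placeholder for the cleanup sequence from the earlier comment:
    # destroy_model_parallel(); del llm.llm_engine.driver_worker; del llm;
    # gc.collect(); torch.cuda.empty_cache();
    # torch.distributed.destroy_process_group()
    del model
    gc.collect()

results = []
for name in ["model-a", "model-b"]:  # hypothetical model names
    llm = load_model(name)
    results.append(llm["name"])      # stand-in for running a batch
    unload(llm)

print(results)  # -> ['model-a', 'model-b']
```

The key point is that each model must be fully released (no surviving references) before the next one is loaded, otherwise both sets of weights occupy GPU memory at once.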