vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: How to reload model when tensor_parallel_size > 1 ? #4938

Closed: qy1026 closed this 1 day ago

qy1026 commented 6 months ago

My Python Script

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"  # restrict this process to GPUs 2 and 3
import time
import torch
from vllm import LLM, SamplingParams
import gc
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Load the first model across both visible GPUs (tensor parallelism = 2).
model_name1 = "./Qwen1.5-7B-Chat/"
llm1 = LLM(model=model_name1, tensor_parallel_size=2)
print("model loaded !")

# Tear down the first engine and try to free GPU memory.
destroy_model_parallel()
del llm1
gc.collect()
torch.cuda.empty_cache()
print("model deleted !")

# Load a second model with the same tensor parallel size.
model_name2 = "./Qwen1.5-14B-Chat/"
llm2 = LLM(model=model_name2, tensor_parallel_size=2)
print("model reloaded !")

How would you like to use vllm

When tensor_parallel_size=1, the program works fine. But with tensor_parallel_size=2, it gets stuck after printing "model deleted !", with the message 2024-05-21 16:59:38,442 INFO worker.py:1582 -- Calling ray.init() again after it has already been called.

vincent-pli commented 6 months ago

When tensor_parallel_size > 1, vLLM uses Ray to run the model, so in your case it occupies 2 GPUs. I guess that when you destroy llm1, the placement group in Ray is not removed, so the 2 GPUs are not released. llm2 then stays pending because Ray cannot get enough resources (2 GPUs) for its placement group.
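
One way to confirm this is to inspect Ray's state after deleting llm1. A minimal sketch, assuming Ray has already been initialized by vLLM and that your Ray version exposes placement_group_table:

import ray
from ray.util import placement_group_table

# After `del llm1`: if a placement group is still in state "CREATED"
# and holds 2 GPU bundles, Ray has not released the GPUs, and the
# second LLM(...) call will wait for resources indefinitely.
for pg_id, info in placement_group_table().items():
    print(pg_id, info.get("state"), info.get("bundles"))

# Should show how many GPUs Ray still considers free.
print(ray.available_resources())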

qy1026 commented 6 months ago

> When tensor_parallel_size > 1, vLLM uses Ray to run the model, so in your case it occupies 2 GPUs. I guess that when you destroy llm1, the placement group in Ray is not removed, so the 2 GPUs are not released. llm2 then stays pending because Ray cannot get enough resources (2 GPUs) for its placement group.

Thank you very much for your reply. Do you have any ideas on how to fix this?

vincent-pli commented 6 months ago

The simplest way is to stop the Ray cluster after destroying llm1; llm2 will then start a fresh Ray cluster.
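
A sketch of what that looks like applied to the script above (the destroy_model_parallel import path matches the version used there; in newer vLLM releases it lives under vllm.distributed.parallel_state, so adjust the import to your version):

import gc
import ray
import torch
from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

llm1 = LLM(model="./Qwen1.5-7B-Chat/", tensor_parallel_size=2)
# ... run inference with llm1 ...

# Tear down the first engine and its distributed state.
destroy_model_parallel()
del llm1
gc.collect()
torch.cuda.empty_cache()

# Stop the Ray cluster so its placement group (and the 2 GPUs) are released.
ray.shutdown()

# This call starts a fresh Ray cluster and can claim both GPUs again.
llm2 = LLM(model="./Qwen1.5-14B-Chat/", tensor_parallel_size=2)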

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] commented 1 day ago

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!