Open R-C101 opened 1 month ago
Can you try the method which I've suggested in #6544?
Hi, it runs on the example I gave you. Let me introduce this method into my workflow and see if it works there too. Thank you so much for your help.
Edit: I tested it in my workflow and it doesn't behave the same. It works normally when using llm.generate, but there is one part where I'm doing training using the following config:
```python
self.engine_args = EngineArgs(
    model=model_name, tensor_parallel_size=1, max_model_len=1024, dtype="float16"
)
self.engine_config = self.engine_args.create_engine_config()
self.engine_config.model_config.embedding_mode = True

distributed_init_method = get_distributed_init_method(get_ip(), get_open_port())
worker = Worker(
    model_config=self.engine_config.model_config,
    parallel_config=self.engine_config.parallel_config,
    scheduler_config=self.engine_config.scheduler_config,
    device_config=self.engine_config.device_config,
    cache_config=self.engine_config.cache_config,
    load_config=self.engine_config.load_config,
    local_rank=0,
    rank=0,
    distributed_init_method=distributed_init_method,
    is_driver_worker=True,
)
worker.init_device()
worker.load_model()
self.EMR = worker.model_runner

# It's used like this:
num_layers = self.engine_config.model_config.get_num_layers(self.engine_config.parallel_config)
hs = self.EMR.execute_model(seqs, kv_caches=[None] * num_layers).to("cuda:3")
hs = hs.reshape([self.num_generate, -1, input_embeds.shape[-1]])
logits = output_embed_layer(hs.to(model.dtype))
```
Using the same method doesn't work in this case. What else would I have to quit/destroy here?
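One likely reason cleanup fails in this setup is that `self.EMR` keeps a direct reference to the worker's model runner, so the model stays reachable even after the worker object itself is released. A minimal stdlib sketch of that effect — the `Worker`/`ModelRunner` classes here are stand-ins, not vllm's:

```python
import gc
import weakref

class ModelRunner:
    """Stand-in for the worker's model runner (not vllm's class)."""
    pass

class Worker:
    """Stand-in for vllm's Worker, just enough to show the reference chain."""
    def __init__(self):
        self.model_runner = ModelRunner()

worker = Worker()
emr = worker.model_runner            # like `self.EMR = worker.model_runner`
probe = weakref.ref(worker.model_runner)

del worker                           # releasing the worker alone...
gc.collect()
assert probe() is not None           # ...does not free the runner: `emr` holds it

del emr                              # the cached handle must be dropped too
gc.collect()
assert probe() is None               # now the runner is collectable
```

The same reasoning applies to any other name that still points into the engine (e.g. an attribute on `self`): every such handle has to be deleted before the memory can actually be reclaimed.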
Can you try the method which I've suggested in #6544?
It does not work for me.
Please show the code which you've used.
Your current environment
🐛 Describe the bug
Unloading a model from memory doesn't work with the solutions provided. Please advise on how to use two models back to back. It is also worth noting that, according to https://github.com/vllm-project/vllm/issues/1908#issuecomment-2101122008, subsequent models do get terminated, but the first one still remains.
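For running two models back to back, the ordering that usually matters is: tear down distributed state, drop every reference to the first engine, force garbage collection, and (on GPU) empty the CUDA cache before constructing the second engine. A dependency-free sketch of that ordering, with the real vllm/torch calls only named in comments since their import paths vary across versions — verify them against your installed vllm:

```python
import gc
import weakref

class FakeLLM:
    """Stand-in for vllm.LLM so this sketch runs without a GPU."""
    def __init__(self, name):
        self.name = name

def release(holder):
    """Drop the engine held in `holder` and force collection.

    In real vllm code this is also where you would call
    destroy_model_parallel() (check its import path for your vllm
    version) and, after collection, torch.cuda.empty_cache().
    """
    holder.clear()
    gc.collect()

holder = {"llm": FakeLLM("model-a")}
probe = weakref.ref(holder["llm"])
release(holder)
assert probe() is None               # first engine fully released
holder["llm"] = FakeLLM("model-b")   # safe to load the next model
```

Keeping the engine in a single container (here a dict) makes it easier to guarantee that no stray name still points at the old model when the second one is loaded.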
Traceback: