anton-l closed this 6 days ago
Thanks for the PR! I'm pretty sure that in data-parallel mode each process has its own model, so this should not be an issue. Did you run into a case where the model was already deleted? Also, I think the cleanup is only run on the first process.
@NathanHB yes, I hit an error there because the model is already None. It only happens with vllm,data_parallel_size=2 and up; there are no issues if I disable data parallelism.
Ohh, I did not take vllm parallelism into account; it does indeed work differently. Good catch!
The cleanup seems to be called from multiple processes in data parallel mode, so this just ensures there's no error due to the already deleted model object.
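For anyone reading later, here is a minimal sketch of the kind of guard being described. It is not the project's actual code: `ModelWrapper`, `DummyEngine`, and `close()` are hypothetical stand-ins, and the two calls happen in a single process only to show that a second cleanup becomes a no-op rather than raising on a `None` model.

```python
import gc


class DummyEngine:
    """Hypothetical stand-in for a heavyweight inference engine."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ModelWrapper:
    """Hypothetical wrapper, used only to illustrate the guard."""

    def __init__(self):
        self.model = DummyEngine()

    def cleanup(self):
        # Guard: an earlier cleanup call may already have released the model.
        if self.model is None:
            return False
        self.model.close()
        self.model = None
        gc.collect()
        return True


wrapper = ModelWrapper()
first = wrapper.cleanup()   # releases the model
second = wrapper.cleanup()  # model is already None, so nothing happens
print(first, second)
```

The return value is only there to make the behaviour visible in the example; the point is that repeated cleanup calls are safe.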