[Open] SamKG opened this issue 2 weeks ago
will take a look later
Sorry, I don't get it. The way OOT model registration works is that you register the architecture name that appears in the Hugging Face config file, not the argument you pass to LLM.
See https://huggingface.co/facebook/opt-125m/blob/main/config.json#L6 for an example.
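As a side note, the sketch below (not part of the thread) uses transformers' AutoConfig to print the architectures field of the linked opt-125m config; that string, not the path or name passed to LLM, is what the registry is keyed on:

from transformers import AutoConfig

# vLLM resolves the model class from the strings in config.architectures,
# not from the path or name given to LLM(model=...).
cfg = AutoConfig.from_pretrained("facebook/opt-125m")
print(cfg.architectures)  # ['OPTForCausalLM']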
Yes, this is how I am using it. For context, the "SomeModel/" directory here contains a config.json file that references my custom architecture. For clarity, you can use this example:
from vllm import ModelRegistry
from vllm.model_executor.models.mixtral import MixtralForCausalLM

ModelRegistry.register_model("SomeModel", MixtralForCausalLM)

from vllm import LLM, SamplingParams

if __name__ == "__main__":
    llm = LLM(
        model="path_to_directory/",  # directory whose config.json has architectures: ["SomeModel"]
        tensor_parallel_size=8,
        # distributed_executor_backend="ray",  # ray backend fails!
    )
Then it makes sense to me. The Ray workers do not know "SomeModel", because the following code:
from vllm import ModelRegistry
from vllm.model_executor.models.mixtral import MixtralForCausalLM
ModelRegistry.register_model("SomeModel", MixtralForCausalLM)
is not executed in the Ray workers.
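As an aside, here is a tiny Ray-only sketch (not from the thread) of why this happens: Ray workers are separate OS processes, so module-level code that ran in the driver, such as the registration above, is not replayed in them unless something re-executes it there.

import os
import ray

@ray.remote
def worker_pid():
    # Runs inside a Ray worker process, not in the driver.
    return os.getpid()

if __name__ == "__main__":
    ray.init()
    print("driver pid:", os.getpid())
    print("worker pid:", ray.get(worker_pid.remote()))  # a different process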
thanks! is there a way to do this initialization on the ray workers?
@SamKG so the default backend (multiprocessing) should work out-of-the-box, right?
also cc @rkooo567 - maybe this is solvable via runtime env
@SamKG is there a full repro somewhere we can look at?
@richardliaw Try attached. Note that the default backend will also fail (but with an expected error), since I added a stub tensor to keep the model directory small.
@youkaichao yes, the default backend works fine (as long as the OOT registration happens at module level, outside of the __main__ block).
ray.init(runtime_env={"worker_process_setup_hook": ...}) allows executing code on all workers. Would this suffice?
@rkooo567 this functionality seems related, but how can we expose it to users?
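For context, a pared-down, vLLM-free sketch of the mechanism (the names _setup, MY_FLAG, and read_flag are made up for illustration): the callable passed as worker_process_setup_hook runs in every Ray worker process when it starts, so per-process setup can be replayed there.

import os
import ray

def _setup():
    # Executed inside each Ray worker process at startup.
    os.environ["MY_FLAG"] = "1"

@ray.remote
def read_flag():
    return os.environ.get("MY_FLAG")

if __name__ == "__main__":
    ray.init(runtime_env={"worker_process_setup_hook": _setup})
    print(ray.get(read_flag.remote()))  # expected: "1"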
The worker_process_setup_hook suggestion seems to fix the issue!
import ray
from vllm import ModelRegistry, LLM

def _init_worker():
    # Register the out-of-tree architecture; this must run in every process
    # that loads the model, not only in the driver.
    from vllm.model_executor.models.mixtral import MixtralForCausalLM
    ModelRegistry.register_model("SomeModel", MixtralForCausalLM)

# Register in the driver process at import time...
_init_worker()

if __name__ == "__main__":
    # ...and have Ray replay the same registration in every worker process.
    ray.init(runtime_env={"worker_process_setup_hook": _init_worker})
    llm = LLM(
        model="model/",
        tensor_parallel_size=8,
        distributed_executor_backend="ray",
    )
    llm.generate("test")
very nice!
@youkaichao maybe we can just print out a warning linking to the vllm docs about this?
and in the vllm docs let's have an example snippet like above!
Your current environment
🐛 Describe the bug
The Ray distributed backend does not support out-of-tree models (on a single node).
Repro: