Closed: osafaimal closed this issue 5 months ago
Hey @osafaimal, great to see you back! 😊 Hope everything's going well on your end. I'm diving into this intriguing issue you've encountered and will get back to you with a detailed response shortly. Hang tight!
Shouldn't this issue be opened on vllm?
I don't know if it's fully a vLLM problem or how LlamaIndex uses vLLM.
Based on the information you've provided and the context from the LlamaIndex repository, the AssertionError "tensor model parallel group is not initialized" you're encountering when running your LLM model with LlamaIndex v0.10.23 is likely due to the model's tensor parallelism (TP) configuration not being properly initialized or not matching the runtime environment's configuration. This error typically arises when there is a mismatch between the expected and actual configurations for distributed model parallelism, particularly in environments using NVIDIA's TensorRT for optimized inference.
Here are some steps you can take to resolve this issue:
1. Correct Installation and Configuration: Verify that `tensorrt_llm` and its dependencies are correctly installed and configured in your environment. The error message suggests ensuring the installation with a specific command, which might be necessary if the environment has changed or if the initial setup was incomplete.
2. Matching World Sizes: Ensure that the world size defined by your model's configuration (the product of `tp_size` and `pp_size`) matches the runtime environment's world size (`tensorrt_llm.mpi_world_size()`). This might involve adjusting your model's configuration or the environment setup to ensure consistency (see the runnable sketch after this list).
3. Environment Support: The assertion `pp_size == 1` indicates that pipeline parallelism is not supported in the Python runtime you're using. If your model or environment is configured to use pipeline parallelism, this could be a source of conflict. Ensure that your setup is compatible with the limitations of the Python runtime regarding pipeline parallelism.
4. GPU Availability: The assertion `torch.cuda.is_available()` checks for CUDA-enabled GPU availability. Ensure that your runtime environment has access to a compatible NVIDIA GPU, as the TensorRT-LLM model requires it for operation.
5. MPI Initialization: The error might also stem from an improperly initialized MPI (Message Passing Interface) environment, which is crucial for distributed computing scenarios like model parallelism. Ensure that your MPI environment is correctly initialized before running the model.
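Here is a minimal sketch of items 2, 3, and 4 as runnable assertions, assuming `tensorrt_llm` and `torch` are installed; `tp_size` and `pp_size` are placeholder values you would take from your own engine/build configuration:

```python
import torch
import tensorrt_llm

# Placeholders: take these from your engine/build configuration.
tp_size = 2
pp_size = 1

# GPU availability: a CUDA-capable device must be visible to the process.
assert torch.cuda.is_available(), "no CUDA device available"

# Environment support: the Python runtime does not support pipeline parallelism.
assert pp_size == 1, "pipeline parallelism is not supported in the Python runtime"

# Matching world sizes: the configured world size (tp_size * pp_size)
# must equal the MPI world size seen at runtime.
world_size = tp_size * pp_size
assert world_size == tensorrt_llm.mpi_world_size(), (
    f"configured world size {world_size} != "
    f"MPI world size {tensorrt_llm.mpi_world_size()}"
)
```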
If after checking these points the issue persists, it might be helpful to consult the documentation or support resources for `tensorrt_llm`, LlamaIndex, or the specific hardware and software environment you're using to identify any additional configuration or compatibility requirements.
Regarding the functions you mentioned, unfortunately, I couldn't find specific information about `get_tensor_model_parallel_group()`, `tensor_model_parallel_all_reduce()`, and `execute_model()` within the LlamaIndex repository. However, the `acomplete()` function in the `base.py` file in the LlamaIndex codebase is used to asynchronously complete a given prompt. This function takes a prompt as input, optionally formats it if the `formatted` argument is set to `False`, and then uses the superclass's `acomplete()` method to asynchronously generate a completion response based on the given prompt. This is particularly useful for integrating with asynchronous codebases or handling operations that benefit from non-blocking execution. You can find more details about this function in the source code.
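For illustration, a minimal sketch of calling `acomplete()`; the backend choice and model name here are assumptions, and not every backend implements the async path:

```python
import asyncio
from llama_index.llms.vllm import Vllm  # assumed backend for this thread


async def main() -> None:
    # Placeholder model; use whatever you are actually serving.
    llm = Vllm(model="facebook/opt-125m")
    # acomplete() optionally formats the prompt, then awaits the
    # completion without blocking the event loop.
    response = await llm.acomplete("What is tensor parallelism?")
    print(response.text)


asyncio.run(main())
```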
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
@osafaimal llama-index is just doing
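(The snippet that followed here did not survive extraction. For context, LlamaIndex's `Vllm` wrapper essentially builds an in-process `vllm.LLM` client and calls its `generate()` method; the following is a rough sketch of that pattern with a placeholder model, not the exact source:)

```python
from vllm import LLM, SamplingParams

# Build the in-process vLLM engine once...
client = LLM(model="facebook/opt-125m")  # placeholder model
# ...then generate completions from it.
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = client.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```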
I don't know enough about vLLM to know if that's missing anything. It might be model/env specific?
It's not model-specific, because I tested with multiple models and got the same result.
I can't reproduce my problem anymore. I don't know what solved it.
Bug Description
When I run the code below twice, I get the error `tensor model parallel group is not initialized`. The problem seems to come from vLLM, but I don't understand precisely where it comes from.
Version
0.10.23
Steps to Reproduce
Run the code twice.
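(The original snippet was not preserved in this thread; below is a minimal sketch of the kind of code being rerun, with a placeholder model name. Instantiating the vLLM-backed LLM a second time in the same process is what appears to trigger the assertion:)

```python
from llama_index.llms.vllm import Vllm

# Running this twice (e.g., twice in the same interpreter session)
# reproduces the reported error.
llm = Vllm(model="facebook/opt-125m")  # placeholder model
print(llm.complete("Hello"))
```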
Relevant Logs/Tracebacks