run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
35.86k stars 5.09k forks

[Question]: Spreading Models Across Multiple GPUs #10413

Closed JoshuaFurman closed 3 months ago

JoshuaFurman commented 8 months ago

Question Validation

Question

I was working on embedding a fairly large set of text files and was wondering whether there is a way in llama-index to use multiple GPUs for this. I have a machine with 2xA100, so I want to use all the power at my disposal. I couldn't find anything in the docs that references this...

Thank you!

dosubot[bot] commented 8 months ago

🤖

Hey there, @JoshuaFurman! 🚀 Great to see you diving into another adventure with LlamaIndex. How's the coding journey been treating you since our last chat?

Based on the current version of LlamaIndex (v0.9.41), there is no built-in support for multi-GPU processing. This is evident in the codebase, specifically in the file nvidia_tensorrt.py, where an assertion ensures that the pipeline parallelism size (pp_size) is always 1, meaning that backend runs on a single GPU.

Here is the relevant code snippet:

assert (
    pp_size == 1
), "Python runtime does not support pipeline parallelism"

If pp_size were greater than 1, it would imply the use of multiple GPUs, but this is not supported in the current version.

Unfortunately, I couldn't find any information about plans to support multi-GPU processing in future versions of LlamaIndex. I recommend keeping an eye on the repository for any updates or changes in future versions.
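That said, a common workaround is to shard the documents yourself and run one embedding model instance per GPU. The sketch below is not an official LlamaIndex API: the `embed_on_device` worker is a hypothetical stub (a real version would load an embedding model per device, as noted in the comments), and only the sharding/dispatch logic is shown runnable.

```python
from concurrent.futures import ThreadPoolExecutor

def shard(texts, n_shards):
    """Split texts round-robin into n_shards roughly equal chunks."""
    return [texts[i::n_shards] for i in range(n_shards)]

def embed_on_device(texts, device):
    # Hypothetical per-device worker. In a real setup you would load one
    # embedding model per GPU, for example (assumed usage, check your version):
    #   from llama_index.embeddings import HuggingFaceEmbedding
    #   model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en", device=device)
    #   return model.get_text_embedding_batch(texts)
    # Stubbed here so the dispatch logic runs without GPUs: it just tags
    # each text with the device that would have embedded it.
    return [(device, t) for t in texts]

texts = [f"doc {i}" for i in range(10)]
devices = ["cuda:0", "cuda:1"]
shards = shard(texts, len(devices))

# One thread per device; each thread drives its own model instance.
with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    results = list(pool.map(embed_on_device, shards, devices))

# Every document is processed exactly once, split across the two devices.
assert sum(len(r) for r in results) == len(texts)
```

Threads suffice here because the heavy work happens inside the embedding library (which releases the GIL during GPU compute); for CPU-bound models you would use processes instead.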

I hope this answers your question. If you have any other questions or need further clarification, feel free to ask.

-Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


If you want to continue the conversation, start your reply with @dosu-bot.

teis-e commented 6 months ago

@dosu-bot Is this supported in the latest version?