matatonic / openedai-vision

An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal.
GNU Affero General Public License v3.0

Use Multiple GPUs with InternVL2 #12

Open. Opened by Backendmagier 3 months ago

Backendmagier commented 3 months ago

Is it possible to use multiple GPUs? I have two 3090s, but I can't get InternVL2-26B to run because it always runs on only one card.

I start it like this: "CUDA_VISIBLE_DEVICES=1,0 python vision.py --model OpenGVLab/InternVL2-26B --device-map sequential"

and I get this error: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 1 has a total capacity of 23.55 GiB of which 9.19 MiB is free. Including non-PyTorch memory, this process has 23.52 GiB memory in use. Of the allocated memory 23.02 GiB is allocated by PyTorch, and 125.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"

matatonic commented 3 months ago

Not yet, but the model maker has published an update showing how to do this. I will add support for it in an upcoming release.

Backendmagier commented 3 months ago

Alright, thank you!

matatonic commented 3 months ago

(I'll close the issue when it's fixed)

AlexM4H commented 2 months ago

BTW, here is the hint from the model maker: https://huggingface.co/OpenGVLab/InternVL2-40B/blob/main/README.md#multiple-gpus
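
For anyone following along, here is a minimal sketch of the general idea behind a hand-built device map: spread the language model's transformer layers across the cards, and give GPU 0 a smaller share because it also has to hold the vision encoder. This is not the code from the linked README or from openedai-vision. The function name `split_layers` is hypothetical, and the module names (`language_model.model.layers.N`, `vision_model`, `mlp1`) and the layer count of 48 for InternVL2-26B are assumptions that should be checked against the actual model.

```python
import math


def split_layers(num_layers: int, num_gpus: int) -> dict:
    """Build a device map that assigns transformer layers to GPUs.

    GPU 0 is treated as roughly half a GPU, since it is assumed to
    also hold the vision encoder.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    if num_gpus == 1:
        per_gpu = [num_layers]
    else:
        # Full share for GPUs 1..n-1, half a share for GPU 0.
        share = math.ceil(num_layers / (num_gpus - 0.5))
        per_gpu = [math.ceil(share * 0.5)] + [share] * (num_gpus - 1)

    device_map = {}
    layer = 0
    for gpu, count in enumerate(per_gpu):
        for _ in range(count):
            if layer >= num_layers:
                break
            # Assumed module naming; verify against the real model.
            device_map[f"language_model.model.layers.{layer}"] = gpu
            layer += 1

    # Assumed names for the vision encoder and projector, kept on GPU 0.
    device_map["vision_model"] = 0
    device_map["mlp1"] = 0
    return device_map


if __name__ == "__main__":
    # Assumption: 48 language-model layers, two visible GPUs.
    dm = split_layers(48, 2)
    on_gpu0 = sum(1 for k, v in dm.items() if ".layers." in k and v == 0)
    print(f"layers on GPU 0: {on_gpu0}, on GPU 1: {48 - on_gpu0}")
```

A complete map would also need entries for the embeddings, the final norm, and the output head, which are left out here. A dict like this can be passed as `device_map=` to `from_pretrained` in Transformers instead of a string such as `sequential`.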