matatonic / openedai-vision

An OpenAI API-compatible API for chat with image input and questions about the images, a.k.a. multimodal.
GNU Affero General Public License v3.0
157 stars 12 forks

Use Multiple GPUs with InternVL2 #12

Open Backendmagier opened 1 month ago

Backendmagier commented 1 month ago

Is it possible to use multiple GPUs? I have two RTX 3090s, but I can't get InternVL2-26B to run because it always loads onto only one card.

I start it like this: "CUDA_VISIBLE_DEVICES=1,0 python vision.py --model OpenGVLab/InternVL2-26B --device-map sequential"

And I get this error: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 1 has a total capacity of 23.55 GiB of which 9.19 MiB is free. Including non-PyTorch memory, this process has 23.52 GiB memory in use. Of the allocated memory 23.02 GiB is allocated by PyTorch, and 125.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"

matatonic commented 1 month ago

Not yet, but the model maker has published an update showing how to do this. I will add support for it in an upcoming release.
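For readers hitting the same limit, one common approach with Hugging Face Transformers is to pass an explicit `device_map` dict to `from_pretrained` so the language-model layers are split across both cards instead of filling one. The sketch below only builds such a dict. It is not openedai-vision's implementation and not necessarily what the model maker's update does. The helper `build_device_map`, the constants `FIRST_GPU_MODULES` and `LAST_GPU_MODULES`, the module names, the 48-layer count for InternVL2-26B, and the choice to count GPU 0 as "half a GPU" because it also hosts the vision encoder are all assumptions. Verify them against `model.named_modules()` and the model config before relying on them.

```python
import math
from collections import Counter

# Assumed module names for an InternVL2 model with an InternLM2-style
# language model; verify against model.named_modules() before use.
FIRST_GPU_MODULES = ("vision_model", "mlp1", "language_model.model.tok_embeddings")
LAST_GPU_MODULES = ("language_model.model.norm", "language_model.output")


def build_device_map(num_layers: int, num_gpus: int) -> dict[str, int]:
    """Spread decoder layers over GPUs, giving GPU 0 a lighter share.

    GPU 0 also holds the vision encoder (assumption), so it is counted
    as half a GPU and receives roughly half as many decoder layers.
    """
    if num_layers < 1 or num_gpus < 1:
        raise ValueError("num_layers and num_gpus must be positive")
    # Layers per "full" GPU when GPU 0 only counts as half a GPU.
    per_gpu = math.ceil(num_layers / (num_gpus - 0.5))
    quotas = [math.ceil(per_gpu * 0.5)] + [per_gpu] * (num_gpus - 1)

    device_map: dict[str, int] = {}
    layer = 0
    for gpu, quota in enumerate(quotas):
        for _ in range(quota):
            if layer == num_layers:  # quotas may slightly overshoot
                break
            device_map[f"language_model.model.layers.{layer}"] = gpu
            layer += 1

    # Vision tower, projector and input embeddings share GPU 0.
    for name in FIRST_GPU_MODULES:
        device_map[name] = 0
    # Keep the final norm and output head next to the last decoder layer.
    last_gpu = device_map[f"language_model.model.layers.{num_layers - 1}"]
    for name in LAST_GPU_MODULES:
        device_map[name] = last_gpu
    return device_map


if __name__ == "__main__":
    # 48 decoder layers for InternVL2-26B is an assumption; check the config.
    dm = build_device_map(num_layers=48, num_gpus=2)
    layers = Counter(gpu for name, gpu in dm.items() if ".layers." in name)
    print(dict(layers))  # {0: 16, 1: 32}
```

With two GPUs, GPU 0 gets 16 decoder layers plus the vision side and GPU 1 gets the remaining 32. The resulting dict would then be passed as `device_map=` to `transformers.AutoModel.from_pretrained(..., trust_remote_code=True)` with both cards visible. Whether openedai-vision's `--device-map` option can accept a custom map like this is not stated in the thread.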

Backendmagier commented 1 month ago

Alright, thank you!

matatonic commented 1 month ago

(I'll close the issue when it's fixed)