Open petermcc0807 opened 1 year ago
Hi Hangyu,
Many thanks for your help. Based on your sample code (for Vicuna 13B), I have the following:
device = torch.device('cpu')
model, vis_processors, txt_processors = load_model_and_preprocess(name='blip2_vicuna_instruct', model_type='vicuna13b', is_eval=True, device=device)
# device_map = infer_auto_device_map(model)
# Based on my 4x16GB GPU instance
max_memory = { 0: '13GiB', 1: '13GiB', 2: '13GiB', 3: '13GiB', 'cpu': '32GiB' }
device_map = infer_auto_device_map(model, max_memory=max_memory)
model = dispatch_model(model, device_map=device_map)
torch.cuda.empty_cache()
model.eval()
if torch.__version__ >= '2' and sys.platform != 'win32':
    model = torch.compile(model)
device = torch.device('cuda')
image = vis_processors['eval'](image).unsqueeze(0).to(device)
Everything works fine. However, when I perform an inference:
model.generate({ 'image': image, 'prompt': 'What is unusual about this image?' })
I get the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDAgather)
How can I fix this? Thanks again.
Hi there,
I'm glad to see that the issue has been resolved. As you've noticed, this new error arises from a discrepancy between the intermediate variable and the model, which are not located on the same device. This could be attributed to the accelerate library not being fully compatible with all the latest models. To tackle this kind of problem, I recommend the following suggestions:
Make sure that the parameters of the embedding matrix and the lm_head are situated on the same device. You can manually adjust the device_map based on the model.hf_device_map to achieve this.
Within the infer_auto_device_map() function, specify the no_split_module_classes=["xxx"], where "xxx" represents the name of the language model layers. For more detailed instructions, you can refer to the following link: https://huggingface.co/docs/accelerate/v0.19.0/en/usage_guides/big_modeling#limits-and-further-development.
If the error persists, I recommend following the compiler prompts and modifying the source code to ensure that the data tensor and the device model are consistent. After implementing the above two suggestions, generally, you will only need to make minimal changes to the source code.
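The first suggestion can be sketched in plain Python, since a device map is just a dict from module names to devices. The helper name `pin_together` and the example map below are illustrative assumptions, not part of accelerate or LAVIS; the module names follow the ones used later in this thread.

```python
def pin_together(device_map, anchor, others):
    """Return a copy of device_map with `others` placed on the anchor's device."""
    if anchor not in device_map:
        raise KeyError(f"{anchor!r} is not a key in the device map")
    pinned = dict(device_map)  # leave the caller's map untouched
    for name in others:
        pinned[name] = device_map[anchor]
    return pinned


# Hypothetical map, shaped like the output of infer_auto_device_map().
example_map = {
    "llm_model.model.embed_tokens": 0,
    "llm_model.model.layers.0": 0,
    "llm_model.model.layers.1": 1,
    "llm_model.lm_head": 1,
    "llm_proj": 1,
}

fixed_map = pin_together(
    example_map,
    anchor="llm_model.model.embed_tokens",
    others=["llm_model.lm_head", "llm_proj"],
)
print(fixed_map)
```

The adjusted map would then be passed to dispatch_model(model, device_map=fixed_map) in place of the inferred one.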
This is great. Thanks again for your help, Hangyu.
@petermcc0807 @pygh0er Hi, may I get the code? I'm having trouble with the same issue as well. It would be a great help.
Hi @jun297,
Unfortunately, I could not get this to work. I tried everything, even manually shifting around device map entries - for example:
device_map = infer_auto_device_map(model)
device_map['llm_proj'] = 0
device_map['llm_model.model.norm'] = 0
device_map['llm_model.lm_head'] = 0
. . .
Nothing worked, so I gave up. I am now using LLaVA with Vicuna 13B instead. Sorry I could not be of more help.
Thanks for the reply @petermcc0807. I got LLaVA working too, but this one is tricky. I failed as well.
Hi! this is my previous code for loading blip, and it works fine for me. You can give it a try.
import re
import subprocess
import sys

import torch
from accelerate import dispatch_model, infer_auto_device_map
from lavis.models import load_model_and_preprocess


def get_gpu_with_max_free_memory():
    # Ask nvidia-smi for the free memory (in MiB) of every GPU, one value per line.
    command = "nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits"
    output = subprocess.check_output(command.split(), universal_newlines=True)
    gpu_memory_info = output.strip().split("\n")
    gpu_memory_info = [int(re.findall(r"\d+", info)[0]) for info in gpu_memory_info]
    # Return the index of the GPU with the most free memory.
    gpu_with_max_free_memory = gpu_memory_info.index(max(gpu_memory_info))
    return gpu_with_max_free_memory


def get_blip_model(device='cuda', dtype=torch.bfloat16, use_multi_gpus=False):
    cuda_number = get_gpu_with_max_free_memory()
    print(f'cuda number: {cuda_number}')
    model, vis_processors, txt_processors = load_model_and_preprocess(
        name='blip2_vicuna_instruct',
        model_type='vicuna7b',
        is_eval=True,
        device=f'cuda:{cuda_number}',
    )
    if use_multi_gpus:
        # Keep each decoder layer and the vision transformer on a single GPU.
        device_map = infer_auto_device_map(model, max_memory={1: "3GiB", 2: "3GiB", 3: "4GiB", 4: "4GiB", 5: "4GiB"}, no_split_module_classes=['LlamaDecoderLayer', 'VisionTransformer'])
        # Put the embedding, the output head and the projection on the same GPU.
        device_map['llm_model.model.embed_tokens'] = device_map['llm_model.lm_head'] = device_map['llm_proj'] = 1
        print(device_map)
        model = dispatch_model(model, device_map=device_map)
        torch.cuda.empty_cache()
    model.eval()
    if torch.__version__ >= "2" and sys.platform != "win32":
        model = torch.compile(model)
    return model, (txt_processors, vis_processors)
@pygh0er Thank you!
@pygh0er I was not able to load it in a 2x16GB GPUs. What was your setup?
@joskfg Hi, you can use the following code:
def get_blip_model(device='cuda', dtype=torch.bfloat16, use_multi_gpus=True):
    model, vis_processors, txt_processors = load_model_and_preprocess(
        name='blip2_vicuna_instruct',
        model_type='vicuna7b',
        is_eval=True,
    )
    model.to(dtype)
    if use_multi_gpus:
        device_map = infer_auto_device_map(model, max_memory={0: "8GiB", 1: "9GiB"}, no_split_module_classes=['LlamaDecoderLayer', 'VisionTransformer'])
        device_map['llm_model.model.embed_tokens'] = device_map['llm_model.lm_head'] = device_map['llm_proj'] = 1
        print(device_map)
        model = dispatch_model(model, device_map=device_map)
        torch.cuda.empty_cache()
    else:
        model.to('cuda:0')
    model.eval()
    if torch.__version__ >= "2" and sys.platform != "win32":
        model = torch.compile(model)
    return model, (txt_processors, vis_processors)
@pygh0er Have you tried to load the InstructBLIP of vicuna13b? I encountered the problem of tensors on different devices. Could you please release the code? That will be really helpful!!! Thanks a lot!!!
My original question was about Vicuna 13B (vicuna13b), not 7B.
I've tried all the code changes suggested previously and tried a range of different device_map configurations. Although I can load the 13B model, inference fails every time with errors.
I'm trying to run this on an AWS g4dn.12xlarge instance (4 GPUs, 64GB in total).
I can run LLaVA (also Vicuna 13B) on this instance.
@LixDemon Hi,
I haven't used the 13B version of instructblip myself, but if you're facing the issue of "tensors on different devices," it's likely to be the same reason as instructblip-vicuna-7b and other LVLMs.
Here are a couple of points to consider:
I recommend checking the relevant source code, which should be in LAVIS/lavis/models/blip2_models and the modeling_llama.py file in the official transformers repository. You need to identify which classes in the model cannot be split. For example, in InstructBLIP, a LlamaDecoderLayer should not be spread across different GPUs, because its residual connections may then cause the "tensors on different devices" error. So, for InstructBLIP, I suggest setting no_split_module_classes=['LlamaDecoderLayer', 'VisionTransformer'] when inferring the device map, like this: device_map = infer_auto_device_map(model, max_memory={0: "13GiB", 1: "14GiB"}, no_split_module_classes=['LlamaDecoderLayer', 'VisionTransformer']).
You should ensure that the LLM's proj layer, embedding layer, and lm_head are on the same device, so that input and output data end up on one device. Therefore, you need to set device_map['llm_model.lm_head'] = device_map['llm_proj'] = device_map['llm_model.model.embed_tokens'].
Based on these two principles, I guess the following code should correctly load the 13B model:
def get_blip_model(device='cuda', dtype=torch.bfloat16, use_multi_gpus=True):
    model, vis_processors, txt_processors = load_model_and_preprocess(
        name='blip2_vicuna_instruct',
        model_type='vicuna13b',
        is_eval=True,
    )
    model.to(dtype)
    if use_multi_gpus:
        device_map = infer_auto_device_map(model, max_memory={0: "13GiB", 1: "14GiB"}, no_split_module_classes=['LlamaDecoderLayer', 'VisionTransformer'])
        device_map['llm_model.lm_head'] = device_map['llm_proj'] = device_map['llm_model.model.embed_tokens']
        print(device_map)
        model = dispatch_model(model, device_map=device_map)
        torch.cuda.empty_cache()
    else:
        model.to('cuda:0')
    model.eval()
    if torch.__version__ >= "2" and sys.platform != "win32":
        model = torch.compile(model)
    return model, (txt_processors, vis_processors)
I have managed to load the model using this code @pyogher but I am getting this error now when I try to run the model.generate: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat) Is it possible some other modules need to be not split? Or does it have something to do with the input?
@saffie91 I'm in the same boat as you.
Back to a previous comment of mine, I've tried all the code changes suggested and tried a range of different device_map configurations. Although I've managed to load the Vicuna 13B model across GPUs, inference fails every time with errors (i.e. "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!"). And I've been trying to run this on an AWS g4dn.12xlarge instance (4 GPUs, 64GB in total).
InstructBLIP is now available over at Hugging Face. Although I can get the 13B model to perform inference, it's CPU only - so, it's sloooow. When I try the GPU route, inference fails with the same old errors.
Interestingly, LLaVA (a similar multimodal which uses Vicuna 13B) works fine across multiple GPUs.
@saffie91 Hi!
There shouldn't be any other modules that cannot be split. I suggest adding BertLMHeadModel (Qformer) as additional modules that cannot be split. You can also print your device_map for us to take a look.
@pyogher Hi! Thanks for getting back to me so quickly!
I did add BertLMHeadModel as well and the results are the same.
I am using the g5.12x on aws, with 4 A10G's.
this is my device map:
{'query_tokens': 1, 'visual_encoder': 1, 'ln_vision': 1, 'Qformer': 1, 'llm_model.model.embed_tokens': 1, 'llm_model.model.layers.0': 1, 'llm_model.model.layers.1': 1, 'llm_model.model.layers.2': 1, 'llm_model.model.layers.3': 1, 'llm_model.model.layers.4': 1, 'llm_model.model.layers.5': 1, 'llm_model.model.layers.6': 1, 'llm_model.model.layers.7': 1, 'llm_model.model.layers.8': 1, 'llm_model.model.layers.9': 1, 'llm_model.model.layers.10': 1, 'llm_model.model.layers.11': 2, 'llm_model.model.layers.12': 2, 'llm_model.model.layers.13': 2, 'llm_model.model.layers.14': 2, 'llm_model.model.layers.15': 2, 'llm_model.model.layers.16': 2, 'llm_model.model.layers.17': 2, 'llm_model.model.layers.18': 2, 'llm_model.model.layers.19': 2, 'llm_model.model.layers.20': 2, 'llm_model.model.layers.21': 2, 'llm_model.model.layers.22': 2, 'llm_model.model.layers.23': 2, 'llm_model.model.layers.24': 2, 'llm_model.model.layers.25': 2, 'llm_model.model.layers.26': 2, 'llm_model.model.layers.27': 3, 'llm_model.model.layers.28': 3, 'llm_model.model.layers.29': 3, 'llm_model.model.layers.30': 3, 'llm_model.model.layers.31': 3, 'llm_model.model.layers.32': 3, 'llm_model.model.layers.33': 3, 'llm_model.model.layers.34': 3, 'llm_model.model.layers.35': 3, 'llm_model.model.layers.36': 3, 'llm_model.model.layers.37': 3, 'llm_model.model.layers.38': 3, 'llm_model.model.layers.39': 3, 'llm_model.model.norm': 3, 'llm_model.lm_head': 1, 'llm_proj': 1}
@saffie91 Hi!
I'm not sure, but I noticed that your 'llm_model.model.norm' is on GPU cuda:3. I suggest setting device_map['llm_model.lm_head'], device_map['llm_proj'], device_map['llm_model.model.norm'], and device_map['llm_model.model.embed_tokens'] to the same device. Additionally, please ensure that your input data is also on cuda:1.
If you still encounter this issue, I suggest modifying the source code based on the error. Generally, a simple modification that moves the tensor causing the error onto the same device as the others may fix this bug.
@pyogher this ended up with the same error.
The image is on cuda:1 but perhaps the prompt text is not? Would that be possible?
I came across this error: We need an offload_dir to dispatch this model according to this device_map, the following submodules need to be offloaded: llm_model.model.embed_tokens, llm_model.model.layers.39, llm_model.model.norm, llm_model.lm_head, llm_proj.
Adding the offload_dir param to the dispatch_model function solves the issue.
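For context, accelerate raises this error when the device map sends some modules to "disk" but no offload folder is given. A small check like the one below can list those entries before dispatching. This is only a sketch: `modules_needing_offload` is a hypothetical helper, not an accelerate function, and the example map is made up.

```python
def modules_needing_offload(device_map):
    """List the modules that the device map places on disk."""
    return [name for name, device in device_map.items() if device == "disk"]


# Hypothetical map in which the last modules did not fit on any GPU or in CPU RAM.
example_map = {
    "visual_encoder": 0,
    "llm_model.model.layers.38": "cpu",
    "llm_model.model.layers.39": "disk",
    "llm_model.lm_head": "disk",
}

to_offload = modules_needing_offload(example_map)
if to_offload:
    print("dispatch_model needs an offload_dir for:", to_offload)
```

In that case the call would look like dispatch_model(model, device_map=device_map, offload_dir="offload"), where "offload" stands for any writable folder. Note that modules offloaded to disk are slow at inference time, so raising the max_memory budgets may be preferable when the hardware allows it.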
@pyogher this ended up with the same error.
The image is on cuda:1 but perhaps the prompt text is not? Would that be possible?
I also have this question; how to load prompt to the same device? Also model is converted to bfloat16, do we need to process the text and image input to bfloat16 also?
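One way to reason about where the inputs should go is to look up, from the device map, which device a given submodule landed on. The lookup below is a plain-Python sketch under that assumption; `device_of` is a hypothetical helper, the example map is an excerpt of the one posted above, and the snippet does not check whether LAVIS moves the tokenized prompt for you inside generate().

```python
def device_of(device_map, module_name):
    """Resolve a module's device by the longest matching prefix in the device map."""
    parts = module_name.split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in device_map:
            device = device_map[prefix]
            # Integer entries are GPU indices; "cpu" and "disk" are kept as they are.
            return f"cuda:{device}" if isinstance(device, int) else device
    if "" in device_map:  # a map with a single "" key covers the whole model
        device = device_map[""]
        return f"cuda:{device}" if isinstance(device, int) else device
    raise KeyError(f"no entry in the device map covers {module_name!r}")


# Excerpt of the device map posted earlier in the thread.
example_map = {
    "visual_encoder": 1,
    "llm_model.model.embed_tokens": 1,
    "llm_model.model.layers.11": 2,
    "llm_model.model.norm": 3,
}

print(device_of(example_map, "visual_encoder"))
print(device_of(example_map, "llm_model.model.layers.11.self_attn"))
```

With this, the image tensor could be moved with image.to(device_of(device_map, "visual_encoder")) rather than a hard-coded 'cuda'.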
Which file has load_model_and_preprocess() ?
I encountered the same problem. It is probably not a problem with Vicuna 13B but with the BLIP source code: on the same 4 GPUs, running the model in parallel, LLaVA has no problem but BLIP does.
I also encountered the same issue (using Hugging Face transformers). Are there any updates on this?
Can anyone get InstructBLIP to work using transformers from Python? I would be grateful if you shared the code.
This is mine
#!/usr/bin/env python
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration
import torch
from PIL import Image
import requests

# The processor prepares the image and the prompt for the model.
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-7b", load_in_4bit=True, torch_dtype=torch.float16)

url = "https://raw.githubusercontent.com/salesforce/LAVIS/main/docs/_static/Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = "What is unusual about this image?"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device="cuda", dtype=torch.float16)

outputs = model.generate(
    **inputs,
    num_beams=5,
    max_new_tokens=256,
    min_length=1,
    top_p=0.9,
    repetition_penalty=1.5,
    length_penalty=1.0,
    temperature=1,
)
outputs[outputs == 0] = 2  # this line can be removed once https://github.com/huggingface/transformers/pull/24492 is fixed
generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(generated_text)
# Determine if CUDA (GPU) is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model configuration.
config = InstructBlipConfig.from_pretrained("Salesforce/instructblip-vicuna-13b")

# Initialize the model with the given configuration.
with init_empty_weights():
    model = AutoModelForVision2Seq.from_config(config)
    model.tie_weights()

# Infer device map based on the available resources.
device_map = infer_auto_device_map(model, max_memory={0: "30GiB", 1: "30GiB"},
                                   no_split_module_classes=['InstructBlipEncoderLayer', 'InstructBlipQFormerLayer',
                                                            'LlamaDecoderLayer'])
device_map['language_model.lm_head'] = device_map['language_projection'] = device_map[('language_model.model'
                                                                                        '.embed_tokens')]

offload = ""
# Load the processor and model for image processing.
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-13b", device_map="auto")
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-13b",
                                                             device_map=device_map,
                                                             offload_folder=offload, offload_state_dict=True)
NameError: name 'init_empty_weights' is not defined
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Loading checkpoint shards: 67%|████████████████████████████████████████▋ | 4/6 [00:31<00:15, 7.88s/it]
Traceback (most recent call last):
  File "C:\Users\james\test_instructblip_5.py", line 38, in
    processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-13b", device_map="auto")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\james\blip_env\Lib\site-packages\transformers\modeling_utils.py", line 3010, in from_pretrained
    ) = cls._load_pretrained_model(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\james\blip_env\Lib\site-packages\transformers\modeling_utils.py", line 3388, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\james\blip_env\Lib\site-packages\transformers\modeling_utils.py", line 722, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "C:\Users\james\blip_env\Lib\site-packages\accelerate\utils\modeling.py", line 313, in set_module_tensor_to_device
    new_value = value.to(device)
    ^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB (GPU 1; 23.99 GiB total capacity; 22.84 GiB already allocated; 0 bytes free; 22.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
You need to adjust max_memory to match the set of GPUs you have (I have two 32 GB GPUs). Also import the needed functions from accelerate.
I have already done that. I have 2 X RTX 4090. See my code:
device_map = infer_auto_device_map(model, max_memory={0: "24GiB", 1: "24GiB"},
no_split_module_classes=['InstructBlipEncoderLayer', 'InstructBlipQFormerLayer',
'LlamaDecoderLayer'])
I still get this error:
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Loading checkpoint shards: 67%|████████████████████████████████████████▋ | 4/6 [00:23<00:11, 5.77s/it]
Traceback (most recent call last):
File "C:\Users\james\test_instructblip_5.py", line 39, in
You need to declare it about 5 GiB below the max; it needs space for overhead.
Also print the total model size. You might need to add space on your CPU to allocate the whole model. "cpu":
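The headroom rule can be written down as a small helper that builds the max_memory dict. This is a sketch only: `build_max_memory` is a hypothetical name, the 5 GiB default comes from the rule of thumb above rather than from any accelerate documentation, and the CPU budget is something you would choose for your own machine.

```python
def build_max_memory(gpu_total_gib, headroom_gib=5, cpu_gib=None):
    """Build a max_memory dict that leaves `headroom_gib` free on every GPU."""
    max_memory = {}
    for index, total in enumerate(gpu_total_gib):
        budget = total - headroom_gib
        if budget <= 0:
            raise ValueError(f"GPU {index} has no memory left after the headroom")
        max_memory[index] = f"{budget}GiB"
    if cpu_gib is not None:
        # Optional CPU budget for the layers that do not fit on the GPUs.
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


# Two 24 GiB cards, as in the setup discussed here.
print(build_max_memory([24, 24]))
print(build_max_memory([24, 24], cpu_gib=32))
```

The result is meant to be passed as infer_auto_device_map(model, max_memory=build_max_memory([24, 24]), ...).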
@AKOrojo Thank you.
So I reduced
device_map = infer_auto_device_map(model, max_memory={0: "19GiB", 1: "19GiB"}, no_split_module_classes=['InstructBlipEncoderLayer', 'InstructBlipQFormerLayer', 'LlamaDecoderLayer'])
But now I get this error:
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████| 6/6 [00:34<00:00, 5.73s/it]
You shouldn't move a model when it is dispatched on multiple devices.
Traceback (most recent call last):
File "C:\Users\james\test_instructblip_5.py", line 43, in
Remove the line to move the model. The device map is what moves the model now.
Same problem
Hi all,
This is my device map:
{'query_tokens': 0, 'visual_encoder': 0, 'ln_vision': 0, 'Qformer': 0, 't5_model.shared': 0, 't5_model.decoder.embed_tokens': 0, 't5_model.encoder.embed_tokens': 0, 't5_model.encoder.block.0': 0, 't5_model.encoder.block.1': 0, 't5_model.encoder.block.2': 0, 't5_model.encoder.block.3': 0, 't5_model.encoder.block.4': 0, 't5_model.encoder.block.5': 0, 't5_model.encoder.block.6': 0, 't5_model.encoder.block.7': 0, 't5_model.encoder.block.8.layer.0.SelfAttention.q': 0, 't5_model.encoder.block.8.layer.0.SelfAttention.k': 1, 't5_model.encoder.block.8.layer.0.SelfAttention.v': 1, 't5_model.encoder.block.8.layer.0.SelfAttention.o': 1, 't5_model.encoder.block.8.layer.0.layer_norm': 1, 't5_model.encoder.block.8.layer.0.dropout': 1, 't5_model.encoder.block.8.layer.1': 1, 't5_model.encoder.block.9': 1, 't5_model.encoder.block.10': 1, 't5_model.encoder.block.11': 1, 't5_model.encoder.block.12': 1, 't5_model.encoder.block.13': 1, 't5_model.encoder.block.14': 1, 't5_model.encoder.block.15': 1, 't5_model.encoder.block.16': 1, 't5_model.encoder.block.17': 1, 't5_model.encoder.block.18': 1, 't5_model.encoder.block.19': 1, 't5_model.encoder.block.20': 1, 't5_model.encoder.block.21': 1, 't5_model.encoder.block.22': 1, 't5_model.encoder.block.23': 1, 't5_model.encoder.final_layer_norm': 1, 't5_model.encoder.dropout': 1, 't5_model.decoder.block.0.layer.0': 1, 't5_model.decoder.block.0.layer.1': 1, 't5_model.decoder.block.1': 2, 't5_model.decoder.block.2': 2, 't5_model.decoder.block.3': 2, 't5_model.decoder.block.4': 2, 't5_model.decoder.block.5': 2, 't5_model.decoder.block.6': 2, 't5_model.decoder.block.7': 2, 't5_model.decoder.block.8': 2, 't5_model.decoder.block.9': 2, 't5_model.decoder.block.10': 2, 't5_model.decoder.block.11': 2, 't5_model.decoder.block.12.layer.0': 2, 't5_model.decoder.block.12.layer.1': 2, 't5_model.decoder.block.12.layer.2.DenseReluDense.wi_0': 2, 't5_model.decoder.block.12.layer.2.DenseReluDense.wi_1': 2, 't5_model.decoder.block.12.layer.2.DenseReluDense.wo': 3, 't5_model.decoder.block.12.layer.2.DenseReluDense.dropout': 3, 't5_model.decoder.block.12.layer.2.DenseReluDense.act': 3, 't5_model.decoder.block.12.layer.2.layer_norm': 3, 't5_model.decoder.block.12.layer.2.dropout': 3, 't5_model.decoder.block.13': 3, 't5_model.decoder.block.14': 3, 't5_model.decoder.block.15': 3, 't5_model.decoder.block.16': 3, 't5_model.decoder.block.17': 3, 't5_model.decoder.block.18': 3, 't5_model.decoder.block.19': 3, 't5_model.decoder.block.20': 3, 't5_model.decoder.block.21': 3, 't5_model.decoder.block.22': 3, 't5_model.decoder.block.23': 3, 't5_model.decoder.final_layer_norm': 3, 't5_model.decoder.dropout': 3, 't5_model.lm_head': 3, 't5_proj': 3, 't5_model.decoder.block.0.layer.2': 2}
I got this error:
inputs_embeds = torch.cat([inputs_t5, inputs_embeds], dim=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0! (when checking argument for argument tensors in method wrapper___cat)
Any idea how to solve this? Thanks
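In the map above, several T5 blocks are split across GPUs (for example, encoder block 8 sits partly on device 0 and partly on device 1), which is the situation no_split_module_classes is meant to prevent. A quick way to spot this is to group the keys by block and report any block that spans more than one device. The sketch below is plain Python; `split_blocks` is a hypothetical helper and the example map is a short excerpt of the one posted.

```python
import re
from collections import defaultdict

# Captures everything up to and including "...block.<n>" in a module name.
BLOCK_PATTERN = re.compile(r"^(.*?\.block\.\d+)(?:\.|$)")


def split_blocks(device_map):
    """Return {block_name: [devices]} for blocks spread over several devices."""
    devices_per_block = defaultdict(set)
    for name, device in device_map.items():
        match = BLOCK_PATTERN.match(name)
        if match:
            devices_per_block[match.group(1)].add(device)
    return {
        block: sorted(devices)
        for block, devices in devices_per_block.items()
        if len(devices) > 1
    }


# Excerpt of the device map posted above.
example_map = {
    "t5_model.encoder.block.7": 0,
    "t5_model.encoder.block.8.layer.0.SelfAttention.q": 0,
    "t5_model.encoder.block.8.layer.0.SelfAttention.k": 1,
    "t5_model.encoder.block.8.layer.1": 1,
    "t5_model.decoder.block.0.layer.0": 1,
    "t5_model.decoder.block.0.layer.1": 1,
    "t5_model.decoder.block.0.layer.2": 2,
    "t5_model.decoder.block.1": 2,
}

report = split_blocks(example_map)
print(report)
```

If blocks show up in the report, one thing to try is passing the block class name through no_split_module_classes (in the transformers T5 implementation that class is T5Block; whether the LAVIS copy uses the same name is an assumption to verify). Separately, the error message itself is consistent with 't5_proj' sitting on device 3 while 't5_model.encoder.embed_tokens' is on device 0, so putting those two entries on the same device, as suggested earlier in the thread for the Vicuna variant, is another thing to try. Neither change has been tested here.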
Hi,
Is it possible to load InstructBLIP (Vicuna 13B) across multiple (e.g. 4x16GB) GPUs?
LLaVA (which also uses Vicuna 13B) enables the number of GPUs to be specified. InstructBLIP's load_model_and_preprocess() doesn't appear to enable this, from what I can tell.
Thanks!