magic-research / PLLaVA

Official repository for the paper PLLaVA

Issue with multi-GPU inference #20

Open AmitRozner opened 5 months ago

AmitRozner commented 5 months ago

I tried to run the demo on multiple RTX 3090 but got strange errors:

python3.10/site-packages/transformers/cache_utils.py", line 146, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)

This happens for both the 13b and 34b models, which do not fit on a single GPU. I tried to use tie_weights() and to move language_model.base_model.model.lm_head to the same device as the vision embedding, but neither method worked. Any thoughts?
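For context, a rough sketch of sharded multi-GPU loading plus the two workarounds described above. This is not the repo's actual demo code: the import path, checkpoint path, and vision_tower attribute name are assumptions; the lm_head module path is copied from the comment.

```python
# Rough sketch of the setup and the two attempted workarounds (not the repo's
# demo code); import path, checkpoint path, and vision_tower name are assumptions.
import torch
from models.pllava import PllavaForConditionalGeneration  # assumed repo import path

model = PllavaForConditionalGeneration.from_pretrained(
    "MODELS/pllava-13b",        # placeholder local checkpoint path
    torch_dtype=torch.float16,
    device_map="auto",          # shard layers across the available GPUs
)

# Workaround 1: re-tie the input/output embeddings.
model.tie_weights()

# Workaround 2: move the LM head to the same device as the vision embeddings
# (module path taken from the comment above).
vision_device = next(model.vision_tower.parameters()).device
model.language_model.base_model.model.lm_head.to(vision_device)

# Neither change avoids the cross-device torch.cat raised in cache_utils.py
# once generation starts and the per-layer KV cache is updated.
```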

ermu2001 commented 5 months ago

I also encountered this bug running the 13B model on two 40G A100s. It only works when running the 34B model on two 40G A100s. Still haven't figured it out yet.

gaowei724 commented 3 months ago

Hi, I encountered a similar problem and found that the following steps solve it:
(1) Update transformers from source to get version 4.42.0.dev0 or higher: pip install git+https://github.com/huggingface/transformers
(2) Add the attribute _supports_cache_class = True to the class PllavaPreTrainedModel.
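For anyone applying step (2), a minimal sketch of where the attribute goes, assuming the class is defined in the repo's modeling_pllava.py (the surrounding boilerplate is illustrative; only the flag line is the actual change):

```python
# Step (2) sketch: add _supports_cache_class to the PLLaVA pretrained-model base
# class. Everything except the flag line is illustrative boilerplate.
from transformers import PreTrainedModel

class PllavaPreTrainedModel(PreTrainedModel):
    # Signals to transformers that this model supports the newer Cache classes
    # (e.g. DynamicCache) as past_key_values instead of the legacy tuple format.
    _supports_cache_class = True
```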

zhanwenchen commented 2 months ago

@gaowei724

Thank you so much! Your comment saved me from 2 days of debugging. Out of curiosity - what's the related issue? I want to understand why your fix worked.

mhardik003 commented 1 month ago

Thanks a lot!