Open · AmitRozner opened this issue 5 months ago
I also encountered this bug running 13B on two 40G A100s. It only works when running the 34B model on two 40G A100s. Still haven't figured it out yet.
Hi, I encountered a similar problem and found that the following method solves it:

1. Update transformers to the latest version from source to get 4.42.0.dev0 or higher: `pip install git+https://github.com/huggingface/transformers`
2. Add the attribute `_supports_cache_class = True` to the `PllavaPreTrainedModel` class.
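In case it helps anyone else, here is a minimal sketch of step 2. The class name and attribute come from this thread; the config class and the fields around it are illustrative stand-ins, not the repo's actual code:

```python
from transformers import PreTrainedModel, PretrainedConfig

class PllavaConfig(PretrainedConfig):  # illustrative stand-in for the repo's real config
    model_type = "pllava"

class PllavaPreTrainedModel(PreTrainedModel):
    config_class = PllavaConfig
    # The one-line fix: tells transformers >= 4.42.0.dev0 that this model
    # can work with the newer Cache classes during generation.
    _supports_cache_class = True
```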
@gaowei724 Thank you so much! Your comment saved me 2 days of debugging. Out of curiosity, what is the related issue? I want to understand why your fix worked.
Thanks a lot!
I tried to run the demo on multiple RTX 3090s but got strange errors:

This happens for both the 13b and 34b models, which do not fit on a single GPU. I tried calling `tie_weights()` and moving the `language_model.base_model.model.lm_head` to the same device as the vision embedding, but neither method worked. Any thoughts?
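For concreteness, a rough sketch of the two workarounds I tried. The loader and the `vision_tower` attribute below are hypothetical placeholders; only the `lm_head` path is the real one from my setup:

```python
# Hypothetical loader that shards the model across the available 3090s.
model = load_pllava_model(device_map="auto")  # placeholder, not a real API

# Attempt 1: re-tie the input/output embeddings after the model is sharded.
model.tie_weights()

# Attempt 2: move the LM head onto the same device as the vision embedding.
vision_device = next(model.vision_tower.parameters()).device  # assumed attribute name
model.language_model.base_model.model.lm_head.to(vision_device)
```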