Open sajastu opened 2 weeks ago
Hi @sajastu
I looked at the traceback of the issue as well as the code on the Hub. Can you also add `Phi3DecoderLayer` to `no_split_modules`? The error seems to happen here: https://huggingface.co/microsoft/Phi-3-medium-4k-instruct/blob/main/modeling_phi3.py#L899
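For context, `no_split_modules` feeds accelerate's device-map planner: every module whose class is listed there must land on a single device instead of being sharded across GPUs. A minimal pure-Python sketch of that placement invariant (the layer names, sizes, and the greedy planner itself are hypothetical stand-ins; no real model is loaded):

```python
# Sketch of the invariant behind `no_split_module_classes`: each whole decoder
# layer is assigned to exactly one device. Names/sizes are made up for illustration.

def build_device_map(layer_sizes, gpu_capacity):
    """Greedily place whole layers onto numbered devices so that no single
    layer is ever split across two devices (what listing Phi3DecoderLayer in
    no_split_modules is meant to guarantee)."""
    device_map, device, used = {}, 0, 0
    for name, size in layer_sizes:
        if used + size > gpu_capacity:  # layer would overflow: move it whole
            device, used = device + 1, 0
        device_map[name] = device
        used += size
    return device_map

layers = [(f"model.layers.{i}", 4) for i in range(6)]  # six 4 GB layers
print(build_device_map(layers, gpu_capacity=10))
# {'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 1,
#  'model.layers.3': 1, 'model.layers.4': 2, 'model.layers.5': 2}
```

Without the no-split constraint, a planner could cut a layer in half across `cuda:0` and `cuda:1`, which is exactly the situation that produces device-mismatch errors inside a layer's forward pass.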
Hey @younesbelkada, I added `Phi3DecoderLayer` to the `no_split_modules` list, but I'm still getting essentially the same error, apparently at a different spot:
```
flash-attention package not found, consider installing for better performance: /home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/flash_attn_2_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv.
Current flash-attenton does not support window_size. Either upgrade or use attn_implementation='eager'.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You are not running the flash-attention implementation, expect numerical differences.
Traceback (most recent call last):
  File "test.py", line 55, in <module>
    outputs = model.generate(batch['input_ids'][0], max_new_tokens=50)
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/transformers/generation/utils.py", line 2397, in _sample
    outputs = self(
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/disk1/sasha/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-medium-4k-instruct/d194e4e74ffad5a5e193e26af25bcfc80c7f1ffc/modeling_phi3.py", line 1286, in forward
    outputs = self.model(
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk1/sasha/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-medium-4k-instruct/d194e4e74ffad5a5e193e26af25bcfc80c7f1ffc/modeling_phi3.py", line 1164, in forward
    layer_outputs = decoder_layer(
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/disk1/sasha/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-medium-4k-instruct/d194e4e74ffad5a5e193e26af25bcfc80c7f1ffc/modeling_phi3.py", line 885, in forward
    attn_outputs, self_attn_weights, present_key_value = self.self_attn(
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/disk1/sasha/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-medium-4k-instruct/d194e4e74ffad5a5e193e26af25bcfc80c7f1ffc/modeling_phi3.py", line 383, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/home/sasha/anaconda3/envs/myenv-py38/lib/python3.8/site-packages/transformers/cache_utils.py", line 155, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_cat)
```
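The failing call is the KV-cache update: `torch.cat` refuses to concatenate tensors that live on different devices, and here the cached key states for the layer sit on `cuda:0` while the freshly computed key states come from a module dispatched to `cuda:1`. A torch-free sketch of that same invariant (`FakeTensor` and its string `device` are illustrative stand-ins, not a real torch API):

```python
# Minimal simulation of the failing torch.cat: concatenation only works when
# both operands share a device. Devices are plain strings here; no torch needed.

class FakeTensor:
    def __init__(self, data, device):
        self.data, self.device = data, device

def cat(a, b):
    """Concatenate two fake tensors, enforcing torch.cat's same-device rule."""
    if a.device != b.device:
        raise RuntimeError(
            f"Expected all tensors to be on the same device, "
            f"but found {a.device} and {b.device}"
        )
    return FakeTensor(a.data + b.data, a.device)

cache = FakeTensor([1, 2], "cuda:0")   # KV cache entry created on GPU 0
new_keys = FakeTensor([3], "cuda:1")   # layer's new keys computed on GPU 1
try:
    cat(cache, new_keys)
except RuntimeError as e:
    print(e)
# Expected all tensors to be on the same device, but found cuda:0 and cuda:1
```

This is why the fix has to keep each `Phi3DecoderLayer` (and therefore its cache entry and its outputs) on one device.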
### System Info

`transformers` version: 4.41.2

### Who can help?

@SunMarc, @ArthurZucker, @younesbelkada and @muellerzr
### Information

### Tasks

- `examples` folder (such as GLUE/SQuAD, ...)

### Reproduction
I'm trying to run Big Model Inference with HF's `accelerate` package in a multi-GPU setting with the following code, but I keep getting the CUDA-related error attached below. Here's the code:

Code:
Error (on the line `model.generate(batch['input_ids'][0].to(device), max_new_tokens=50)`):

### Expected behavior
Generation of output text from the big model, without any CUDA-related errors!