Traceback (most recent call last):
  File "t.py", line 39, in <module>
    outputs = self._model.generate(input_ids=encodings["input_ids"], **generation_config)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1496, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 661, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/nllb_moe/modeling_nllb_moe.py", line 1170, in forward
    layer_outputs = encoder_layer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/nllb_moe/modeling_nllb_moe.py", line 702, in forward
    hidden_states, router_states = self.ffn(hidden_states, attention_mask)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/nllb_moe/modeling_nllb_moe.py", line 484, in forward
    expert_output *= 1 - self.moe_token_dropout
RuntimeError: result type Float can't be cast to the desired output type Byte
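For context, the failing line `expert_output *= 1 - self.moe_token_dropout` is an in-place multiply, and PyTorch refuses to write a Float result back into an integer tensor. Here is a minimal, standalone sketch of that casting rule (it assumes `expert_output` ends up with an integer dtype such as `uint8`; why it does so in this setup is the open question):

```python
import torch

# Stand-in for expert_output with the problematic Byte (uint8) dtype.
expert_output = torch.ones(4, dtype=torch.uint8)
moe_token_dropout = 0.1

try:
    # In-place: the Float result would have to be cast back to Byte -> error.
    expert_output *= 1 - moe_token_dropout
except RuntimeError as e:
    print(e)  # result type Float can't be cast to the desired output type Byte

# Out-of-place: type promotion produces a new float tensor, no error.
promoted = expert_output * (1 - moe_token_dropout)
print(promoted.dtype)  # torch.float32
```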
Hey! As a quick fix, I would set `moe_token_dropout` to 0. Otherwise I'm not sure why the dtype is wrong. cc @younesbelkada in case you know of a quick fix on the modeling code?
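The suggested workaround could look like the sketch below. The checkpoint name `facebook/nllb-moe-54b` is an assumption (the issue does not name the exact checkpoint), and the sketch assumes the modeling code skips the dropout rescaling entirely when `moe_token_dropout` is 0:

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM

model_name = "facebook/nllb-moe-54b"  # assumed checkpoint name

# Disable MoE token dropout so the failing line
# `expert_output *= 1 - self.moe_token_dropout` is never reached.
config = AutoConfig.from_pretrained(model_name)
config.moe_token_dropout = 0.0

model = AutoModelForSeq2SeqLM.from_pretrained(model_name, config=config, device_map="auto")
```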
System Info
- GPU: NVIDIA RTX A6000 (48 GB VRAM)
- transformers version: 4.34.0
- Platform: Linux 5.15.0-69-generic
- Python version: 3.8.10
- huggingface_hub version: 0.18.0
- safetensors version: 0.4.0
- accelerate version: 0.23.0
- PyTorch version: 2.1.0+cu118
- bitsandbytes version: 0.41.1
Who can help?
No response
Reproduction
Running generation with the NLLB-MoE model raises the error message shown in the traceback at the top of this issue.
Expected behavior
`generate` should return the translated text without raising an error.
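For reference, the expected usage can be reconstructed roughly as follows. This is a hypothetical sketch, not the reporter's actual script: the checkpoint name, language codes, and input sentence are all assumptions, since only line 39 of `t.py` appears in the traceback:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/nllb-moe-54b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

encodings = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(
    input_ids=encodings["input_ids"],
    # Force the target language token (French here) as the first decoded token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_new_tokens=50,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```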