huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

can't forward 4bit nllb-moe-54b (RuntimeError: result type Float can't be cast to the desired output type Byte) #26898

Closed. CAH9487 closed this issue 1 year ago.

CAH9487 commented 1 year ago

System Info

GPU: NVIDIA RTX A6000 (48 GB VRAM)
transformers version: 4.34.0
Platform: Linux 5.15.0-69-generic
Python version: 3.8.10
Huggingface_hub version: 0.18.0
Safetensors version: 0.4.0
Accelerate version: 0.23.0
PyTorch version: 2.1.0+cu118
bitsandbytes version: 0.41.1

Who can help?

No response

Reproduction

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

lang_map = {
    "ja": "jpn_Jpan",
    "zh": "zho_Hans",
}

model_path = 'facebook/nllb-moe-54b'
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
tokenizer.src_lang = lang_map["ja"]
tokenizer.tgt_lang = lang_map["zh"]

# Load the 54B MoE checkpoint quantized to 4-bit via bitsandbytes
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Force the decoder to start generation with the target-language token
forced_bos_token_id = tokenizer.lang_code_to_id[lang_map["zh"]]
model.config.forced_bos_token_id = forced_bos_token_id

generation_config = dict(
    repetition_penalty=1.2,
    do_sample=False,
    num_beams=5,
    num_return_sequences=1,
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,
)

# Japanese source sentence; roughly: "The U.S. Treasury revealed on the 12th that,
# as of the 10th of this month, only about $88 billion (about 11.94 trillion yen)
# remains for the special measures it has used to keep making payments since the
# federal debt ceiling was reached."
input_text = '米財務省は12日、連邦政府債務上限の到達後も支払い履行など資金をやりくりしてきた特別措置について、今月10日時点であと880億ドル(約11兆9400億円)しか残されていないことを明らかにした。'

encodings = tokenizer(input_text, truncation=True, max_length=512, return_tensors="pt").to('cuda')

with torch.no_grad():
    outputs = model.generate(input_ids=encodings["input_ids"], **generation_config)

preds = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(preds)

Error message:

Traceback (most recent call last):
  File "t.py", line 39, in <module>
    outputs = self._model.generate(input_ids=encodings["input_ids"], **generation_config)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1496, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 661, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/nllb_moe/modeling_nllb_moe.py", line 1170, in forward
    layer_outputs = encoder_layer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/nllb_moe/modeling_nllb_moe.py", line 702, in forward
    hidden_states, router_states = self.ffn(hidden_states, attention_mask)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/nllb_moe/modeling_nllb_moe.py", line 484, in forward
    expert_output *= 1 - self.moe_token_dropout
RuntimeError: result type Float can't be cast to the desired output type Byte
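
For context (an explanatory note, not part of the original report): the failing line is an in-place multiply, and PyTorch refuses in-place ops whose promoted result type cannot be cast back to the destination dtype. Here the destination is a Byte (uint8) tensor, presumably surfaced by the 4-bit quantized path, while 1 - self.moe_token_dropout is a Python float. A minimal standalone sketch of the same failure:

import torch

# Stand-in for the uint8 tensor that reaches the MoE scaling step.
expert_output = torch.ones(4, dtype=torch.uint8)
moe_token_dropout = 0.2

try:
    # In-place multiply: the promoted Float result can't be cast back to Byte.
    expert_output *= 1 - moe_token_dropout
except RuntimeError as e:
    print(e)  # result type Float can't be cast to the desired output type Byte

# The out-of-place form promotes to float32 instead of failing.
scaled = expert_output * (1 - moe_token_dropout)
print(scaled.dtype)  # torch.float32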

Expected behavior

The script should return the translated text instead of raising a RuntimeError.
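
As a sanity check (my addition, not from the original report): the same pipeline on the dense facebook/nllb-200-distilled-600M checkpoint, which has no MoE layers and therefore never executes the moe_token_dropout scaling, should produce a translation without this error, suggesting the script itself is sound and the failure is specific to the quantized MoE path:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Dense (non-MoE) NLLB checkpoint, used here only to validate the pipeline.
dense_path = 'facebook/nllb-200-distilled-600M'
tokenizer = AutoTokenizer.from_pretrained(dense_path, use_fast=False)
tokenizer.src_lang = "jpn_Jpan"

model = AutoModelForSeq2SeqLM.from_pretrained(
    dense_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

encodings = tokenizer("これはテストです。", return_tensors="pt").to(model.device)  # "This is a test."
outputs = model.generate(
    **encodings,
    forced_bos_token_id=tokenizer.lang_code_to_id["zho_Hans"],
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))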

ArthurZucker commented 1 year ago

Hey! As a quick fix, I would set moe_token_dropout to 0. Otherwise, I'm not sure why the dtype is wrong. cc @younesbelkada in case you know of a quick fix on the modeling code?
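
A minimal sketch of that workaround (my reading of Arthur's suggestion, not his exact code), assuming the modeling code skips the scaling branch entirely when moe_token_dropout is 0: override the value on the config before loading, so every sparse MLP picks it up at init.

import torch
from transformers import AutoConfig, AutoModelForSeq2SeqLM

model_path = 'facebook/nllb-moe-54b'

# Zero out the MoE token dropout so the failing
# `expert_output *= 1 - self.moe_token_dropout` line is never reached.
config = AutoConfig.from_pretrained(model_path)
config.moe_token_dropout = 0.0

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_path,
    config=config,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# An already-loaded model can be patched the same way, since the value is
# read off the module at forward time:
for module in model.modules():
    if hasattr(module, "moe_token_dropout"):
        module.moe_token_dropout = 0.0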