RuntimeError: result type Float can't be cast to the desired output type Char

System Info

transformers version: 4.30.0.dev0
Platform: Linux-5.15.107+-x86_64-with-glibc2.31
Python version: 3.10.11
Huggingface_hub version: 0.14.1
Safetensors version: not installed
PyTorch version (GPU?): 2.0.1+cu118 (True)
Tensorflow version (GPU?): 2.12.0 (True)
Flax version (CPU?/GPU?/TPU?): 0.6.9 (gpu)
Jax version: 0.4.8
JaxLib version: 0.4.7
Using GPU in script?: Yes
Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

I ran the official code example:

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_id = "RWKV/rwkv-raven-1b5"

model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.eval()
if torch.__version__ >= "2":
    torch.compile(model)
generation_config = GenerationConfig(max_new_tokens=1000, temperature=0.7, top_k=35, top_p=0.90, pad_token_id= tokenizer.eos_token_id)
question = "Write me a Poem About NLP"
prompt = f"### Instruction: {question}\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate((inputs["input_ids"]), generation_config=generation_config)
print(output)

It works fine.

I then ran the same code with some additional arguments (low_cpu_mem_usage, load_in_8bit, device_map) passed to from_pretrained() when initializing the model:

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_id = "RWKV/rwkv-raven-1b5"

model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True, load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.eval()
if torch.__version__ >= "2":
    torch.compile(model)
generation_config = GenerationConfig(max_new_tokens=1000, temperature=0.7, top_k=35, top_p=0.90, pad_token_id= tokenizer.eos_token_id)
question = "Tell me How RWKV RNNs are Parallelizable"
prompt = f"### Instruction: {question}\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate((inputs["input_ids"]), generation_config=generation_config)
print(output)

When I ran this code, I got the following error:

/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1448: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 7>:7                                                                              │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:115 in decorate_context       │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1518 in generate        │
│                                                                                                  │
│   1515 │   │   │   │   )                                                                         │
│   1516 │   │   │                                                                                 │
│   1517 │   │   │   # 11. run greedy search                                                       │
│ ❱ 1518 │   │   │   return self.greedy_search(                                                    │
│   1519 │   │   │   │   input_ids,                                                                │
│   1520 │   │   │   │   logits_processor=logits_processor,                                        │
│   1521 │   │   │   │   stopping_criteria=stopping_criteria,                                      │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:2335 in greedy_search   │
│                                                                                                  │
│   2332 │   │   │   model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)  │
│   2333 │   │   │                                                                                 │
│   2334 │   │   │   # forward pass to get next token                                              │
│ ❱ 2335 │   │   │   outputs = self(                                                               │
│   2336 │   │   │   │   **model_inputs,                                                           │
│   2337 │   │   │   │   return_dict=True,                                                         │
│   2338 │   │   │   │   output_attentions=output_attentions,                                      │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl            │
│                                                                                                  │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1502 │   │   # Do not call functions when jit is used                                          │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1504 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/accelerate/hooks.py:165 in new_forward                   │
│                                                                                                  │
│   162 │   │   │   with torch.no_grad():                                                          │
│   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
│   164 │   │   else:                                                                              │
│ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
│   166 │   │   return module._hf_hook.post_forward(module, output)                                │
│   167 │                                                                                          │
│   168 │   module.forward = new_forward                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/rwkv/modeling_rwkv.py:780 in forward │
│                                                                                                  │
│   777 │   │   """                                                                                │
│   778 │   │   return_dict = return_dict if return_dict is not None else self.config.use_return   │
│   779 │   │                                                                                      │
│ ❱ 780 │   │   rwkv_outputs = self.rwkv(                                                          │
│   781 │   │   │   input_ids,                                                                     │
│   782 │   │   │   inputs_embeds=inputs_embeds,                                                   │
│   783 │   │   │   state=state,                                                                   │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl            │
│                                                                                                  │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1502 │   │   # Do not call functions when jit is used                                          │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1504 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/accelerate/hooks.py:165 in new_forward                   │
│                                                                                                  │
│   162 │   │   │   with torch.no_grad():                                                          │
│   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
│   164 │   │   else:                                                                              │
│ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
│   166 │   │   return module._hf_hook.post_forward(module, output)                                │
│   167 │                                                                                          │
│   168 │   module.forward = new_forward                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/rwkv/modeling_rwkv.py:645 in forward │
│                                                                                                  │
│   642 │   │   return_dict = return_dict if return_dict is not None else self.config.use_return   │
│   643 │   │                                                                                      │
│   644 │   │   if self.training == self.layers_are_rescaled:                                      │
│ ❱ 645 │   │   │   self._rescale_layers()                                                         │
│   646 │   │                                                                                      │
│   647 │   │   if input_ids is not None and inputs_embeds is not None:                            │
│   648 │   │   │   raise ValueError("You cannot specify both input_ids and inputs_embeds at the   │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/rwkv/modeling_rwkv.py:712 in         │
│ _rescale_layers                                                                                  │
│                                                                                                  │
│   709 │   │   │   │   │   │   block.attention.output.weight.mul_(2 ** int(block_id // self.con   │
│   710 │   │   │   │   │   │   block.feed_forward.value.weight.mul_(2 ** int(block_id // self.c   │
│   711 │   │   │   │   │   else:                                                                  │
│ ❱ 712 │   │   │   │   │   │   block.attention.output.weight.div_(2 ** int(block_id // self.con   │
│   713 │   │   │   │   │   │   block.feed_forward.value.weight.div_(2 ** int(block_id // self.c   │
│   714 │   │                                                                                      │
│   715 │   │   self.layers_are_rescaled = not self.training                                       │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: result type Float can't be cast to the desired output type Char

I tried many ways to address this, but nothing worked.

When I initialize the model with model = AutoModelForCausalLM.from_pretrained(model_id), without load_in_8bit and the other arguments, it works fine.

So I guess there is a bug in the RWKV modeling code that prevents generation when the model is loaded in 8-bit with these arguments (see the code snippets above).

Please correct me if I am wrong, or fix it as soon as possible.
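
For what it's worth, the last frame of the traceback is an in-place div_ on a weight inside _rescale_layers. A minimal sketch, assuming the 8-bit weights are stored as torch.int8 (which PyTorch reports as "Char"), shows the same kind of error with plain PyTorch. The tensor below is a hypothetical stand-in, not the actual RWKV weight:

```python
import torch

# Assumption: a stand-in for an 8-bit quantized weight.
# PyTorch refers to the int8 dtype as "Char" in error messages.
weight = torch.ones(4, dtype=torch.int8)

try:
    # True division promotes the result to a float dtype, which cannot be
    # written back in place into the int8 tensor.
    weight.div_(2 ** 1)
    message = None
except RuntimeError as err:
    message = str(err)

print(message)
```

If that assumption holds, the rescaling step would need to skip or special-case quantized weights. That is only my guess, not a confirmed fix.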

Who Can Help? @ArthurZucker @gante @sgugger

Expected behavior

I expected it to generate text, as it did before the 8-bit arguments were added.
