intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

RWKV model: loading INT4 model fails #10161

Closed · juan-OY closed this issue 3 months ago

juan-OY commented 6 months ago

OS: Ubuntu 22.04 (Linux)

1. Convert the RWKV model to an INT4 model and save it:

   ```python
   model = AutoModelForCausalLM.from_pretrained(model_path,
                                                load_in_4bit=True,
                                                optimize_model=True,
                                                trust_remote_code=True)
   model = model.to('xpu')
   model = BenchmarkWrapper(model, do_print=True)

   # Load tokenizer
   tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

   save_path = "./rwkv-4-world-7b-int4/"
   model.save_low_bit(save_path)
   tokenizer.save_pretrained(save_path)
   print(f"Model and tokenizer are saved to {save_path}")
   ```

2. Load the converted INT4 model; it fails with the error below:

   ```
   (RWKV-py310) a770@RPLP-A770:~/ouyang/rwkv/models$ python generate_rwkv4_7b.py
   /home/a770/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''
   If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
     warn(
   2024-02-19 11:21:33,665 - INFO - intel_extension_for_pytorch auto imported
   **** loading rwkv-4-world-7b-int4
   2024-02-19 11:21:33,731 - INFO - Converting the current model to sym_int4 format......
   <class 'transformers.models.rwkv.modeling_rwkv.RwkvForCausalLM'>
   Can not read the prompt file, please check the file path.
   2024-02-19 11:21:36,422 - WARNING - The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
   2024-02-19 11:21:36,422 - WARNING - Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
   Traceback (most recent call last):
     File "/home/a770/ouyang/rwkv/models/generate_rwkv4_7b.py", line 91, in <module>
       output = model.generate(input_ids,
     File "/home/a770/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
       return func(*args, **kwargs)
     File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 1563, in generate
       return self.greedy_search(
     File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 2385, in greedy_search
       outputs = self(
     File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 533, in __call__
       return self.model(*args, **kwargs)
     File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
       return self._call_impl(*args, **kwargs)
     File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
       return forward_call(*args, **kwargs)
     File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/transformers/models/rwkv/modeling_rwkv.py", line 791, in forward
       rwkv_outputs = self.rwkv(
     File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
       return self._call_impl(*args, **kwargs)
     File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
       return forward_call(*args, **kwargs)
     File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/transformers/models/rwkv/modeling_rwkv.py", line 642, in forward
       self._rescale_layers()
     File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/transformers/models/rwkv/modeling_rwkv.py", line 721, in _rescale_layers
       block.attention.output.weight.div_(2 ** int(block_id // self.config.rescale_every))
   RuntimeError: result type Float can't be cast to the desired output type Byte
   ```
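(For context, the final error is generic PyTorch behavior rather than anything RWKV-specific: the error message shows the INT4-quantized weight is stored as a byte tensor, and transformers' layer rescaling applies an in-place floating-point division to it. A minimal standalone repro of the same error type, not taken from the issue:)

```python
import torch

# Quantized weights are kept as integer (byte) storage; true division promotes
# the result to float, which cannot be written back into the byte tensor in place.
w = torch.zeros(4, 4, dtype=torch.uint8)
w.div_(2)  # RuntimeError: result type Float can't be cast to the desired output type Byte
```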

juan-OY commented 6 months ago

Also found that the RWKV model loads much more slowly at runtime with 2.5.0b20240213 than with 2.5.0b20240204: about 4 min with 2.5.0b20240213 versus about 1 min with 2.5.0b20240204.

leonardozcm commented 6 months ago

The loading failure has been fixed in the attached PR.

> Also found that 2.5.0b20240213 rwkv model loading at runtime is much slower than 2.5.0b20240204, about 4 min with 2.5.0b20240213 and 1 min with 2.5.0b20240204

Can't reproduce this. My bigdl version is 2.5.0b20240218. On my desktop, load_low_bit takes only 1.5 s and from_pretrained takes 10.26 s, and the times stay the same when I downgrade bigdl-llm to 2.5.0b20240204.
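(For anyone who wants to compare the two loading paths themselves, a rough timing sketch; model_path and save_path are placeholders, and the bigdl.llm.transformers import path is assumed to match the bigdl-llm releases discussed in this thread:)

```python
import time
from bigdl.llm.transformers import AutoModelForCausalLM  # bigdl-llm import path (assumed)

model_path = "path/to/rwkv-4-world-7b"   # original HF checkpoint (placeholder)
save_path = "./rwkv-4-world-7b-int4/"    # output of save_low_bit() from step 1

def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f} s")
    return result

# Conversion path: load the FP16 weights and quantize to sym_int4 on the fly.
model = timed("from_pretrained", lambda: AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, optimize_model=True, trust_remote_code=True))

# Fast path: reload weights that were already saved with save_low_bit().
model = timed("load_low_bit", lambda: AutoModelForCausalLM.load_low_bit(
    save_path, optimize_model=True, trust_remote_code=True))
```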

leonardozcm commented 6 months ago

Will fix in https://github.com/intel-analytics/BigDL/pull/10179

juan-OY commented 6 months ago

The RWKV5 issue still exists with bigdl version 2.5.0b20240221:

```
2024-02-21 22:17:22,445 - INFO - Converting the current model to sym_int4 format......
<class 'transformers_modules.modeling_rwkv5.Rwkv5ForCausalLM'>
Can not read the prompt file, please check the file path.
Traceback (most recent call last):
  File "/home/a770/ouyang/rwkv/models/generate_rwkv5.py", line 96, in <module>
    output = model.generate(input_ids,
  File "/home/a770/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 1613, in generate
    return self.sample(
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 2697, in sample
    outputs = self(
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 533, in __call__
    return self.model(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 820, in forward
    rwkv_outputs = self.rwkv(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 708, in forward
    hidden_states, state, attentions = block(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 417, in forward
    attention, state = self.attention(self.ln1(hidden), state=state, use_cache=use_cache, seq_mode=seq_mode)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 331, in forward
    rwkv, layer_state = rwkv_linear_attention(
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 232, in rwkv_linear_attention
    return rwkv_linear_attention_v5_cpu(
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 204, in rwkv_linear_attention_v5_cpu
    out = out @ ow
RuntimeError: mat1 and mat2 shapes cannot be multiplied (50x4096 and 8912896x1)
```
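(A side observation, not stated in the thread: the 8912896x1 operand looks like the raw one-dimensional sym_int4 storage of the 4096x4096 output projection reaching the CPU linear-attention fallback without being dequantized. The size lines up under the assumed layout of 4-bit packing plus one fp16 scale per 64-weight block:)

```python
hidden = 4096
packed_int4 = hidden * hidden // 2          # two 4-bit weights per byte
fp16_scales = (hidden * hidden // 64) * 2   # one 2-byte scale per 64-weight block (assumed layout)
print(packed_int4 + fp16_scales)            # 8912896 -- matches mat2 in the error above
```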

juan-OY commented 6 months ago

The error below on RWKV5 is fixed in the latest release 2.5.0b20240221:

```
out = out @ ow
RuntimeError: mat1 and mat2 shapes cannot be multiplied (50x4096 and 8912896x1)
```

The correct way to load is as below; it fails if optimize_model=False:

```python
model = AutoModelForCausalLM.load_low_bit(model_path, trust_remote_code=True, optimize_model=True)
```
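(For completeness, a minimal load-and-generate sketch built around that call. This is a sketch only: the prompt, paths, and generation settings are placeholders, and the bigdl.llm.transformers import path is assumed from the bigdl-llm releases used in this thread.)

```python
import torch
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM  # bigdl-llm import path (assumed)

save_path = "./rwkv-4-world-7b-int4/"   # directory produced by save_low_bit() above

# Reload the already-quantized weights; optimize_model=True is required per the comment above.
model = AutoModelForCausalLM.load_low_bit(save_path,
                                          trust_remote_code=True,
                                          optimize_model=True)
model = model.to('xpu')
tokenizer = AutoTokenizer.from_pretrained(save_path, trust_remote_code=True)

# Placeholder prompt and generation settings, purely for illustration.
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to('xpu')
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```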

juan-OY commented 3 months ago

Resolved already.