Closed juan-OY closed 3 months ago
Also found that RWKV model loading at runtime is much slower with 2.5.0b20240213 than with 2.5.0b20240204: about 4 min with 2.5.0b20240213, versus 1 min with 2.5.0b20240204.
The loading failure has been fixed in the attached PR.
Regarding "2.5.0b20240213 rwkv model loading at runtime is much slower than 2.5.0b20240204, about 4 min with 2.5.0b20240213, and 1 min with 2.5.0b20240204":
I can't reproduce this.
My bigdl version is 2.5.0b20240218. On my desktop, load_low_bit takes only 1.5 s, and from_pretrained takes 10.26 s.
And the time remains the same when I downgrade bigdl-llm to 2.5.0b20240204.
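A minimal sketch of how the two load times above could be measured, assuming a plain wall-clock comparison is enough. The helper name time_call is hypothetical and not part of bigdl-llm; the bigdl-llm calls appear only in comments, with the arguments used elsewhere in this thread.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without bigdl-llm installed.
    _, seconds = time_call(sum, range(1_000_000))
    print(f"stand-in call took {seconds:.4f}s")

    # With bigdl-llm installed, the same helper could wrap the real loaders
    # (model_path and save_path are the variables from the reporter's script):
    #   from bigdl.llm.transformers import AutoModelForCausalLM
    #   model, t = time_call(AutoModelForCausalLM.load_low_bit, save_path,
    #                        trust_remote_code=True, optimize_model=True)
    #   model, t = time_call(AutoModelForCausalLM.from_pretrained, model_path,
    #                        load_in_4bit=True, optimize_model=True,
    #                        trust_remote_code=True)
```

Running each loader more than once helps separate disk-cache effects from the library's own load time.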
The RWKV5 issue still exists with bigdl version 2.5.0b20240221:
2024-02-21 22:17:22,445 - INFO - Converting the current model to sym_int4 format......
<class 'transformers_modules.modeling_rwkv5.Rwkv5ForCausalLM'>
Can not read the prompt file, please check the file path.
Traceback (most recent call last):
  File "/home/a770/ouyang/rwkv/models/generate_rwkv5.py", line 96, in <module>
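Because the reports in this thread depend on which nightly build is installed, it can help to compare build dates explicitly. The sketch below assumes the version layout seen in the strings quoted here (X.Y.ZbYYYYMMDD) and compares only the date suffix; the function names are hypothetical helpers, not part of bigdl-llm.

```python
import re
from datetime import date

# Assumed layout, based on the version strings quoted in this thread: X.Y.ZbYYYYMMDD
_NIGHTLY = re.compile(r"\d+\.\d+\.\d+b(\d{4})(\d{2})(\d{2})")


def nightly_build_date(version: str) -> date:
    """Extract the build date from a nightly version string such as 2.5.0b20240221."""
    match = _NIGHTLY.fullmatch(version)
    if match is None:
        raise ValueError(f"not a nightly version string: {version!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def is_at_least(installed: str, required: str) -> bool:
    """Compare build dates only; assumes both strings share the same base version."""
    return nightly_build_date(installed) >= nightly_build_date(required)


if __name__ == "__main__":
    print(nightly_build_date("2.5.0b20240221"))
    print(is_at_least("2.5.0b20240204", "2.5.0b20240213"))
    # To check the installed package instead of a literal string:
    #   from importlib.metadata import version
    #   installed = version("bigdl-llm")
```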
The error below on RWKV5 is fixed in the latest release, 2.5.0b20240221:
out = out @ ow
RuntimeError: mat1 and mat2 shapes cannot be multiplied (50x4096 and 8912896x1)
The correct way to load is as below:
model = AutoModelForCausalLM.load_low_bit(model_path, trust_remote_code=True, optimize_model=True)
It fails if optimize_model=False.
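One way to keep the optimize_model=True requirement from being forgotten is a thin wrapper around the loader. This is only a sketch: load_low_bit_optimized is a hypothetical helper, and the loader is passed in as an argument so the example runs without bigdl-llm installed.

```python
from typing import Any, Callable


def load_low_bit_optimized(loader: Callable[..., Any], model_path: str, **extra: Any) -> Any:
    """Call a load_low_bit-style loader with the flags reported as working in this thread."""
    if extra.get("optimize_model") is False:
        # The thread reports that loading fails with optimize_model=False.
        raise ValueError("optimize_model=False is reported to fail for this model")
    kwargs = {"trust_remote_code": True, "optimize_model": True, **extra}
    return loader(model_path, **kwargs)


if __name__ == "__main__":
    # Stand-in loader that just echoes its arguments.
    def fake_loader(path: str, **kwargs: Any) -> dict:
        return {"path": path, **kwargs}

    print(load_low_bit_optimized(fake_loader, "./rwkv-4-world-7b-int4/"))

    # With bigdl-llm installed, the real loader could be passed instead:
    #   from bigdl.llm.transformers import AutoModelForCausalLM
    #   model = load_low_bit_optimized(AutoModelForCausalLM.load_low_bit, model_path)
```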
Resolved already.
Linux OS 22.04
# convert RWKV model to INT4
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, optimize_model=True, trust_remote_code=True)
model = model.to('xpu')
model = BenchmarkWrapper(model, do_print=True)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
save_path = "./rwkv-4-world-7b-int4/"
model.save_low_bit(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer are saved to {save_path}")
Loading the converted INT4 model failed with the error below:
(RWKV-py310) a770@RPLP-A770:~/ouyang/rwkv/models$ python generate_rwkv4_7b.py
/home/a770/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functio, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvis
  warn(
2024-02-19 11:21:33,665 - INFO - intel_extension_for_pytorch auto imported
**** loading rwkv-4-world-7b-int4
2024-02-19 11:21:33,731 - INFO - Converting the current model to sym_int4 format......
<class 'transformers.models.rwkv.modeling_rwkv.RwkvForCausalLM'>
Can not read the prompt file, please check the file path.
2024-02-19 11:21:36,422 - WARNING - The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input'n reliable results.
2024-02-19 11:21:36,422 - WARNING - Setting pad_token_id to eos_token_id:0 for open-end generation.
Traceback (most recent call last):
  File "/home/a770/ouyang/rwkv/models/generate_rwkv4_7b.py", line 91, in <module>
    output = model.generate(input_ids,
  File "/home/a770/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 1563, in generate
    return self.greedy_search(
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 2385, in greedy_search
    outputs = self(
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 533, in __call__
    return self.model(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/transformers/models/rwkv/modeling_rwkv.py", line 791, in forward
    rwkv_outputs = self.rwkv(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/transformers/models/rwkv/modeling_rwkv.py", line 642, in forward
    self._rescale_layers()
  File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/transformers/models/rwkv/modeling_rwkv.py", line 721, in _rescale_layers
    block.attention.output.weight.div_(2 ** int(block_id // self.config.rescale_every))
RuntimeError: result type Float can't be cast to the desired output type Byte