mymusise / ChatGLM-Tuning

A fine-tuning solution based on ChatGLM-6B + LoRA
MIT License
3.74k stars 442 forks

Finetuning the int8 quantized model fails: RuntimeError: self and mat2 must have the same dtype #214

Open zlht812 opened 1 year ago

zlht812 commented 1 year ago

```
Traceback (most recent call last):
  File "/data/ChatGLM-Tuning/finetune.py", line 117, in <module>
    main()
  File "/data/ChatGLM-Tuning/finetune.py", line 110, in main
    trainer.train()
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/data/ChatGLM-Tuning/finetune.py", line 54, in compute_loss
    return model(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 985, in forward
    layer_ret = torch.utils.checkpoint.checkpoint(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 627, in forward
    attention_outputs = self.attention(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 445, in forward
    mixed_raw_layer = self.query_key_value(hidden_states)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: self and mat2 must have the same dtype
```

Any advice would be appreciated.

calvinzhan commented 1 year ago

This happens because the int8 quantized checkpoint replaces `query_key_value` with its own quantized wrapper, and LoRA then swaps it back to a plain linear layer. The input is float16 while the weight is int8, so the matmul cannot run. Has anyone managed to solve this?
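For reference, here is a minimal sketch (not from this thread) that reproduces the dtype clash `peft/tuners/lora.py` hits, assuming a CUDA build of PyTorch; the shapes are arbitrary:

```python
import torch
import torch.nn.functional as F

# float16 activations, as produced by the half-precision model
x = torch.randn(1, 4, dtype=torch.float16, device="cuda")
# raw int8 weights, as stored by the pre-quantized chatglm-6b-int8 checkpoint
w = torch.randint(-128, 127, (4, 4), dtype=torch.int8, device="cuda")

# F.linear refuses to multiply mixed dtypes:
# RuntimeError: self and mat2 must have the same dtype
F.linear(x, w)
```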

zlht812 commented 1 year ago

I switched to a different training script and the fp16 weights, and it works now. The original training script here does not support int8.
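For anyone following along, a hedged sketch of the fp16 path described above, roughly along the lines of this repo's finetune.py; the model name and LoRA hyperparameters are assumptions, not values from the thread:

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Load the fp16 checkpoint instead of the pre-quantized chatglm-6b-int8 one,
# so the query_key_value weights stay float16 and match the LoRA inputs.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True
).half().cuda()

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["query_key_value"],  # the layer the traceback points at
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```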

calvinzhan commented 1 year ago

@zlht812 So a different training script can handle the int8 quantized model? Could you explain how you switched to the fp16 weights? Would you mind adding me on WeChat to discuss? Mine is 229402265.

songyi1999 commented 1 year ago

I would also like to know the answer to this. Could you share it?

zlht812 commented 1 year ago

I used this LoRA implementation that supports int8: https://github.com/ssbuild/chatglm_finetuning. Current status: LoRA training completes, but at inference time I use the fp16 pretrained model loaded in int8 mode. The LoRA weights load successfully with half(), yet inference raises the same error again.

[screenshot: the same RuntimeError raised at inference]

The pretrained model used for inference is the same one used to train the LoRA. I suspected the LoRA weights themselves were already int8, so I removed the half() call, but then the GPU ran out of memory (CUDA out of memory). I'll test again once the new server comes online.
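One hedged workaround sketch, assuming bitsandbytes is available (this is a suggestion, not something the thread confirms): quantize the fp16 checkpoint to 8-bit on load with `load_in_8bit` instead of using the pre-quantized checkpoint. bitsandbytes' Linear8bitLt accepts float16 inputs and dequantizes internally, which avoids the raw int8 matmul; whether ChatGLM's custom modeling code tolerates this path may vary, and the LoRA checkpoint path below is hypothetical.

```python
from transformers import AutoModel
from peft import PeftModel

# Quantize the fp16 checkpoint to 8-bit on load via bitsandbytes, rather than
# loading weights that were already stored as raw int8 tensors.
base = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the trained LoRA adapter (path is hypothetical).
model = PeftModel.from_pretrained(base, "output/lora_checkpoint")
model.eval()
```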

Xzaohui commented 1 year ago

Is there any solution for this yet?