zlht812 opened this issue 1 year ago
This happens because the quantized (int8) model replaces query_key_value with its own wrapped layer, and LoRA then swaps it back to a plain Linear. The input ends up float16 while the weight is int8, so the matmul cannot run. Has anyone managed to solve this?
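For context, a minimal sketch of the dtype mismatch described above (hypothetical tensors, requires a CUDA build of PyTorch; the exact error wording may differ across versions):

```python
import torch
import torch.nn.functional as F

# fp16 activations, as produced by the ChatGLM forward pass
x = torch.randn(1, 4, dtype=torch.float16, device="cuda")
# int8 weight, as stored by the quantized query_key_value layer
w = torch.randint(-128, 127, (4, 4), dtype=torch.int8, device="cuda")

# peft's plain LoRA Linear calls F.linear on the raw weight, which fails with
# RuntimeError: self and mat2 must have the same dtype
F.linear(x, w)
```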
Switching to a different training script and the fp16 weights made it work. The original training script does not support int8.
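For reference, a minimal sketch of that kind of fp16 setup, assuming transformers and peft are installed and the non-quantized ChatGLM-6B checkpoint is used (the model path and LoRA hyperparameters below are placeholders, not the exact script the commenter used):

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Load the fp16 (non-quantized) ChatGLM weights so query_key_value stays a regular Linear
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True, torch_dtype=torch.float16
).cuda()

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```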
@zlht812 So with the other training script you can fine-tune the int8 quantized model? You mentioned switching to the fp16 weights — could you explain how you did it? Would you mind adding me on WeChat to discuss? My ID is 229402265.
I'd also like to know the answer to this — could you share it?
I'm using this LoRA implementation that supports int8: https://github.com/ssbuild/chatglm_finetuning. Current status: LoRA training completes, but at inference time the pretrained model (fp16) is loaded in int8 mode. The LoRA weights load successfully with half(), yet inference throws the same error.
The pretrained model used for inference is the same one used for LoRA training. I suspected the LoRA weights themselves were already int8, so I removed half(), but then the GPU fell over with CUDA out of memory. I'll test again once the new server is online.
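Not a confirmed fix, but a sketch of fp16 inference with a LoRA adapter that keeps the base model and the adapter in the same dtype instead of mixing int8 and fp16 (the adapter path is a placeholder for the directory saved after training):

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Load the base model in fp16 rather than the int8 checkpoint
base = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True, torch_dtype=torch.float16
).cuda()
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Attach the trained LoRA adapter on top of the fp16 base model
model = PeftModel.from_pretrained(base, "path/to/lora_adapter").half().eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```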
Is there any solution? My traceback:
```
Traceback (most recent call last):
  File "/data/ChatGLM-Tuning/finetune.py", line 117, in <module>
    main()
  File "/data/ChatGLM-Tuning/finetune.py", line 110, in main
    trainer.train()
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/data/ChatGLM-Tuning/finetune.py", line 54, in compute_loss
    return model(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 985, in forward
    layer_ret = torch.utils.checkpoint.checkpoint(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 627, in forward
    attention_outputs = self.attention(
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 445, in forward
    mixed_raw_layer = self.query_key_value(hidden_states)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/aigpu310/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: self and mat2 must have the same dtype
```
Any advice would be appreciated.
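For anyone who still wants int8 training rather than fp16: one commonly suggested alternative (not verified here against ChatGLM's custom quantized checkpoint) is to load the non-quantized weights with bitsandbytes 8-bit quantization and prepare them with peft, so the LoRA layers wrap bnb Linear8bitLt modules instead of a plain Linear over raw int8 weights. A hedged sketch, assuming transformers, peft, and bitsandbytes are installed; model path and hyperparameters are placeholders:

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load with bitsandbytes 8-bit quantization instead of the chatglm-6b-int8 checkpoint,
# so query_key_value becomes a bnb Linear8bitLt layer that peft knows how to wrap
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```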