After running the LoRA training code with int8=True, inference fails with RuntimeError: expected scalar type Half but found Float. What causes this?

MathamPollard commented 1 year ago

Traceback (most recent call last):
  /home/mdisk2/tanjunwen/gitprj/chatglm_finetuning/infer_lora_finetuning.py:49 in <module>
    response, history = model.chat(tokenizer, "写一个诗歌，关于冬天", history=[],max
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27 in decorate_context
    return func(*args, **kwargs)
  /home/mdisk2/tanjunwen/gitprj/chatglm_finetuning/models/chatglm_model.py:100 in chat
    outputs = self.generate(**inputs, **gen_kwargs)
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27 in decorate_context
    return func(*args, **kwargs)
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/transformers/generation/utils.py:1565 in generate
    return self.sample(
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/transformers/generation/utils.py:2612 in sample
    outputs = self(
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in _call_impl
    return forward_call(*input, **kwargs)
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/deep_training/nlp/models/chatglm/__init__.py:1204 in forward
    transformer_outputs = self.transformer(
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in _call_impl
    return forward_call(*input, **kwargs)
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/deep_training/nlp/models/chatglm/__init__.py:1019 in forward
    layer_ret = layer(
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in _call_impl
    return forward_call(*input, **kwargs)
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/deep_training/nlp/models/chatglm/__init__.py:626 in forward
    attention_input = self.input_layernorm(hidden_states)
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in _call_impl
    return forward_call(*input, **kwargs)
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/nn/modules/normalization.py:189 in forward
    return F.layer_norm(
  /home/mdisk2/tanjunwen/anaconda3/lib/python3.10/site-packages/torch/nn/functional.py:2503 in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.c
RuntimeError: expected scalar type Half but found Float

ssbuild commented 1 year ago

An int8 model cannot call half(). The LoRA int8 inference script has been updated.
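The error message says that one operation (here `torch.layer_norm`, per the traceback) received tensors of two different floating-point types, Half (fp16) and Float (fp32). Below is a minimal sketch, not the repository's actual fix, of two checks that follow from this reply: count the parameter dtypes before inference, and skip the `half()` cast when int8 is enabled. The helper names `dtype_summary` and `should_call_half` and the stand-in parameter list are assumptions made for illustration.

```python
from collections import Counter
from types import SimpleNamespace


def dtype_summary(named_params):
    """Count parameters per dtype.

    Accepts any iterable of (name, param) pairs where param has a .dtype
    attribute, for example the result of model.named_parameters().
    """
    return Counter(str(param.dtype) for _, param in named_params)


def should_call_half(load_in_8bit):
    """Hypothetical guard: cast to fp16 only when the model is NOT int8."""
    return not load_in_8bit


# Stand-in parameters (assumed names and dtypes) so the sketch runs
# without PyTorch installed.
fake_params = [
    ("layers.0.input_layernorm.weight", SimpleNamespace(dtype="float32")),
    ("layers.0.attention.query_key_value.weight", SimpleNamespace(dtype="int8")),
    ("layers.0.attention.dense.weight", SimpleNamespace(dtype="int8")),
]
summary = dtype_summary(fake_params)
print(dict(summary))

# With a real model the guard would be used roughly like this:
#   if should_call_half(load_in_8bit):
#       model.half()
```

If the summary shows a mix of dtypes, an unconditional `half()` call on the model or its inputs is a likely source of the mismatch.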

MathamPollard commented 1 year ago

Dear author, your updated infer_lora_finetuning.py runs without errors, but as soon as I change the prompt it fails. Why would that be? All I did was take this call (the prompt means "Write a poem about winter"):

response, history = model.chat(tokenizer, "写一个诗歌，关于冬天", history=[],max_length=2048, eos_token_id=config.eos_token_id, do_sample=True, top_p=0.7, temperature=0.95,)
print('写一个诗歌，关于冬天',' ',response)

and change the prompt. Any change at all breaks it, for example:

response, history = model.chat(tokenizer, "一二三四五", history=[],max_length=2048, eos_token_id=config.eos_token_id, do_sample=True, top_p=0.7, temperature=0.95,)
print('写一个诗歌，关于冬天',' ',response)

or:

response, history = model.chat(tokenizer, "abcabc", history=[],max_length=2048, eos_token_id=config.eos_token_id, do_sample=True, top_p=0.7, temperature=0.95,)
print('写一个诗歌，关于冬天',' ',response)

The error is: RuntimeError: expected scalar type Half but found Float

If your string works, mine should work too.

ssbuild commented 1 year ago

I can't reproduce this. Delete best_ckpt and output, then try again.

MathamPollard commented 1 year ago

Dear author, the code now runs and inference works, but it still seems to be using the model you trained. I set ckpt_dir = './best_ckpt2', which is the model I trained, and my inputs are keywords from a specific domain. Yet the inference results look like this, which clearly resembles a model trained on your example dataset:

{ "id": 1, "text": "，你可以通过轨道交通路线，可以在北京站乘坐轨道交通线，例如线上阅读轨道交通。\n\n1. 拨打轨道交通，可以让你做轨道交通，你可以在陆家嘴站乘坐轨道交通。\n\n2. 加热融化，可以达到入睡，走过国际机场。\n\n3. 焦虑融化，发给做寒冷的冬天，让你到达国际机场。" },
{ "id": 2, "text": "寂静，云在天边融化，把快乐和温暖带回家带回家。" },
{ "id": 3, "text": "就融化成冰卡，可以刺激你做什么，把光连在浦东上，展开热敷。\nG化成冰封打开，像上海站，可以到达国际机场。\n\nG化融化，清热解毒过去，让你更容易入睡。" },

Why does this happen?

ssbuild commented 1 year ago

Delete the cached record files under output, then rebuild the dataset and retrain.
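The cleanup step can be scripted. This is a minimal sketch that assumes the cached files carry a `.record` extension and sit somewhere under `output`. The helper name `remove_cached_records`, the file pattern, and the demo file names are all hypothetical, so check what your `output` directory actually contains before deleting anything.

```python
import tempfile
from pathlib import Path


def remove_cached_records(output_dir, pattern="*.record"):
    """Delete cached record files under output_dir and return the removed paths."""
    removed = []
    for path in sorted(Path(output_dir).rglob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


# Demonstration on a throwaway directory, so no real cache is touched.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output"
    out.mkdir()
    (out / "dataset_0-train.record").write_bytes(b"stale")  # assumed file name
    (out / "notes.txt").write_text("keep me")
    removed_names = [p.name for p in remove_cached_records(out)]
    remaining_names = sorted(p.name for p in out.iterdir())

print(removed_names, remaining_names)
```

On a real checkout the call would be `remove_cached_records("output")`, followed by rebuilding the dataset and retraining as described above.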

MathamPollard commented 1 year ago

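Dear author, I finally got the code working and can now train and run inference on my own dataset. Besides 'max_epochs', 'max_steps', 'train_batch_size', 'eval_batch_size', 'test_batch_size', 'learning_rate', 'max_seq_length', 'max_target_length', and precision, which other parameters have a significant effect on final performance?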

MathamPollard commented 1 year ago

Dear author, for multi-GPU training, is it enough to run CUDA_VISIBLE_DEVICES=0,1 python train.py? What else needs to be adjusted? And what does the comment "多卡建议扔掉" ("for multi-GPU, dropping is recommended") in the code below mean?

train_datasets = dataHelper.load_distributed_random_sampler(
    dataHelper.train_files,
    with_load_memory=data_args.data_backend == 'record',
    collate_fn=dataHelper.collate_fn,
    batch_size=training_args.train_batch_size,
    drop_last=True,  # 多卡建议扔掉
    num_processes=trainer.world_size,
    process_index=trainer.global_rank,
    dataset_loader_filter_fn=dataset_loader_filter_fn,
    num_workers=0
)

Do I need to install NCCL?

ssbuild commented 1 year ago

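Yes, NCCL needs to be installed. For single-machine multi-GPU training, just change devices in the train_info_args config. For example, devices = 2 means the first two GPUs, and devices = [0,2] means the first and third GPUs.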
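The two forms of the `devices` setting can be read as follows. This is a minimal sketch that only restates the two examples from this reply. `resolve_devices` is a hypothetical helper written for illustration and is not part of the repository or its training library.

```python
def resolve_devices(devices):
    """Map a devices setting to explicit GPU indices.

    An int n selects the first n GPUs; a list selects exactly those indices.
    """
    if isinstance(devices, bool):
        raise TypeError("devices must be an int or a list of ints")
    if isinstance(devices, int):
        if devices < 1:
            raise ValueError("devices must be at least 1")
        return list(range(devices))
    return [int(index) for index in devices]


print(resolve_devices(2))       # [0, 1]  -> first two GPUs
print(resolve_devices([0, 2]))  # [0, 2]  -> first and third GPUs
```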

MathamPollard commented 1 year ago

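Dear author, I want to fine-tune first on 2 million examples and then fine-tune again on top of that with 10,000 examples. What do I need to change? The key question is how the second round can use both the original pretrained weights and the checkpoint from the first round. Or is it enough, for the second round, to replace the pytorch_model.bin file in the original pretrained model with the pytorch_model.bin file from the first-round checkpoint, and then at inference time load the original pretrained weights together with the second-round checkpoint?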

ssbuild commented 1 year ago

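There are two options: 1. After fine-tuning on the 2 million examples, use infer_lora_finetuning to convert the result into HF weights (pytorch_model.bin), then replace the original file with it.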