mymusise / ChatGLM-Tuning

A finetuning scheme based on ChatGLM-6B + LoRA
MIT License

[Training question/bug] Help: how to train LoRA when the base model is in FP16 #95

Open censolute opened 1 year ago

censolute commented 1 year ago

Machine: a cloud server, 24 GB RAM, single GPU with 24 GB VRAM.
Environment: PyTorch 1.12.1, CUDA 11.3.1, Python 3.8, Ubuntu 20.04. (I use torch 1.12 rather than the recommended 1.13 because it is the only image the cloud provider offers, and since I don't know which GPU the machine actually has, I didn't dare switch.)

Problem description: I put the model and config files on local disk and changed the load paths. The provided finetune.ipynb then runs without errors, training starts normally, and the loss decreases much like in the example. But when I try to finetune with the base model in 16-bit, after loading the model I can either only run inference or only train the LoRA:

(1) With load_in_8bit=False, I can only train or only run inference, never both.

(2) With load_in_8bit=False plus a change to part of the transformers Trainer code, inference works and training can start, but once training has begun, inference no longer works.

My questions: what is going wrong here, and what is the right way to train LoRA with the base model in 16-bit? Why can the model both predict and train when load_in_8bit=True, but not under FP16? (PS: I haven't yet tried saving the FP16-trained LoRA to see whether it still works afterwards.)

The changes for loading locally are as follows:

# drop the Auto* classes entirely and load from local files
from tokenization_chatglm import ChatGLMTokenizer
from modeling_chatglm import ChatGLMForConditionalGeneration

model = ChatGLMForConditionalGeneration.from_pretrained("/local_user/code/chatGLM6B", device_map='auto')
#model = ChatGLMForConditionalGeneration.from_pretrained("THUDM/chatglm-6b", load_in_8bit=True, trust_remote_code=True, device_map='auto')

tokenizer = ChatGLMTokenizer.from_pretrained("/local_user/code/chatGLM6B")
#tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

1. With load_in_8bit=False at model creation, model.generate() cannot run and fails with "RuntimeError: expected scalar type Half but found Float"; training does run, somewhat faster than in 8-bit, and the loss decreases normally.
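This Half-vs-Float error usually means two dtypes collide inside a single matmul; the fp32 LoRA adapter weights that peft inserts are a likely suspect against the fp16 base weights. One workaround that might help (a sketch, untested here; model and tokenizer as loaded above, input_text is any test prompt) is to cast the whole wrapped model, adapters included, to half before generating:

model.half()   # casts the inserted LoRA adapters to fp16 as well
input_text = "你好"
input_ids = tokenizer.encode(input_text, return_tensors='pt')
out = model.generate(input_ids=input_ids, max_length=150)
print(tokenizer.decode(out[0]))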

2. After adding torch.set_default_tensor_type(torch.cuda.HalfTensor), model.generate runs normally, but training cannot start: trainer.train() fails with "Expected a 'cuda' device type for generator but found 'cpu'".

/usr/local/lib/python3.8/dist-packages/transformers/trainer.py, line 773:

    def _get_train_sampler(self) -> Optional[torch.utils.data.Sampler]:
        if self.train_dataset is None or not has_length(self.train_dataset):
            return None

        generator = None
        if self.args.world_size <= 1:
-           generator = torch.Generator()
+           generator = torch.Generator(device=self.args.device)
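A less invasive alternative to patching trainer.py might be to scope the half-tensor default instead of setting it globally (a sketch of an assumption, not a verified fix; model, input_ids and trainer as defined earlier): switch to the half default only around generate, then restore the float default, so torch.randperm inside the Trainer's sampler again produces CPU tensors that match its CPU generator.

import torch

torch.set_default_tensor_type(torch.cuda.HalfTensor)   # only for generation
out = model.generate(input_ids=input_ids, max_length=150)
torch.set_default_tensor_type(torch.FloatTensor)       # restore before training

trainer.train()   # sampler now builds CPU tensors against its CPU generator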

After forcing the generator in the transformers Trainer source onto the same device as the training arguments (cuda:0 in my case), as in the patch above, training still does not start: trainer.train() now raises "ValueError: Attempting to unscale FP16 gradients."
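"Attempting to unscale FP16 gradients" comes from torch's GradScaler, which refuses gradients stored in fp16: with fp16=True the Trainer expects the trainable weights themselves to be fp32. That would also explain why the 8-bit path trains fine, since there the LoRA adapter weights stay in fp32. A hedged sketch of the equivalent fix for the fp16-loaded model (an assumption, not verified in this repo):

import torch

# keep the frozen fp16 base weights as they are, but store the trainable
# LoRA parameters in fp32 so GradScaler can unscale their gradients
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.float()

If that diagnosis is right, it would also be consistent with the loss collapsing to 0 in step 3 below: updating fp16 master weights directly overflows easily, after which both the loss and the logits degenerate.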

3. Building on 2, comment out the fp16=True below:

training_args = TrainingArguments(
    "output",
    # fp16=True,      ###############################
    gradient_accumulation_steps=1,
    per_device_train_batch_size=1,
    learning_rate=1e-4,
    max_steps=1500,
    logging_steps=50,
    remove_unused_columns=False,
    seed=0,
    data_seed=0,
    group_by_length=False,
)

For the first 50 steps the loss looks normal, around 2.x, but from the second report (step 100) onward the loss goes straight to 0.

And even though the loss is normal for those first 50 steps, the model is "corrupted" as soon as training starts: generate then fails, with the full traceback below:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [13], line 17
     11 input_ids = tokenizer.encode(input_text, return_tensors='pt')
     12 out = model.generate(
     13     input_ids=input_ids,
     14     max_length=150,
     15     temperature=0
     16 )
---> 17 answer = tokenizer.decode(out[0])
     18 print(answer)
     19 item['infer_answer'] = answer

File /local_user/code/ChatGLM-Tuning-master/examples/../tokenization_chatglm.py:276, in ChatGLMTokenizer.decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, spaces_between_special_tokens, **kwargs)
    274 if self.pad_token_id in token_ids:  # remove pad
    275     token_ids = list(filter((self.pad_token_id).__ne__, token_ids))
--> 276 return self.sp_tokenizer.decode(token_ids)

File /local_user/code/ChatGLM-Tuning-master/examples/../tokenization_chatglm.py:133, in SPTokenizer.decode(self, text_ids, special_tokens)
    131 def decode(self, text_ids: List[int], special_tokens=False) -> str:
    132     ids = [int(_id) - self.num_image_tokens for _id in text_ids]
--> 133     text = self._get_text_tokenizer(encode_special_tokens=special_tokens).decode(ids)
    134     text = text.replace("<n>", "\n")
    135     text = text.replace(SPTokenizer.get_tab_token(), "\t")

File /usr/local/lib/python3.8/dist-packages/icetk/text_tokenizer.py:46, in TextTokenizer.decode(self, ids)
     45 def decode(self, ids: List[int]):
---> 46     return self.sp.DecodeIds(ids)

File /usr/local/lib/python3.8/dist-packages/sentencepiece/__init__.py:837, in SentencePieceProcessor.DecodeIds(self, input, out_type, **kwargs)
    836 def DecodeIds(self, input, out_type=str, **kwargs):
--> 837   return self.Decode(input=input, out_type=out_type, **kwargs)

File /usr/local/lib/python3.8/dist-packages/sentencepiece/__init__.py:780, in SentencePieceProcessor.Decode(self, input, out_type, num_threads)
    778 if type(input) is list:
    779   if len(input) == 0 or type(input[0]) is int:
--> 780     return self._DecodeIds(input)
    781   if type(input[0]) is str:
    782     return self._DecodePieces(input)

File /usr/local/lib/python3.8/dist-packages/sentencepiece/__init__.py:337, in SentencePieceProcessor._DecodeIds(self, ids)
    336 def _DecodeIds(self, ids):
--> 337     return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids)

IndexError: Out of range: piece id is out of range.

Comparing this output with the one from before training, the non-zero prefix is exactly the same (this is the output for the first Q&A item), but everything after it is now token id 0. That also matches the traceback above: SPTokenizer.decode subtracts num_image_tokens from every id, so an id of 0 presumably goes negative and triggers "piece id is out of range".

#"print(out)"
#before:
tensor([[ 57010,  20012,  23461,  20295,  24703,  20108,  28555,  21849,  20007,
          20004,  33049,  20012, 150001, 150004,  20115,  20022,  20143,  25900,
          20006,  20137,  20115,  20022,  20143,  20128,  20547,  20196,  20120,
          20022,  20203,  22894,  20007,  30028,  20120,  21226,  20411,  20148,
          21303,  20126,  29610,  20142,  20574,  20031, 150005]])

# after:
tensor([[ 57010,  20012,  23461,  20295,  24703,  20108,  28555,  21849,  20007,
          20004,  33049,  20012, 150001, 150004,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0]])
songsa1 commented 1 year ago

This problem seems to have never been solved; I also can't train with fp16 and then predict. With load_in_8bit=True and fp16=False in the finetune command, training and prediction both work normally. But with load_in_8bit=False + fp16=True, training works while prediction fails with: expected scalar type Half but found Float
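For anyone landing here, a sketch of the only combination confirmed working in this thread (8-bit base model with fp16 left off; the hub path and the remaining arguments are taken from the snippets above):

model = ChatGLMForConditionalGeneration.from_pretrained(
    "THUDM/chatglm-6b", load_in_8bit=True, trust_remote_code=True, device_map='auto')

training_args = TrainingArguments(
    "output",
    fp16=False,                      # keep fp16 off when the base model is 8-bit
    gradient_accumulation_steps=1,
    per_device_train_batch_size=1,
    learning_rate=1e-4,
    max_steps=1500,
    logging_steps=50,
    remove_unused_columns=False,
    seed=0,
    data_seed=0,
    group_by_length=False,
)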