yongzhuo / chatglm-maths

Fine-tuning / LoRA / PPO / inference for chatglm-6b; the training samples are auto-generated integer and decimal addition, subtraction, multiplication, and division problems; runs on GPU or CPU.
https://blog.csdn.net/rensihui
Apache License 2.0

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 #6

Open cppww opened 1 year ago

cppww commented 1 year ago

    Traceback (most recent call last)

    /ChatGLM-6B/ChatGLM_math/chatglm_maths/t10_toy_trl_train_ppo.py:215

        212     # get model response
        213     # print(query_tensor)
        214
      > 215     response_tensor = respond_to_batch_new(model_ref, query_tensor, txt_len=MAX_LEN, top…
        216     # define a reward for response
        217     # (this could be any reward such as human feedback or output from another model)
        218     response_ids = response_tensor.detach().cpu().numpy().tolist()

    /ChatGLM-6B/ChatGLM_math/chatglm_maths/t10_toy_trl_train_ppo.py:62 in respond_to_batch_new

         59         next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=…
         60         # Sample
         61         probs = F.softmax(next_token_logits, dim=-1)
       > 62         next_token = torch.multinomial(probs, num_samples=1).squeeze(1)
         63         start_ids = torch.cat([start_ids, next_token.unsqueeze(-1)], dim=-1)
         64         # EOS
         65         if next_token.detach().cpu().numpy()[0] == tokenizer.eos_token_id:

    RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
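This error means torch.multinomial received a distribution containing nan/inf, which typically happens when fp16 or int8 logits overflow before the softmax. A minimal defensive sketch (not the repo's code; safe_sample and its argument name are illustrative) that casts the logits to fp32 and sanitizes them before sampling:

    import torch
    import torch.nn.functional as F

    def safe_sample(next_token_logits: torch.Tensor) -> torch.Tensor:
        # compute in fp32: half/int8 logits can overflow to inf or turn into nan
        logits = next_token_logits.float()
        # replace any nan/inf so softmax yields a valid probability tensor
        logits = torch.nan_to_num(logits, nan=0.0, posinf=1e4, neginf=-1e4)
        probs = F.softmax(logits, dim=-1)
        return torch.multinomial(probs, num_samples=1).squeeze(1)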

cppww commented 1 year ago

Loading the model like this runs out of GPU memory:

    model_chatglm = ChatGLMForConditionalGeneration.from_pretrained(pretrained_model_name_or_path)
    model_chatglm = model_chatglm.half()

while loading it like this raises the error above:

    model_chatglm = ChatGLMForConditionalGeneration.from_pretrained(pretrained_model_name_or_path,
                                                                    load_in_8bit=True,
                                                                    device_map="auto")

yongzhuo commented 1 year ago

INT8 training is not very stable; I still recommend FP16. LayerNorm is very sensitive and only behaves well in FP16/FP32. As for the question: for INT8, follow t10_lora_trl_train_ppo.py and add

# older peft API; the listed LayerNorm modules are cast back to fp32 for stability
from peft import prepare_model_for_int8_training

model = prepare_model_for_int8_training(model,
        use_gradient_checkpointing=True,
        output_embedding_layer_name="lm_head",
        # layer_norm_names=[],
        layer_norm_names=["post_attention_layernorm",
                          "input_layernorm",
                          "ln_f"
                          ],
        )
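For context, a rough sketch of where that call sits in an 8-bit LoRA setup, in the spirit of t10_lora_trl_train_ppo.py (an assumption-laden outline, not the repo's code; in particular target_modules=["query_key_value"] and the LoRA hyperparameters are illustrative):

    from transformers import AutoModel
    from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

    # load the base model quantized to 8-bit
    model = AutoModel.from_pretrained(pretrained_model_name_or_path,
                                      trust_remote_code=True,
                                      load_in_8bit=True,
                                      device_map="auto")

    # cast the sensitive LayerNorms (and the lm_head output) back to fp32, as above
    model = prepare_model_for_int8_training(model,
                                            use_gradient_checkpointing=True,
                                            output_embedding_layer_name="lm_head",
                                            layer_norm_names=["post_attention_layernorm",
                                                              "input_layernorm",
                                                              "ln_f"])

    # attach trainable LoRA adapters; only these small matrices receive gradients
    lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.05,
                             target_modules=["query_key_value"],
                             task_type="CAUSAL_LM")
    model = get_peft_model(model, lora_config)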
cppww commented 1 year ago

That is exactly the updated code I'm using, but when I skip load_in_8bit and use .half() instead, a single 24 GB 3090 runs out of VRAM ┭┮﹏┭┮. What is the minimum amount of VRAM this needs? One more thing I'd like to ask: what is the structure of the model_ref returned by model_ref = create_reference_model(model), and can I call model_ref.generate() on it directly?

yongzhuo commented 1 year ago

Hmm, with half precision it needs around 30 GB here. model_ref is the reference model: its gradients are never updated, and its job is to keep the newly trained model's outputs from drifting too far away from the original answers.
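In trl terms (a sketch, assuming model is the value-head policy the script already builds; not a claim about this repo's exact code):

    from trl import create_reference_model

    # `model` is the trainable policy; model_ref is a deep copy with every
    # parameter frozen, so it keeps the original model's behaviour fixed
    model_ref = create_reference_model(model)
    assert all(not p.requires_grad for p in model_ref.parameters())

    # model_ref exposes the same interface as the policy (forward/generate),
    # and PPO uses it to penalize divergence from the original answers:
    #   total reward = task reward - kl_coef * (log p_model - log p_ref)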

cppww commented 1 year ago

Got it, thank you! May I ask you two more questions:

  1. After running t10_lora_trl_train_ppo.py, how large should the saved bin file be? The one I get is only 17.5 KB.
  2. With t10_toy_trl_train_ppo.py and load_in_8bit, the saved weights come to only 6875.5 MB. Is there a way to save a bin with the same parameter count as the original ChatGLM, or is the only option to train with LoRA and then merge the adapter back in?
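For reference, the "train with LoRA, then merge the adapter" route mentioned in question 2 usually looks like the sketch below with peft (not the maintainer's answer; lora_adapter_dir and output_dir are placeholder paths, and merging needs a non-8-bit copy of the base model):

    from transformers import AutoModel
    from peft import PeftModel

    # reload the base model in full/half precision (merging does not work on 8-bit weights)
    base = AutoModel.from_pretrained(pretrained_model_name_or_path,
                                     trust_remote_code=True).half()

    # attach the saved adapter; an adapter checkpoint stores only the small LoRA matrices
    model = PeftModel.from_pretrained(base, lora_adapter_dir)

    # fold the LoRA weights into the base weights and save a full-size checkpoint
    model = model.merge_and_unload()
    model.save_pretrained(output_dir)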