Open cppww opened 1 year ago
Loading the model this way runs out of GPU memory:
model_chatglm = ChatGLMForConditionalGeneration.from_pretrained(pretrained_model_name_or_path)
model_chatglm = model_chatglm.half()
Loading it this way raises the error above:
model_chatglm = ChatGLMForConditionalGeneration.from_pretrained(pretrained_model_name_or_path, load_in_8bit=True, device_map="auto")
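A rough back-of-the-envelope estimate may help show why the .half() path is tight on a single card. The sketch below counts weight memory only (no gradients, optimizer state, or activations). The parameter count of about 6.2 billion for ChatGLM-6B and the idea that PPO keeps two full copies (policy plus reference model) are assumptions for illustration, not measurements from this script.

```python
# Rough weight-only memory estimate; all numbers are illustrative assumptions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str, copies: int = 1) -> float:
    """Memory taken by the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] * copies / 1e9


NUM_PARAMS = 6.2e9  # assumed parameter count for ChatGLM-6B

for dtype in ("fp32", "fp16", "int8"):
    one = weight_memory_gb(NUM_PARAMS, dtype)
    # Assumption: the policy and the reference model are separate full copies.
    two = weight_memory_gb(NUM_PARAMS, dtype, copies=2)
    print(f"{dtype}: {one:.1f} GB for one copy, {two:.1f} GB for two copies")
```

Under these assumptions, two FP16 copies already exceed 24 GB before any training overhead is counted, while INT8 weights would fit.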
INT8 training is not very stable, so FP16 is still recommended. LayerNorm is sensitive and needs FP16 or FP32 to be reasonably stable. As for the question in the title: for INT8, follow t10_lora_trl_train_ppo.py and add
model = prepare_model_for_int8_training(model,
        use_gradient_checkpointing=True,
        output_embedding_layer_name="lm_head",
        # layer_norm_names=[],
        layer_norm_names=["post_attention_layernorm",
                          "input_layernorm",
                          "ln_f"
                          ],
        )
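I am already using the updated code, but if I use .half() instead of load_in_8bit, a single 3090 with 24 GB runs out of GPU memory. ┭┮﹏┭┮ What is the minimum GPU memory this needs? One more question: what structure does the model_ref returned by model_ref = create_reference_model(model) have, and can I call model_ref.generate() on it directly?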
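Hmm, with half precision this needs around 30 GB. model_ref is the reference model: its weights receive no gradient updates, and it keeps the newly trained model's outputs from drifting too far from the original answers.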
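To illustrate how a frozen reference model can keep the trained model close to its original answers, here is a minimal sketch of KL-penalty reward shaping as commonly used in PPO for language models. The function name, the kl_coef value, and the choice to add the score on the last token are assumptions for illustration; the script in this repository may do this differently.

```python
from typing import List


def kl_penalized_rewards(
    logprobs: List[float],
    ref_logprobs: List[float],
    score: float,
    kl_coef: float = 0.2,  # hypothetical coefficient, chosen for illustration
) -> List[float]:
    """Per-token rewards: a KL penalty on every token, plus the score on the last one.

    logprobs are the log-probabilities of the sampled tokens under the model
    being trained; ref_logprobs are those of the same tokens under the frozen
    reference model.
    """
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("need two non-empty lists of equal length")
    # Tokens the policy likes more than the reference model does are penalized.
    rewards = [-kl_coef * (lp - ref_lp) for lp, ref_lp in zip(logprobs, ref_logprobs)]
    rewards[-1] += score  # the task reward is credited at the final token
    return rewards


# If the policy matches the reference exactly, only the task score remains.
print(kl_penalized_rewards([-1.0, -2.0], [-1.0, -2.0], score=1.0))
# If the policy has drifted, the final reward is reduced by the penalty.
print(kl_penalized_rewards([-0.5], [-1.5], score=1.0))
```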
OK, thank you very much! May I ask you two more questions?
Traceback (most recent call last)
/ChatGLM-6B/ChatGLM_math/chatglm_maths/t10_toy_trl_train_ppo.py:215 in

  212   # get model response
  213   # print(query_tensor)
  214
> 215   response_tensor = respond_to_batch_new(model_ref, query_tensor, txt_len=MAX_LEN, top
  216   # define a reward for response
  217   # (this could be any reward such as human feedback or output from another model)
  218   response_ids = response_tensor.detach().cpu().numpy().tolist()

/ChatGLM-6B/ChatGLM_math/chatglm_maths/t10_toy_trl_train_ppo.py:62 in respond_to_batch_new

   59       next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=
   60       # Sample
   61       probs = F.softmax(next_token_logits, dim=-1)
>  62       next_token = torch.multinomial(probs, num_samples=1).squeeze(1)
   63       start_ids = torch.cat([start_ids, next_token.unsqueeze(-1)], dim=-1)
   64       # EOS
   65       if next_token.detach().cpu().numpy()[0] == tokenizer.eos_token_id:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
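torch.multinomial raises this error when the probabilities it receives are not all finite and non-negative. The pure-Python sketch below shows one way that can happen: a single infinite logit (for example after an overflow in low precision) turns the whole softmax output into nan. Whether this is what happens in this script is an assumption; the helper names are hypothetical and only mirror the condition named in the error message.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_probs(probs: List[float]) -> bool:
    """True when every entry is finite and non-negative."""
    return all(math.isfinite(p) and p >= 0.0 for p in probs)


good = softmax([1.0, 2.0, 3.0])
bad = softmax([float("inf"), 0.0])  # inf - inf gives nan, which poisons the sum

print(good, is_valid_probs(good))
print(bad, is_valid_probs(bad))
```

In the PyTorch code from the traceback, a comparable check would be to test torch.isfinite(next_token_logits).all() before calling F.softmax, so the failing step can be located before sampling.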