MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0
About the error: ValueError: weight is on the meta device, we need a `value` to put it in on cpu #272
A question: when running reward_modeling with llama-13b, I get this error:

raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
ValueError: weight is on the meta device, we need a `value` to put in on cpu.

Loading with 4-bit quantization works around it, but then rl_training fails when loading the merged reward model:

raise RuntimeError('Error(s) in loading state_dict: While copying the parameter named "model.layers.19.self_attn.o_proj.weight", whose dimensions in the model are torch.Size([5120, 5120]) and whose dimensions in the checkpoint are torch.Size([5120, 5120]), an exception occurred: ('Cannot copy out of meta tensor; no data!',).....
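For context, both errors come from the same underlying situation: some weights were instantiated on PyTorch's "meta" device (shape and dtype only, no storage) and were never materialized with real data, so any attempt to copy them to cpu fails. A minimal sketch in plain PyTorch (not the MedicalGPT code itself) reproduces the failure mode and shows the materialization path, `to_empty()`, that frameworks use instead of `.to()`:

```python
import torch

# A tensor on the "meta" device carries shape/dtype metadata but no data.
m = torch.empty(2, 2, device="meta")
try:
    m.to("cpu")  # fails: "Cannot copy out of meta tensor; no data!"
except Exception as e:
    print(type(e).__name__)

# A module built on the meta device must be materialized with to_empty()
# (which allocates real, uninitialized storage) and then have its weights
# loaded from a checkpoint; calling .to("cpu") on it would hit the same error.
with torch.device("meta"):
    lin = torch.nn.Linear(4, 4)
lin = lin.to_empty(device="cpu")
print(lin.weight.device)
```

In `from_pretrained`-style loading, this is what happens under the hood when `low_cpu_mem_usage=True` or a `device_map` is used: the model skeleton lives on meta until checkpoint shards fill it in, and the ValueError appears when a parameter is left unfilled.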