Closed. JiaDuo-Lin closed this issue 2 years ago.
Sorry, I have no idea why this happens. In my experience, they should be the same.
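I found the same problem: the model performs normally during training, but after reloading it every prediction is wrong (and an array index out-of-bounds error also occurs). It is most likely a problem with saving and loading the model. I really hope the author can try this on the weibo dataset: first run do-train, do-eval, do-test, then load the model and run do-eval, do-test again.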
+1
I think the problem is in model loading, especially on a single GPU.
During training, the train function wraps the model with DDP (DistributedDataParallel), but during eval or predict the model is not wrapped. A DDP model names its parameters module.xxxx.xxxx, so the parameter names in the checkpoint do not match those of the unwrapped model, and the weights you provided are not loaded at all. This can also cause the model to predict out-of-bounds labels (I have run into this).
So the solution is to load the weights again when --do_eval or --do_predict is set. If the model is not already a DDP model, you need to wrap it first:
    if args.do_eval:
        logger.info("*** Dev Evaluate ***")
        dev_dataset = TaskDataset(dev_data_file, params=dataset_params, do_shuffle=False)
        # Wrap the model in DDP so its parameter names carry the 'module.' prefix
        # that the saved checkpoint uses.
        if model.__class__.__name__ != 'DistributedDataParallel':
            model = model.cuda()
            model = torch.nn.parallel.DistributedDataParallel(
                model,
                device_ids=[args.local_rank],
                output_device=args.local_rank,
                find_unused_parameters=True
            )
        if args.model_name_or_path is None:  # do eval in training
            global_steps = 'final_eval'
        else:
            try:
                global_steps = args.model_name_or_path.split("/")[-2].split("-")[-1]
            except IndexError:  # path has no parent directory to read the step from
                global_steps = 'user_model'
            model.load_state_dict(torch.load(args.model_name_or_path))  # load model state
        eval_output, _ = evaluate(model, args, dev_dataset, label_vocab, global_steps, "dev", write_file=True)
        eval_output["global_steps"] = global_steps
        print("Dev Result: acc: %.4f, p: %.4f, r: %.4f, f1: %.4f\n" %
              (eval_output['acc'], eval_output['p'], eval_output['r'], eval_output['f1']))
(Alternatively, strip the 'module.' prefix from the checkpoint keys so they match the original, unwrapped model, loading the weights like model.load_state_dict({k.replace('module.', ''): v for k, v in torch.load('pytorch_model.bin').items()}). Both ways work for me.)
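To make the key mismatch easier to see, here is a minimal sketch in plain Python that uses toy dictionaries in place of real state dicts. The helper names strip_module_prefix and key_mismatch and the parameter names bert.weight and classifier.bias are made up for illustration and are not part of the repo. Unlike k.replace('module.', ''), this version only removes the prefix at the start of a key.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def key_mismatch(model_keys, checkpoint_keys):
    """Return (missing, unexpected): keys the model wants but the checkpoint lacks,
    and keys the checkpoint has but the model does not know."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy stand-ins (hypothetical names); real values would be tensors.
checkpoint = {"module.bert.weight": 1.0, "module.classifier.bias": 2.0}
model_keys = ["bert.weight", "classifier.bias"]

# Checkpoint saved from a DDP-wrapped model vs. an unwrapped model: nothing lines up.
missing, unexpected = key_mismatch(model_keys, checkpoint)
print("missing:", missing)
print("unexpected:", unexpected)

# After stripping the prefix, every key matches.
fixed = strip_module_prefix(checkpoint)
print("after fix:", key_mismatch(model_keys, fixed))
```

With a real model, the same helper could be applied as model.load_state_dict(strip_module_prefix(torch.load('pytorch_model.bin'))). Calling load_state_dict with strict=False returns the missing and unexpected keys, which is a quick way to confirm whether the weights were actually loaded.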
I made this change for the weibo/NER task and got the same numbers during training and eval/test (about 0.67).
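Hello, during training the results on both the test set and the validation set are very good, but after training finishes, loading the trained model on its own gives very poor results. Is this because the model was not saved properly?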