liuwei1206 / LEBERT

Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Problem loading the trained model #51

Closed JiaDuo-Lin closed 2 years ago

JiaDuo-Lin commented 2 years ago

Hello, during training the results on both the test set and the dev set are good, but after training finishes, loading the saved model on its own gives very poor results. Is this because the model was not saved correctly?

liuwei1206 commented 2 years ago

Sorry, I have no idea why this happens. In my experience, the two should be the same.

kk19990709 commented 2 years ago

I ran into the same problem: the model works fine during training, but after reloading it every prediction is wrong (and an array out-of-bounds error can also occur). It must be a problem with how the model is saved and loaded. I would really appreciate it if the author could try this on the weibo dataset: first run do-train, do-eval, do-test, then load the model and run do-eval, do-test.

newcolour1994 commented 2 years ago

+1

mxa4646 commented 2 years ago

> I ran into the same problem: the model works fine during training, but after reloading it every prediction is wrong (and an array out-of-bounds error can also occur). It must be a problem with how the model is saved and loaded. I would really appreciate it if the author could try this on the weibo dataset: first run do-train, do-eval, do-test, then load the model and run do-eval, do-test.

I think the problem is in model loading, especially on a single GPU.

During training, the `train` function wraps the model with DDP, but for eval or predict it does not. A DDP model names its parameters `module.xxxx.xxxx`, so the weights you provide are not loaded at all. This may also cause the model to predict an out-of-bounds label (which I have run into).
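
Here is a minimal, self-contained sketch of the mismatch (a toy `Linear` layer standing in for the repo's model, purely for illustration). With the default `strict=True`, loading fails loudly; with `strict=False`, the mismatched keys are silently skipped and the model keeps its random initialization, which matches the bad numbers reported above:

    import torch

    model = torch.nn.Linear(4, 2)
    print(list(model.state_dict().keys()))  # ['weight', 'bias']

    # Simulate a checkpoint saved from a DDP-wrapped model:
    # every parameter key gains a "module." prefix.
    ddp_state = {'module.' + k: v for k, v in model.state_dict().items()}

    try:
        model.load_state_dict(ddp_state)  # strict=True by default
    except RuntimeError as err:
        print(err)  # missing 'weight'/'bias', unexpected 'module.weight'/'module.bias'

    # With strict=False nothing matches, nothing is loaded, and no error is raised:
    result = model.load_state_dict(ddp_state, strict=False)
    print(result.missing_keys, result.unexpected_keys)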

So the solution is to load the weights again when `--do_eval` or `--do_predict` is set. If the model is not already a DDP model, you need to wrap it first:

    if args.do_eval:
        logger.info("*** Dev Evaluate ***")
        dev_dataset = TaskDataset(dev_data_file, params=dataset_params, do_shuffle=False)
        # Wrap with DDP first, so the parameter names carry the same
        # "module." prefix as the keys in the saved checkpoint.
        if model.__class__.__name__ != 'DistributedDataParallel':
            model = model.cuda()
            model = torch.nn.parallel.DistributedDataParallel(
                model,
                device_ids=[args.local_rank],
                output_device=args.local_rank,
                find_unused_parameters=True
            )
        if args.model_name_or_path is None:  # doing eval during training
            global_steps = 'final_eval'
        else:
            # Recover the step count from a checkpoint path such as
            # ".../checkpoint-<steps>/pytorch_model.bin"
            try:
                global_steps = args.model_name_or_path.split("/")[-2].split("-")[-1]
            except IndexError:
                global_steps = 'user_model'
            model.load_state_dict(torch.load(args.model_name_or_path))  # reload the saved weights
        eval_output, _ = evaluate(model, args, dev_dataset, label_vocab, global_steps, "dev", write_file=True)
        eval_output["global_steps"] = global_steps
        print("Dev Result: acc: %.4f, p: %.4f, r: %.4f, f1: %.4f\n" %
              (eval_output['acc'], eval_output['p'], eval_output['r'], eval_output['f1']))

(Or just delete the `module.` prefix so the keys match the unwrapped model, loading the weights like `model.load_state_dict({k.replace('module.', ''): v for k, v in torch.load('pytorch_model.bin').items()})`; both ways work for me.)
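
For reference, that second option can be wrapped into a small helper that handles either direction; `load_flexible_state_dict` is my own name for this sketch, not a function from this repo:

    def load_flexible_state_dict(model, checkpoint_path):
        """Load a checkpoint whether or not it, or the model, is DDP-wrapped."""
        state = torch.load(checkpoint_path, map_location='cpu')
        model_is_wrapped = isinstance(model, torch.nn.parallel.DistributedDataParallel)
        ckpt_is_wrapped = next(iter(state)).startswith('module.')
        if ckpt_is_wrapped and not model_is_wrapped:
            # Checkpoint came from a DDP model; strip the prefix.
            state = {k[len('module.'):]: v for k, v in state.items()}
        elif model_is_wrapped and not ckpt_is_wrapped:
            # Model is DDP-wrapped but the checkpoint is not; add the prefix.
            state = {'module.' + k: v for k, v in state.items()}
        model.load_state_dict(state)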

I made this change for the weibo NER task and got the same numbers during training and eval/test (about 0.67).