microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

How to evaluate? How to obtain the results reported in the paper #31

Closed PROoshio closed 4 years ago

PROoshio commented 4 years ago

Hello authors, are the results reported in the paper the evaluation results from the last epoch, or the test results of the epoch with the highest dev-set score? Also, if it is the former, was the random seed left unfixed? Thanks!

donglixp commented 4 years ago

There are two flags implemented for the purpose:

parser.add_argument("--save_best", action='store_true',
                    help="Save best epoch.")
parser.add_argument("--only_eval_best", action='store_true',
                    help="Only evaluate best epoch.")

We used --save_best to keep the checkpoint with the best dev-set score during fine-tuning, and then added --only_eval_best so that the final evaluation runs only on that checkpoint.
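For illustration, here is a minimal, hypothetical sketch of how the two flags could interact. The per-epoch dev scores and the checkpoint-selection loop are placeholders, not the actual unilm fine-tuning code; only the two argparse flags are taken from the snippet above.

import argparse

# Hypothetical sketch of the flag semantics; dev_scores stands in for the
# per-epoch dev-set evaluation that a real fine-tuning run would produce.
parser = argparse.ArgumentParser()
parser.add_argument("--save_best", action='store_true',
                    help="Save best epoch.")
parser.add_argument("--only_eval_best", action='store_true',
                    help="Only evaluate best epoch.")
args = parser.parse_args()

# Placeholder dev-set scores keyed by epoch number.
dev_scores = {1: 70.2, 2: 72.5, 3: 71.9}

# --save_best: remember (and, in the real script, checkpoint) the epoch
# with the highest dev-set score.
best_epoch = max(dev_scores, key=dev_scores.get) if args.save_best else None

# --only_eval_best: restrict the final evaluation to that single epoch;
# otherwise every epoch's checkpoint would be evaluated.
if args.only_eval_best and best_epoch is not None:
    epochs_to_eval = [best_epoch]
else:
    epochs_to_eval = sorted(dev_scores)

print("Epochs selected for final evaluation:", epochs_to_eval)

Read this way, model selection appears to happen on the dev set, and the reported numbers correspond to the dev-best checkpoint rather than the last epoch.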

ArijRB commented 4 years ago

@donglixp Hi, I am looking for how to evaluate after running the decoding code of s2s-ft or question generation on SQuAD (I fine-tuned the model beforehand).

donglixp commented 4 years ago

> @donglixp Hi, I am looking for how to evaluate after running the decoding code of s2s-ft or question generation on SQuAD (I fine-tuned the model beforehand).

Please refer to https://github.com/microsoft/unilm/tree/master/unilm-v1#question-generation---squad for the evaluation of QG.