Open YeDeming opened 2 years ago
Hello,
Thank you for open-sourcing the code with such complete comments.
I reproduced the paper's results on CoNLL03, but ran into some difficulty on OntoNotes.
I used the data provided at
https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO
and changed https://github.com/yhcc/BARTNER/blob/d54d33127f0b8032c5ea78afcf20ef44fbccf058/train.py#L124-L126
to
paths = { 'train': "/home/yedeming/data/ontonotes/onto.train.ner", 'dev': "/home/yedeming/data/ontonotes/onto.development.ner", 'test': "/home/yedeming/data/ontonotes/onto.test.ner", }
The run produced the following output:
Save cache to caches/data_facebook/bart-large_en-ontonotes_word.pt.
max_len_a:0.8, max_len:10
In total 3 datasets:
train has 115812 instances.
dev has 15680 instances.
test has 12217 instances.
The number of tokens in tokenizer 50265
50283 50288
......
Best test performance(may not correspond to the best dev performance):{'Seq2SeqSpanMetric': {'f': 87.64999999999999, 'rec': 88.52, 'pre': 86.79, 'em': 0.8727}} achieved at Epoch:16.
Best test performance(correspond to the best dev performance):{'Seq2SeqSpanMetric': {'f': 87.36, 'rec': 88.28, 'pre': 86.47, 'em': 0.8717}} achieved at Epoch:26.
In Epoch:26/Step:68409, got best dev performance:
Seq2SeqSpanMetric: f=88.94, rec=89.79, pre=88.11, em=0.859
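This gives a test-set F1 of 87.36, which is far from the expected value, and I don't know what went wrong.
Looking forward to your reply! Ye Deming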
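Has your problem been solved?
Not yet.
Sorry, I hadn't noticed this issue until now. The gap is most likely caused by a different dataset; my data statistics are below. You are probably using OntoNotes v12, whereas previous papers generally use v4 (see https://github.com/yhcc/OntoNotes-5.0-NER).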
In total 3 datasets: dev has 8528 instances. test has 8262 instances. train has 59924 instances.
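One quick way to see which version of the data you have is to count the sentences in each split before training and compare them with the numbers above. The following is only a minimal sketch: it assumes the files are in a CoNLL-style layout with one token per line and a blank line between sentences, and that the loader counts one instance per sentence. The helper names `count_sentences`, `count_sentences_in_file` and `matches_expected` are made up for this illustration and are not part of BARTNER or fastNLP.

```python
from typing import Iterable

# Instance counts reported by the maintainer in this thread (v4 split).
EXPECTED = {"train": 59924, "dev": 8528, "test": 8262}


def count_sentences(lines: Iterable[str]) -> int:
    """Count sentences, assuming blank lines separate them (CoNLL-style)."""
    count = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            in_sentence = True
        elif in_sentence:
            # A blank line closes the sentence we were reading.
            count += 1
            in_sentence = False
    if in_sentence:
        # The file may end without a trailing blank line.
        count += 1
    return count


def count_sentences_in_file(path: str) -> int:
    """Convenience wrapper that reads a file from disk."""
    with open(path, encoding="utf-8") as handle:
        return count_sentences(handle)


def matches_expected(split: str, found: int) -> bool:
    """True if a split has the instance count listed in EXPECTED."""
    return EXPECTED[split] == found


# Tiny in-memory example: two sentences separated by a blank line.
sample = ["John B-PERSON\n", "lives O\n", "\n", "Paris B-GPE\n"]
print(count_sentences(sample))  # prints 2
```

With the counts from the log in the original post, `matches_expected("train", 115812)` is `False`, which is consistent with the two runs using different data.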
Hello, where did you find the CoNLL-2003 dataset?