stanleylsx / entity_extractor_by_ner

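NER models built on Tensorflow 2.3, all following the CRF paradigm, including Bilstm(IDCNN)-CRF, Bert-Bilstm(IDCNN)-CRF, and Bert-CRF. Pretrained models can be fine-tuned and adversarial training is supported. Intended for named entity recognition; runs directly once configured.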

Training problems after switching to my own dataset #76

Closed LELE-ZZ closed 1 month ago

LELE-ZZ commented 2 months ago

Two questions for the author:

  1. Why does bert-bilstm/idcnn-crf training stop automatically after the first five evaluation results all come out as 0?
  2. Training the bilstm/idcnn-crf model fails with the following error:

Traceback (most recent call last):
  File "E:/project-NER/main.py", line 67, in <module>
    dataManager = DataManager(configs, logger)
  File "E:\project-NER\engines\data.py", line 40, in __init__
    self.token2id, self.id2token, self.label2id, self.id2label = self.load_vocab()
  File "E:\project-NER\engines\data.py", line 61, in load_vocab
    return self.build_vocab(self.train_file)
  File "E:\project-NER\engines\data.py", line 93, in build_vocab
    token2id = dict(zip(tokens, range(1, len(tokens) + 1)))
TypeError: unhashable type: 'list'

stanleylsx commented 1 month ago

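  1. This looks like underfitting. Try increasing the learning rate (lr) or adding more data. Without Bert in particular, the model easily underfits and fails to converge.
  2. What exactly does your dataset look like? Is it structured the same way as the example data?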
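
For anyone hitting the same `TypeError`, here is a minimal sketch of what that exception means for the `dict(zip(tokens, ...))` line in the traceback. It is not the repository's code: `flat_tokens`, `nested_tokens`, `build_token2id` and `find_unhashable` are hypothetical names used only for illustration, and the assumption that the custom dataset yields nested lists instead of plain strings is a guess, not something confirmed in this thread.

```python
# Hypothetical token collections; not taken from the repository's data files.
flat_tokens = ["我", "在", "北", "京"]            # one string per token: hashable
nested_tokens = [["我", "在"], ["北", "京"]]      # lists inside the list: unhashable


def build_token2id(tokens):
    # Same expression as the line shown in the traceback.
    return dict(zip(tokens, range(1, len(tokens) + 1)))


def find_unhashable(tokens):
    """Return the entries that cannot be used as dict keys."""
    bad = []
    for tok in tokens:
        try:
            hash(tok)
        except TypeError:
            bad.append(tok)
    return bad


# Works: every key is a string.
token2id = build_token2id(flat_tokens)

# Fails: each key would be a list, and lists cannot be dict keys.
try:
    build_token2id(nested_tokens)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(token2id)
print(error_message)
print(find_unhashable(nested_tokens))
```

If a check like `find_unhashable` returns anything for the tokens read from your training file, the file is probably being parsed into a different shape than the example data, which is why the dataset structure matters here.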