macanv / BERT-BiLSTM-CRF-NER

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services
https://github.com/macanv/BERT-BiLSMT-CRF-NER
4.69k stars 1.25k forks

label_list.pkl is not generated #290

Closed Alethx closed 4 years ago

Alethx commented 4 years ago

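Training completed without errors. When I tried to run online prediction with terminal_predict.py, it reported that there is no label_list.pkl file in the output path. My guess is that the cause is my change to the get_labels method in Bert_lstm_ner.py. The original was:

def get_labels(self, labels=None):
    if labels is not None:
        try:
            # Support reading the label types from a file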
            if os.path.exists(labels) and os.path.isfile(labels):
                with codecs.open(labels, 'r', encoding='utf-8') as fd:
                    for line in fd:
                        self.labels.append(line.strip())
            else:
                # Otherwise, split the passed-in argument on commas
                self.labels = labels.split(',')
            self.labels = set(self.labels) # to set
        except Exception as e:
            print(e)
    # Deriving the labels by reading the train file carries some risk.
    if os.path.exists(os.path.join(self.output_dir, 'label_list.pkl')):
        with codecs.open(os.path.join(self.output_dir, 'label_list.pkl'), 'rb') as rf:
            self.labels = pickle.load(rf)
    else:
        if len(self.labels) > 0:
            self.labels = self.labels.union(set(["X", "[CLS]", "[SEP]"]))
            with codecs.open(os.path.join(self.output_dir, 'label_list.pkl'), 'wb') as rf:
                pickle.dump(self.labels, rf)
        else:
            self.labels = ["O", 'B-TIM', 'I-TIM', "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "X", "[CLS]", "[SEP]"]
    return self.labels

I later changed it to:

def get_labels(self):
    return ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "X", "[CLS]", "[SEP]"]

Is this the cause?

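Also, with the first version of get_labels, the labels are read entirely from the train data, right? So it does not need to be changed, whatever the train data looks like?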

macanv commented 4 years ago

Yes, the labels are read from the training data as a set, and the map is then generated from that set.
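
For readers who hit the same error: judging from the quoted code, label_list.pkl is only written inside the original get_labels, so replacing that method with a bare return would skip the write. The following is a minimal standalone sketch of the flow described in the reply (labels collected from the training data as a set, then saved and turned into a map). It is not the repository's actual code. The helper names collect_labels, save_label_list and build_label_map are hypothetical, and the training file layout ("token label" per line, blank line between sentences), the sorted ordering and the ids starting at 1 are all assumptions.

```python
import os
import pickle

# The same extra labels that the quoted get_labels adds to the set.
SPECIAL_LABELS = {"X", "[CLS]", "[SEP]"}


def collect_labels(train_path):
    """Collect the set of labels found in a training file.

    Assumed format: one "token label" pair per line, with blank
    lines separating sentences.
    """
    labels = set()
    with open(train_path, "r", encoding="utf-8") as fd:
        for line in fd:
            parts = line.strip().split()
            if len(parts) >= 2:  # skip blank separator lines
                labels.add(parts[-1])
    return labels


def save_label_list(labels, output_dir):
    """Add the special labels and pickle the set as label_list.pkl."""
    labels = set(labels) | SPECIAL_LABELS
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "label_list.pkl"), "wb") as wf:
        pickle.dump(labels, wf)
    return labels


def build_label_map(labels):
    """Map each label to an integer id.

    Sorting makes the ids reproducible across runs; starting at 1
    (leaving 0 free for padding) is an assumption of this sketch.
    """
    return {label: idx for idx, label in enumerate(sorted(labels), 1)}
```

If you want to keep a hard-coded label list, one option along the same lines is to pass that list to a helper like save_label_list once after training setup, so that the file the prediction script looks for exists in the output directory.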