liu-nlper / NER-LSTM-CRF

An easy-to-use named entity recognition (NER) toolkit implementing the Bi-LSTM+CRF model in TensorFlow.

CRF loss is negative #1

Closed odek53r closed 7 years ago

odek53r commented 7 years ago

I trained the model on the JNLPBA training data (http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html, IOB format), but the train and dev losses sometimes come out negative. The config values are listed below:

```yaml
model: NER
model_params:
  bilstm_params:
    num_units: 128
    num_layers: 1
  feature_names: ['f1']
  embed_params:
    f1:
      dropout_rate: 0.5
      shape: [23000, 100]
      path_pre_train: #'./data/embedding.txt'
      path: #'./Res/embed/char_embed.pkl'
    f2:
      dropout_rate: 0.3
      shape: [5, 32]
      path_pre_train: null
      path: null
  use_crf: True
  rnn_unit: 'gru'  # 'lstm' or 'gru'
  learning_rate: 0.01
  clip: 10
  dev_size: 0.1
  dropout_rate: 0.5
  l2_rate: 0.00
  nb_classes: 11
  sequence_length: 208
  batch_size: 512
  nb_epoch: 1000
  max_patience: 20
  path_model: './Model/best_model'

data_params:
  voc_params:
    f1:
      min_count: 0
      path: './Res/voc/f1.voc.pkl'
    f2:
      min_count: 0
      path: './Res/voc/f2.voc.pkl'
    label:
      min_count: 0
      path: './Res/voc/label.voc.pkl'
  sep: 'table'  # table or space
  path_train: './data/JNLPBA/Genia4ERtask1.iob2'
  path_test: './data/JNLPBA/sampletest1.iob2'
  path_result: './data/JNLPBA/sampletest1.iob2_result.txt'
```

Could you tell me why the CRF loss becomes negative, and how can I fix it? Thanks!
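For context, here is a minimal sketch (not the repo's exact code) of where this loss comes from with TensorFlow 1.2's `tf.contrib.crf`: the training loss is the negative log-likelihood of the gold tag sequence, which should never be negative as long as every gold tag index lies inside `[0, num_tags)`. The placeholder names below are illustrative, not the project's variable names.

```python
# Minimal sketch, assuming TF 1.2 and tf.contrib.crf.
import tensorflow as tf

num_tags = 11  # the nb_classes value from the config above
unary_scores = tf.placeholder(tf.float32, [None, None, num_tags])  # [batch, time, num_tags]
gold_tags = tf.placeholder(tf.int32, [None, None])                 # [batch, time]
seq_lengths = tf.placeholder(tf.int32, [None])

# log_likelihood = score(gold path) - log(sum of scores over all paths), which is <= 0
# for valid tag ids, so loss = -log_likelihood should be >= 0. Tag ids outside
# [0, num_tags) break this guarantee.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, gold_tags, seq_lengths)
loss = tf.reduce_mean(-log_likelihood)
```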

liu-nlper commented 7 years ago

The model code is still fairly rough, thanks for pointing out the problem. I ran it on the dataset you mentioned and it trains normally on my side. Could you send me your data so I can take a look?

odek53r commented 7 years ago

The data I used: http://www.nactem.ac.uk/tsujii/GENIA/ERtask/Genia4ERtraining.tar.gz. The training and testing files are the ones set in the config below.

Environment: Windows 10, Python 3.5.2, tensorflow-gpu 1.2.0

Processing steps:

1. `python preprocessing.py`
2. `python train.py`

Negative loss values start appearing around epoch 6.

Compared with the default config, I changed num_units, feature_names, shape, path_pre_train, path, learning_rate, l2_rate, nb_classes, batch_size, and nb_epoch.

config:
```yaml
model: NER
model_params:
  bilstm_params:
    num_units: 128
    num_layers: 1
  feature_names: ['f1']
  embed_params:
    f1:
      dropout_rate: 0.5
      shape: [23000, 100]
      path_pre_train: #'./data/embedding.txt'
      path: #'./Res/embed/char_embed.pkl'
    f2:
      dropout_rate: 0.3
      shape: [5, 32]
      path_pre_train: null
      path: null
  use_crf: True
  rnn_unit: 'gru'  # 'lstm' or 'gru'
  learning_rate: 0.01
  clip: 10
  dev_size: 0.1
  dropout_rate: 0.5
  l2_rate: 0.00
  nb_classes: 11
  sequence_length: 250
  batch_size: 512
  nb_epoch: 1000
  max_patience: 20
  path_model: './Model/best_model'

data_params:
  voc_params:
    f1:
      min_count: 0
      path: './Res/voc/f1.voc.pkl'
    f2:
      min_count: 0
      path: './Res/voc/f2.voc.pkl'
    label:
      min_count: 0
      path: './Res/voc/label.voc.pkl'
  sep: 'table'  # table or space
  path_train: './data/JNLPBA/Genia4ERtask1.iob2'
  path_test: './data/JNLPBA/sampletest1.iob2'
  path_result: './data/JNLPBA/sampletest1.iob2_result.txt'
```

liu-nlper commented 7 years ago

Hi, I took a look at the dataset you provided. nb_classes should be set to 12 (including the label for the padding value); that should be the cause. The code is still fairly rough, so please keep the feedback coming. :-)
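In case it helps others hitting the same error, here is a quick sanity check (a sketch, assuming the IOB2 file has one `token<TAB>tag` pair per line with blank lines between sentences) to count the tag classes actually present in the training file; nb_classes then needs to be that count plus one for the padding label.

```python
# Hypothetical helper, not part of the repo: count distinct tags in the training data.
labels = set()
with open('./data/JNLPBA/Genia4ERtask1.iob2', encoding='utf-8') as f:
    for line in f:
        line = line.rstrip('\n')
        if not line:          # blank line = sentence boundary
            continue
        labels.add(line.split('\t')[-1])

print(sorted(labels))                    # O plus the B-/I- tags found in the data
print('nb_classes =', len(labels) + 1)   # +1 for the padding label
```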

odek53r commented 7 years ago

After setting nb_classes to 12, training runs fine. Thanks!!