jiesutd / NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Apache License 2.0

-1 F1 score and 0 precision #148

Closed po-oya closed 4 years ago

po-oya commented 4 years ago

Hi, this seems to have been reported by other developers in https://github.com/jiesutd/NCRFpp/issues/100, https://github.com/jiesutd/NCRFpp/issues/82, https://github.com/jiesutd/NCRFpp/issues/80, https://github.com/jiesutd/NCRFpp/issues/60 and https://github.com/jiesutd/NCRFpp/issues/22. I have checked all of those explanations.

I am trying to build an NER system with this Persian corpus. I used train_fold1.txt as the training set, test_fold1.txt as the dev set and test_fold2.txt as the test set. The corpus is in IOB format by default; I used tagSchemeConverte.py to convert it to BIOES format. The corpus includes various labels, and there are some B-X tags among them too.
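For clarity, this is roughly what that conversion step does (a minimal standalone sketch assuming IOB2-style input tags; it is not the toolkit's own converter script):

# Minimal IOB -> BIOES conversion sketch (illustrative only).
def iob_to_bioes(tags):
    """Convert one sentence's IOB2 tags to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_ends_here = next_tag != "I-" + label
        if prefix == "B":
            bioes.append(("S-" if entity_ends_here else "B-") + label)
        else:  # prefix == "I"
            bioes.append(("E-" if entity_ends_here else "I-") + label)
    return bioes

print(iob_to_bioes(["B-PER", "I-PER", "O", "B-LOC"]))
# ['B-PER', 'E-PER', 'O', 'S-LOC']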

The problem is that the model cannot predict the correct entity labels; everything seems to be predicted as O. Also, there are lots of tokens with the O tag, but removing many of the sentences in which all tokens are labelled O didn't help.
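That filtering was roughly the following (a minimal sketch; it assumes the usual one "token tag" pair per line with blank lines between sentences, and the file names are just placeholders):

# Drop sentences in which every token is tagged "O" (rough sketch).
def filter_all_o(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        sentence = []
        for line in fin:
            if line.strip():
                sentence.append(line)
                continue
            # sentence boundary: keep it only if some tag is not "O"
            if any(l.split()[-1] != "O" for l in sentence):
                fout.writelines(sentence + ["\n"])
            sentence = []
        if sentence and any(l.split()[-1] != "O" for l in sentence):
            fout.writelines(sentence + ["\n"])

filter_all_o("train_bioes.bmes", "train_bioes_filtered.bmes")

Below are my log files: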

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Start   Sequence   Laebling   task...
     Tag          scheme: NoSeg
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 250
     MAX   WORD   LENGTH: -1
     Number   normalized: True
     Word  alphabet size: 0
     Char  alphabet size: 0
     Label alphabet size: 0
     Word embedding  dir: ../data/arman/processed/wor2vec_skipgram300d.txt
     Char embedding  dir: None
     Word embedding size: 300
     Char embedding size: 25
     Norm   word     emb: False
     Norm   char     emb: False
     Train  file directory: ../data/arman/processed/train_bioes.bmes
     Dev    file directory: ../data/arman/processed/dev_bioes.bmes
     Test   file directory: ../data/arman/processed/test_bioes.bmes
     Raw    file directory: None
     Dset   file directory: None
     Model  file directory: ../models/ncrfpp_model
     Loadmodel   directory: None
     Decode file directory: None
     Train instance number: 0
     Dev   instance number: 0
     Test  instance number: 0
     Raw   instance number: 0
     FEATURE num: 0
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model        use_crf: True
     Model word extractor: LSTM
     Model       use_char: False
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 10
     BatchSize: 50
     Average  batch   loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper              lr: 0.1
     Hyper        lr_decay: 0.05
     Hyper         HP_clip: 5.0
     Hyper        momentum: 0.0
     Hyper              l2: 1e-08
     Hyper      hidden_dim: 500
     Hyper         dropout: 0.5
     Hyper      lstm_layer: 1
     Hyper          bilstm: True
     Hyper             GPU: False
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Train Mode
Load pretrained word embedding, norm: False, dir: ../data/arman/processed/wor2vec_skipgram300d.txt
Embedding:
     pretrain word:1, prefect match:0, case_match:0, oov:18204, oov%:0.9999450700357044
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Start   Sequence   Laebling   task...
     Tag          scheme: BMES
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 250
     MAX   WORD   LENGTH: -1
     Number   normalized: True
     Word  alphabet size: 18205
     Char  alphabet size: 100
     Label alphabet size: 26
     Word embedding  dir: ../data/arman/processed/wor2vec_skipgram300d.txt
     Char embedding  dir: None
     Word embedding size: 1
     Char embedding size: 25
     Norm   word     emb: False
     Norm   char     emb: False
     Train  file directory: ../data/arman/processed/train_bioes.bmes
     Dev    file directory: ../data/arman/processed/dev_bioes.bmes
     Test   file directory: ../data/arman/processed/test_bioes.bmes
     Raw    file directory: None
     Dset   file directory: None
     Model  file directory: ../models/ncrfpp_model
     Loadmodel   directory: None
     Decode file directory: None
     Train instance number: 5121
     Dev   instance number: 2560
     Test  instance number: 2561
     Raw   instance number: 0
     FEATURE num: 0
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model        use_crf: True
     Model word extractor: LSTM
     Model       use_char: False
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 10
     BatchSize: 50
     Average  batch   loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper              lr: 0.1
     Hyper        lr_decay: 0.05
     Hyper         HP_clip: 5.0
     Hyper        momentum: 0.0
     Hyper              l2: 1e-08
     Hyper      hidden_dim: 500
     Hyper         dropout: 0.5
     Hyper      lstm_layer: 1
     Hyper          bilstm: True
     Hyper             GPU: False
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build sequence labeling network...
use_char:  False
word feature extractor:  LSTM
use crf:  True
build word sequence feature extractor: LSTM...
build word representation...
build CRF...

and these are the training logs for some epochs:

Epoch: 0/10
 Learning rate is set as: 0.1
Shuffle: first input word list: [36, 1304, 1765, 37, 155, 1460, 1766, 1767, 1768, 14, 155, 156, 1769, 1770, 1771, 1772, 1773, 408, 1774, 9, 1757, 486, 32, 1344, 1775, 1776, 86, 31, 158, 1777, 146]
     Instance: 500; Time: 30.42s; loss: 11616310.9209; acc: 11306/16748=0.6751
     Instance: 1000; Time: 24.44s; loss: 4451390.2500; acc: 24542/33000=0.7437
     Instance: 1500; Time: 30.09s; loss: 2984007.8750; acc: 38033/49430=0.7694
     Instance: 2000; Time: 20.94s; loss: 2302954.8750; acc: 51089/65281=0.7826
     Instance: 2500; Time: 21.62s; loss: 2374294.3125; acc: 64116/81232=0.7893
     Instance: 3000; Time: 27.74s; loss: 4175166.3125; acc: 77703/98456=0.7892
     Instance: 3500; Time: 44.94s; loss: 5485042.8750; acc: 91344/115130=0.7934
     Instance: 4000; Time: 36.24s; loss: 4783471.6250; acc: 104567/131409=0.7957
     Instance: 4500; Time: 37.05s; loss: 3558930.6250; acc: 117712/147985=0.7954
     Instance: 5000; Time: 31.34s; loss: 3502760.8750; acc: 130876/164058=0.7977
     Instance: 5121; Time: 8.17s; loss: 460803.0000; acc: 134189/167877=0.7993
Epoch: 0 training finished. Time: 312.97s, speed: 16.36st/s,  total loss: 45695133.54589844
totalloss: 45695133.54589844
Right token =  73726  All token =  82119  acc =  0.8977946638414983
Dev: time: 9.23s, speed: 278.56st/s; acc: 0.8978, p: 0.0000, r: 0.0000, f: -1.0000
Exceed previous best f score: -10
Save current best model in file: ../models/ncrfpp_model.0.model
Right token =  74881  All token =  83139  acc =  0.9006723679620876
Test: time: 9.31s, speed: 276.26st/s; acc: 0.9007, p: 0.0769, r: 0.0002, f: 0.0005
Epoch: 1/10
 Learning rate is set as: 0.09523809523809523
Shuffle: first input word list: [1847, 2370, 187, 5415, 1184, 379, 158, 201, 1962, 2114, 517, 2008, 9, 2005, 32, 1587, 4664, 14, 524, 13135, 13136, 3986, 79, 13135, 510, 32, 577, 209, 1710, 8230, 9, 209, 1710, 7638, 599, 203, 53, 14, 2677, 1198, 577, 66, 209, 373, 115, 173, 146]
     Instance: 500; Time: 21.95s; loss: 2685075.0625; acc: 13378/16519=0.8099
     Instance: 1000; Time: 44.42s; loss: 2443135.6250; acc: 26253/32596=0.8054
     Instance: 1500; Time: 27.22s; loss: 2772365.6250; acc: 39293/48648=0.8077
     Instance: 2000; Time: 26.62s; loss: 3139464.5312; acc: 52055/64997=0.8009
     Instance: 2500; Time: 27.67s; loss: 2568971.8750; acc: 65528/80765=0.8113
     Instance: 3000; Time: 24.10s; loss: 3526329.6875; acc: 78257/97029=0.8065
     Instance: 3500; Time: 21.61s; loss: 2431246.9375; acc: 91666/113123=0.8103
     Instance: 4000; Time: 34.22s; loss: 6091913.6250; acc: 105830/130777=0.8092
     Instance: 4500; Time: 26.64s; loss: 2694768.7500; acc: 119720/147512=0.8116
     Instance: 5000; Time: 30.36s; loss: 2411426.4375; acc: 133877/164519=0.8137
     Instance: 5121; Time: 5.00s; loss: 465950.8125; acc: 136618/167877=0.8138
Epoch: 1 training finished. Time: 289.79s, speed: 17.67st/s,  total loss: 31230648.96875
totalloss: 31230648.96875
Right token =  72810  All token =  82119  acc =  0.886640119826106
Dev: time: 8.34s, speed: 308.20st/s; acc: 0.8866, p: 0.0121, r: 0.0025, f: 0.0042
Exceed previous best f score: -1
Save current best model in file: ../models/ncrfpp_model.1.model
Right token =  73915  All token =  83139  acc =  0.8890532722308423
Test: time: 8.38s, speed: 307.12st/s; acc: 0.8891, p: 0.0198, r: 0.0044, f: 0.0072
Epoch: 2/10
 Learning rate is set as: 0.09090909090909091
Shuffle: first input word list: [1054, 5207, 5835, 32, 2010, 1207, 5836, 384, 996, 9, 411, 5837, 5835, 37, 209, 2, 209, 500, 824, 173, 146]
     Instance: 500; Time: 19.25s; loss: 3247655.5000; acc: 13164/16053=0.8200
     Instance: 1000; Time: 26.96s; loss: 3076210.3750; acc: 26239/32531=0.8066
     Instance: 1500; Time: 23.38s; loss: 2412317.0625; acc: 39475/48550=0.8131
     Instance: 2000; Time: 31.63s; loss: 2319485.5000; acc: 53040/65159=0.8140
     Instance: 2500; Time: 24.41s; loss: 2048796.8750; acc: 66446/81388=0.8164
     Instance: 3000; Time: 28.53s; loss: 3531765.7500; acc: 80418/98849=0.8135
     Instance: 3500; Time: 22.72s; loss: 2824496.6875; acc: 93113/114750=0.8114
     Instance: 4000; Time: 22.72s; loss: 2429406.4375; acc: 105955/131127=0.8080
     Instance: 4500; Time: 24.67s; loss: 1839154.5000; acc: 119752/147307=0.8129
     Instance: 5000; Time: 23.01s; loss: 2184528.7188; acc: 133134/163803=0.8128
     Instance: 5121; Time: 10.25s; loss: 441015.4375; acc: 136634/167877=0.8139
Epoch: 2 training finished. Time: 257.54s, speed: 19.88st/s,  total loss: 26354832.84375
totalloss: 26354832.84375
Right token =  72865  All token =  82119  acc =  0.8873098795650215
Dev: time: 8.37s, speed: 307.18st/s; acc: 0.8873, p: 0.0112, r: 0.0012, f: 0.0021
Right token =  74186  All token =  83139  acc =  0.8923128736212849
Test: time: 8.39s, speed: 306.69st/s; acc: 0.8923, p: 0.0063, r: 0.0005, f: 0.0009
Epoch: 3/10
 Learning rate is set as: 0.08695652173913045
Shuffle: first input word list: [3680, 3798, 537, 154, 53, 22, 23, 201, 866, 14, 2677, 466, 537, 3684, 3685, 17, 1618, 844, 17, 2783, 169, 336, 14, 3690, 1734, 944, 3680, 537, 679, 2, 1856, 448, 146]
     Instance: 500; Time: 20.57s; loss: 2418398.3125; acc: 12978/16176=0.8023
     Instance: 1000; Time: 24.27s; loss: 3341107.6562; acc: 26618/32854=0.8102
     Instance: 1500; Time: 25.33s; loss: 2683556.5625; acc: 39485/49192=0.8027
     Instance: 2000; Time: 20.78s; loss: 2654431.3750; acc: 53017/65683=0.8072
     Instance: 2500; Time: 22.51s; loss: 1794621.0000; acc: 66400/81778=0.8120
     Instance: 3000; Time: 31.76s; loss: 2520183.0000; acc: 79705/98349=0.8104
     Instance: 3500; Time: 20.08s; loss: 1936766.8750; acc: 93068/114554=0.8124
     Instance: 4000; Time: 26.30s; loss: 1959160.5000; acc: 106073/131125=0.8089
     Instance: 4500; Time: 20.09s; loss: 2188763.1875; acc: 118936/146789=0.8103
     Instance: 5000; Time: 30.80s; loss: 2170208.8125; acc: 133325/163980=0.8131
     Instance: 5121; Time: 9.67s; loss: 400760.9062; acc: 136733/167877=0.8145
Epoch: 3 training finished. Time: 252.15s, speed: 20.31st/s,  total loss: 24067958.1875
totalloss: 24067958.1875
Right token =  20165  All token =  82119  acc =  0.24555827518601053
Dev: time: 8.49s, speed: 304.43st/s; acc: 0.2456, p: 0.0025, r: 0.0171, f: 0.0043
Exceed previous best f score: 0.0041992746707386905
Save current best model in file: ../models/ncrfpp_model.3.model
Right token =  20345  All token =  83139  acc =  0.24471066527141294
Test: time: 8.48s, speed: 305.08st/s; acc: 0.2447, p: 0.0029, r: 0.0200, f: 0.0050
Epoch: 4/10
 Learning rate is set as: 0.08333333333333334
Shuffle: first input word list: [155, 1056, 1523, 2, 1462, 612, 613, 1232, 2847, 5313, 14, 12269, 2, 1466, 12413, 17, 740, 1112, 12414, 5089, 12411, 17, 11040, 1534, 9, 7443, 14, 155, 5089, 9, 928, 7940, 12401, 14, 594, 155, 12415, 67, 176, 94, 1551, 146]
     Instance: 500; Time: 24.37s; loss: 2371574.3438; acc: 13538/16854=0.8033
     Instance: 1000; Time: 23.93s; loss: 1737718.5938; acc: 26521/32998=0.8037
     Instance: 1500; Time: 25.11s; loss: 2176442.5000; acc: 39817/49167=0.8098
     Instance: 2000; Time: 20.97s; loss: 2618315.4062; acc: 53058/65854=0.8057
     Instance: 2500; Time: 22.27s; loss: 2855402.0625; acc: 66479/82332=0.8075
     Instance: 3000; Time: 29.69s; loss: 1662966.2500; acc: 80006/98599=0.8114
     Instance: 3500; Time: 25.06s; loss: 2173333.7500; acc: 92931/115171=0.8069
     Instance: 4000; Time: 19.64s; loss: 2101886.4375; acc: 106537/131290=0.8115
     Instance: 4500; Time: 18.18s; loss: 1783243.5000; acc: 119535/147250=0.8118
     Instance: 5000; Time: 22.09s; loss: 2594713.3125; acc: 133300/163786=0.8139
     Instance: 5121; Time: 6.83s; loss: 618807.9375; acc: 136296/167877=0.8119
Epoch: 4 training finished. Time: 238.15s, speed: 21.50st/s,  total loss: 22694404.09375
totalloss: 22694404.09375
Right token =  73787  All token =  82119  acc =  0.8985374882792045
Dev: time: 8.39s, speed: 306.32st/s; acc: 0.8985, p: -1.0000, r: 0.0000, f: -1.0000
Right token =  74902  All token =  83139  acc =  0.9009249569997233
Test: time: 8.43s, speed: 305.29st/s; acc: 0.9009, p: -1.0000, r: 0.0000, f: -1.0000
Epoch: 5/10
 Learning rate is set as: 0.08
Shuffle: first input word list: [549, 9761, 37, 614, 459, 14, 360, 209, 9, 209, 96, 1551, 9, 14, 360, 361, 45, 1407, 9761, 37, 614, 1103, 1104, 1105, 524, 17, 235, 173, 146]
     Instance: 500; Time: 21.18s; loss: 2980160.8750; acc: 12452/15747=0.7908
     Instance: 1000; Time: 32.33s; loss: 2264516.3750; acc: 26177/32151=0.8142
     Instance: 1500; Time: 24.32s; loss: 1958151.8750; acc: 39057/48722=0.8016
     Instance: 2000; Time: 19.92s; loss: 2551159.1250; acc: 52718/64735=0.8144
     Instance: 2500; Time: 22.72s; loss: 2041154.7500; acc: 66182/80945=0.8176
     Instance: 3000; Time: 21.34s; loss: 3120712.3750; acc: 78876/96796=0.8149
     Instance: 3500; Time: 20.50s; loss: 2084796.8750; acc: 92454/113422=0.8151
     Instance: 4000; Time: 32.37s; loss: 2529292.3125; acc: 106291/130738=0.8130
     Instance: 4500; Time: 23.54s; loss: 1767164.3125; acc: 119220/146639=0.8130
     Instance: 5000; Time: 22.61s; loss: 2062927.8125; acc: 133104/163549=0.8138
     Instance: 5121; Time: 9.78s; loss: 413305.1250; acc: 136559/167877=0.8134
Epoch: 5 training finished. Time: 250.61s, speed: 20.43st/s,  total loss: 23773341.8125
totalloss: 23773341.8125
Right token =  73787  All token =  82119  acc =  0.8985374882792045
Dev: time: 8.36s, speed: 307.62st/s; acc: 0.8985, p: -1.0000, r: 0.0000, f: -1.0000
Right token =  74902  All token =  83139  acc =  0.9009249569997233
Test: time: 8.36s, speed: 307.53st/s; acc: 0.9009, p: -1.0000, r: 0.0000, f: -1.0000

This is my configuration:

### I/O ###
train_dir=../data/arman/processed/train_bioes.bmes
dev_dir=../data/arman/processed/dev_bioes.bmes
test_dir=../data/arman/processed/test_bioes.bmes
model_dir=../models/ncrfpp_model
word_emb_dir=../data/arman/processed/wor2vec_skipgram300d.txt

#raw_dir=
#decode_dir=
#dset_dir=
#load_model_dir=
#char_emb_dir=

norm_word_emb=False
norm_char_emb=False
number_normalized=True
seg=True
word_emb_dim=300
char_emb_dim=25

###NetworkConfiguration###
use_crf=True
use_char=False
word_seq_feature=LSTM
char_seq_feature=LSTM
#feature=[POS] emb_size=20
#feature=[Cap] emb_size=20
#nbest=1

###TrainingSetting###
status=train
optimizer=SGD
iteration=10
batch_size=50
ave_batch_loss=False

###Hyperparameters###
cnn_layer=4
char_hidden_dim=25
hidden_dim=500
dropout=0.5
lstm_layer=1
bilstm=True
learning_rate=0.1
lr_decay=0.05
momentum=0
l2=1e-8
#gpu
clip=5.0

For some epochs a very small positive F1 was seen. I thought that using different configurations might help, but none of them worked. It would be a great help if you could share your ideas. Thanks

jiesutd commented 4 years ago
  1. I guess your pretrained embedding file has an incorrect format: pretrain word:1, prefect match:0, case_match:0, oov:18204, oov%:0.9999450700357044 shows that only one word was loaded and almost everything is out of vocabulary (see the sketch below).

  2. Your training loss is too large; this makes training unstable and prevents it from converging to good parameters. If it is not caused by the first problem, you need to fine-tune your hyperparameters or try different embeddings.
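For the first point, a quick sanity check of the embedding file could look like this (a rough sketch, assuming a plain-text file with one "word v1 v2 ... v300" line per word; the path is taken from your config):

# Rough check of the pretrained embedding file format.
emb_path = "../data/arman/processed/wor2vec_skipgram300d.txt"

with open(emb_path, encoding="utf-8") as f:
    lines = f.readlines()

first = lines[0].split()
if len(first) == 2:
    # Looks like a gensim/word2vec header line ("vocab_size dim");
    # such a header may make the loader infer the wrong embedding
    # dimension, so strip it before training.
    print("header line found:", lines[0].strip())
    lines = lines[1:]

bad = [i for i, l in enumerate(lines) if len(l.split()) != 301]
print("lines whose token count is not 1 + 300:", len(bad))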