jiesutd / NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Apache License 2.0

-1 F1 score and 0 precision #148

Closed po-oya closed 4 years ago

po-oya commented 4 years ago

Hi, this seems to have been reported by other developers in https://github.com/jiesutd/NCRFpp/issues/100, https://github.com/jiesutd/NCRFpp/issues/82, https://github.com/jiesutd/NCRFpp/issues/80, https://github.com/jiesutd/NCRFpp/issues/60 and https://github.com/jiesutd/NCRFpp/issues/22. I have checked all of those explanations.

I am trying to build an NER system with this Persian corpus. I used train_fold1.txt as the training set, test_fold1.txt as the dev set and test_fold2.txt as the test set. The corpus is in IOB format by default; I used tagSchemeConverte.py to convert it to BIOES format. The corpus includes various labels, and there are some B-X tags among them too.
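For clarity, this is roughly what that conversion step does (a minimal standalone sketch assuming IOB2-style input tags; it is not the toolkit's own converter script):

# Minimal IOB -> BIOES conversion sketch (illustrative only).
def iob_to_bioes(tags):
    """Convert one sentence's IOB2 tags to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_ends_here = next_tag != "I-" + label
        if prefix == "B":
            bioes.append(("S-" if entity_ends_here else "B-") + label)
        else:  # prefix == "I"
            bioes.append(("E-" if entity_ends_here else "I-") + label)
    return bioes

print(iob_to_bioes(["B-PER", "I-PER", "O", "B-LOC"]))
# ['B-PER', 'E-PER', 'O', 'S-LOC']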

The problem is that the model cannot predict the correct entity labels; everything seems to be predicted as O. Also, there are lots of tokens with the O tag, but removing many of the sentences in which all tokens are labelled O didn't help.
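That filtering was roughly the following (a minimal sketch; it assumes the usual one "token tag" pair per line with blank lines between sentences, and the file names are just placeholders):

# Drop sentences in which every token is tagged "O" (rough sketch).
def filter_all_o(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        sentence = []
        for line in fin:
            if line.strip():
                sentence.append(line)
                continue
            # sentence boundary: keep it only if some tag is not "O"
            if any(l.split()[-1] != "O" for l in sentence):
                fout.writelines(sentence + ["\n"])
            sentence = []
        if sentence and any(l.split()[-1] != "O" for l in sentence):
            fout.writelines(sentence + ["\n"])

filter_all_o("train_bioes.bmes", "train_bioes_filtered.bmes")

Below are my log files: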

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Start   Sequence   Laebling   task...
     Tag          scheme: NoSeg
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 250
     MAX   WORD   LENGTH: -1
     Number   normalized: True
     Word  alphabet size: 0
     Char  alphabet size: 0
     Label alphabet size: 0
     Word embedding  dir: ../data/arman/processed/wor2vec_skipgram300d.txt
     Char embedding  dir: None
     Word embedding size: 300
     Char embedding size: 25
     Norm   word     emb: False
     Norm   char     emb: False
     Train  file directory: ../data/arman/processed/train_bioes.bmes
     Dev    file directory: ../data/arman/processed/dev_bioes.bmes
     Test   file directory: ../data/arman/processed/test_bioes.bmes
     Raw    file directory: None
     Dset   file directory: None
     Model  file directory: ../models/ncrfpp_model
     Loadmodel   directory: None
     Decode file directory: None
     Train instance number: 0
     Dev   instance number: 0
     Test  instance number: 0
     Raw   instance number: 0
     FEATURE num: 0
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model        use_crf: True
     Model word extractor: LSTM
     Model       use_char: False
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 10
     BatchSize: 50
     Average  batch   loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper              lr: 0.1
     Hyper        lr_decay: 0.05
     Hyper         HP_clip: 5.0
     Hyper        momentum: 0.0
     Hyper              l2: 1e-08
     Hyper      hidden_dim: 500
     Hyper         dropout: 0.5
     Hyper      lstm_layer: 1
     Hyper          bilstm: True
     Hyper             GPU: False
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Train Mode
Load pretrained word embedding, norm: False, dir: ../data/arman/processed/wor2vec_skipgram300d.txt
Embedding:
     pretrain word:1, prefect match:0, case_match:0, oov:18204, oov%:0.9999450700357044
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Start   Sequence   Laebling   task...
     Tag          scheme: BMES
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 250
     MAX   WORD   LENGTH: -1
     Number   normalized: True
     Word  alphabet size: 18205
     Char  alphabet size: 100
     Label alphabet size: 26
     Word embedding  dir: ../data/arman/processed/wor2vec_skipgram300d.txt
     Char embedding  dir: None
     Word embedding size: 1
     Char embedding size: 25
     Norm   word     emb: False
     Norm   char     emb: False
     Train  file directory: ../data/arman/processed/train_bioes.bmes
     Dev    file directory: ../data/arman/processed/dev_bioes.bmes
     Test   file directory: ../data/arman/processed/test_bioes.bmes
     Raw    file directory: None
     Dset   file directory: None
     Model  file directory: ../models/ncrfpp_model
     Loadmodel   directory: None
     Decode file directory: None
     Train instance number: 5121
     Dev   instance number: 2560
     Test  instance number: 2561
     Raw   instance number: 0
     FEATURE num: 0
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model        use_crf: True
     Model word extractor: LSTM
     Model       use_char: False
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 10
     BatchSize: 50
     Average  batch   loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper              lr: 0.1
     Hyper        lr_decay: 0.05
     Hyper         HP_clip: 5.0
     Hyper        momentum: 0.0
     Hyper              l2: 1e-08
     Hyper      hidden_dim: 500
     Hyper         dropout: 0.5
     Hyper      lstm_layer: 1
     Hyper          bilstm: True
     Hyper             GPU: False
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build sequence labeling network...
use_char:  False
word feature extractor:  LSTM
use crf:  True
build word sequence feature extractor: LSTM...
build word representation...
build CRF...

and these are the training logs for some epochs:

Epoch: 0/10
 Learning rate is set as: 0.1
Shuffle: first input word list: [36, 1304, 1765, 37, 155, 1460, 1766, 1767, 1768, 14, 155, 156, 1769, 1770, 1771, 1772, 1773, 408, 1774, 9, 1757, 486, 32, 1344, 1775, 1776, 86, 31, 158, 1777, 146]
     Instance: 500; Time: 30.42s; loss: 11616310.9209; acc: 11306/16748=0.6751
     Instance: 1000; Time: 24.44s; loss: 4451390.2500; acc: 24542/33000=0.7437
     Instance: 1500; Time: 30.09s; loss: 2984007.8750; acc: 38033/49430=0.7694
     Instance: 2000; Time: 20.94s; loss: 2302954.8750; acc: 51089/65281=0.7826
     Instance: 2500; Time: 21.62s; loss: 2374294.3125; acc: 64116/81232=0.7893
     Instance: 3000; Time: 27.74s; loss: 4175166.3125; acc: 77703/98456=0.7892
     Instance: 3500; Time: 44.94s; loss: 5485042.8750; acc: 91344/115130=0.7934
     Instance: 4000; Time: 36.24s; loss: 4783471.6250; acc: 104567/131409=0.7957
     Instance: 4500; Time: 37.05s; loss: 3558930.6250; acc: 117712/147985=0.7954
     Instance: 5000; Time: 31.34s; loss: 3502760.8750; acc: 130876/164058=0.7977
     Instance: 5121; Time: 8.17s; loss: 460803.0000; acc: 134189/167877=0.7993
Epoch: 0 training finished. Time: 312.97s, speed: 16.36st/s,  total loss: 45695133.54589844
totalloss: 45695133.54589844
Right token =  73726  All token =  82119  acc =  0.8977946638414983
Dev: time: 9.23s, speed: 278.56st/s; acc: 0.8978, p: 0.0000, r: 0.0000, f: -1.0000
Exceed previous best f score: -10
Save current best model in file: ../models/ncrfpp_model.0.model
Right token =  74881  All token =  83139  acc =  0.9006723679620876
Test: time: 9.31s, speed: 276.26st/s; acc: 0.9007, p: 0.0769, r: 0.0002, f: 0.0005
Epoch: 1/10
 Learning rate is set as: 0.09523809523809523
Shuffle: first input word list: [1847, 2370, 187, 5415, 1184, 379, 158, 201, 1962, 2114, 517, 2008, 9, 2005, 32, 1587, 4664, 14, 524, 13135, 13136, 3986, 79, 13135, 510, 32, 577, 209, 1710, 8230, 9, 209, 1710, 7638, 599, 203, 53, 14, 2677, 1198, 577, 66, 209, 373, 115, 173, 146]
     Instance: 500; Time: 21.95s; loss: 2685075.0625; acc: 13378/16519=0.8099
     Instance: 1000; Time: 44.42s; loss: 2443135.6250; acc: 26253/32596=0.8054
     Instance: 1500; Time: 27.22s; loss: 2772365.6250; acc: 39293/48648=0.8077
     Instance: 2000; Time: 26.62s; loss: 3139464.5312; acc: 52055/64997=0.8009
     Instance: 2500; Time: 27.67s; loss: 2568971.8750; acc: 65528/80765=0.8113
     Instance: 3000; Time: 24.10s; loss: 3526329.6875; acc: 78257/97029=0.8065
     Instance: 3500; Time: 21.61s; loss: 2431246.9375; acc: 91666/113123=0.8103
     Instance: 4000; Time: 34.22s; loss: 6091913.6250; acc: 105830/130777=0.8092
     Instance: 4500; Time: 26.64s; loss: 2694768.7500; acc: 119720/147512=0.8116
     Instance: 5000; Time: 30.36s; loss: 2411426.4375; acc: 133877/164519=0.8137
     Instance: 5121; Time: 5.00s; loss: 465950.8125; acc: 136618/167877=0.8138
Epoch: 1 training finished. Time: 289.79s, speed: 17.67st/s,  total loss: 31230648.96875
totalloss: 31230648.96875
Right token =  72810  All token =  82119  acc =  0.886640119826106
Dev: time: 8.34s, speed: 308.20st/s; acc: 0.8866, p: 0.0121, r: 0.0025, f: 0.0042
Exceed previous best f score: -1
Save current best model in file: ../models/ncrfpp_model.1.model
Right token =  73915  All token =  83139  acc =  0.8890532722308423
Test: time: 8.38s, speed: 307.12st/s; acc: 0.8891, p: 0.0198, r: 0.0044, f: 0.0072
Epoch: 2/10
 Learning rate is set as: 0.09090909090909091
Shuffle: first input word list: [1054, 5207, 5835, 32, 2010, 1207, 5836, 384, 996, 9, 411, 5837, 5835, 37, 209, 2, 209, 500, 824, 173, 146]
     Instance: 500; Time: 19.25s; loss: 3247655.5000; acc: 13164/16053=0.8200
     Instance: 1000; Time: 26.96s; loss: 3076210.3750; acc: 26239/32531=0.8066
     Instance: 1500; Time: 23.38s; loss: 2412317.0625; acc: 39475/48550=0.8131
     Instance: 2000; Time: 31.63s; loss: 2319485.5000; acc: 53040/65159=0.8140
     Instance: 2500; Time: 24.41s; loss: 2048796.8750; acc: 66446/81388=0.8164
     Instance: 3000; Time: 28.53s; loss: 3531765.7500; acc: 80418/98849=0.8135
     Instance: 3500; Time: 22.72s; loss: 2824496.6875; acc: 93113/114750=0.8114
     Instance: 4000; Time: 22.72s; loss: 2429406.4375; acc: 105955/131127=0.8080
     Instance: 4500; Time: 24.67s; loss: 1839154.5000; acc: 119752/147307=0.8129
     Instance: 5000; Time: 23.01s; loss: 2184528.7188; acc: 133134/163803=0.8128
     Instance: 5121; Time: 10.25s; loss: 441015.4375; acc: 136634/167877=0.8139
Epoch: 2 training finished. Time: 257.54s, speed: 19.88st/s,  total loss: 26354832.84375
totalloss: 26354832.84375
Right token =  72865  All token =  82119  acc =  0.8873098795650215
Dev: time: 8.37s, speed: 307.18st/s; acc: 0.8873, p: 0.0112, r: 0.0012, f: 0.0021
Right token =  74186  All token =  83139  acc =  0.8923128736212849
Test: time: 8.39s, speed: 306.69st/s; acc: 0.8923, p: 0.0063, r: 0.0005, f: 0.0009
Epoch: 3/10
 Learning rate is set as: 0.08695652173913045
Shuffle: first input word list: [3680, 3798, 537, 154, 53, 22, 23, 201, 866, 14, 2677, 466, 537, 3684, 3685, 17, 1618, 844, 17, 2783, 169, 336, 14, 3690, 1734, 944, 3680, 537, 679, 2, 1856, 448, 146]
     Instance: 500; Time: 20.57s; loss: 2418398.3125; acc: 12978/16176=0.8023
     Instance: 1000; Time: 24.27s; loss: 3341107.6562; acc: 26618/32854=0.8102
     Instance: 1500; Time: 25.33s; loss: 2683556.5625; acc: 39485/49192=0.8027
     Instance: 2000; Time: 20.78s; loss: 2654431.3750; acc: 53017/65683=0.8072
     Instance: 2500; Time: 22.51s; loss: 1794621.0000; acc: 66400/81778=0.8120
     Instance: 3000; Time: 31.76s; loss: 2520183.0000; acc: 79705/98349=0.8104
     Instance: 3500; Time: 20.08s; loss: 1936766.8750; acc: 93068/114554=0.8124
     Instance: 4000; Time: 26.30s; loss: 1959160.5000; acc: 106073/131125=0.8089
     Instance: 4500; Time: 20.09s; loss: 2188763.1875; acc: 118936/146789=0.8103
     Instance: 5000; Time: 30.80s; loss: 2170208.8125; acc: 133325/163980=0.8131
     Instance: 5121; Time: 9.67s; loss: 400760.9062; acc: 136733/167877=0.8145
Epoch: 3 training finished. Time: 252.15s, speed: 20.31st/s,  total loss: 24067958.1875
totalloss: 24067958.1875
Right token =  20165  All token =  82119  acc =  0.24555827518601053
Dev: time: 8.49s, speed: 304.43st/s; acc: 0.2456, p: 0.0025, r: 0.0171, f: 0.0043
Exceed previous best f score: 0.0041992746707386905
Save current best model in file: ../models/ncrfpp_model.3.model
Right token =  20345  All token =  83139  acc =  0.24471066527141294
Test: time: 8.48s, speed: 305.08st/s; acc: 0.2447, p: 0.0029, r: 0.0200, f: 0.0050
Epoch: 4/10
 Learning rate is set as: 0.08333333333333334
Shuffle: first input word list: [155, 1056, 1523, 2, 1462, 612, 613, 1232, 2847, 5313, 14, 12269, 2, 1466, 12413, 17, 740, 1112, 12414, 5089, 12411, 17, 11040, 1534, 9, 7443, 14, 155, 5089, 9, 928, 7940, 12401, 14, 594, 155, 12415, 67, 176, 94, 1551, 146]
     Instance: 500; Time: 24.37s; loss: 2371574.3438; acc: 13538/16854=0.8033
     Instance: 1000; Time: 23.93s; loss: 1737718.5938; acc: 26521/32998=0.8037
     Instance: 1500; Time: 25.11s; loss: 2176442.5000; acc: 39817/49167=0.8098
     Instance: 2000; Time: 20.97s; loss: 2618315.4062; acc: 53058/65854=0.8057
     Instance: 2500; Time: 22.27s; loss: 2855402.0625; acc: 66479/82332=0.8075
     Instance: 3000; Time: 29.69s; loss: 1662966.2500; acc: 80006/98599=0.8114
     Instance: 3500; Time: 25.06s; loss: 2173333.7500; acc: 92931/115171=0.8069
     Instance: 4000; Time: 19.64s; loss: 2101886.4375; acc: 106537/131290=0.8115
     Instance: 4500; Time: 18.18s; loss: 1783243.5000; acc: 119535/147250=0.8118
     Instance: 5000; Time: 22.09s; loss: 2594713.3125; acc: 133300/163786=0.8139
     Instance: 5121; Time: 6.83s; loss: 618807.9375; acc: 136296/167877=0.8119
Epoch: 4 training finished. Time: 238.15s, speed: 21.50st/s,  total loss: 22694404.09375
totalloss: 22694404.09375
Right token =  73787  All token =  82119  acc =  0.8985374882792045
Dev: time: 8.39s, speed: 306.32st/s; acc: 0.8985, p: -1.0000, r: 0.0000, f: -1.0000
Right token =  74902  All token =  83139  acc =  0.9009249569997233
Test: time: 8.43s, speed: 305.29st/s; acc: 0.9009, p: -1.0000, r: 0.0000, f: -1.0000
Epoch: 5/10
 Learning rate is set as: 0.08
Shuffle: first input word list: [549, 9761, 37, 614, 459, 14, 360, 209, 9, 209, 96, 1551, 9, 14, 360, 361, 45, 1407, 9761, 37, 614, 1103, 1104, 1105, 524, 17, 235, 173, 146]
     Instance: 500; Time: 21.18s; loss: 2980160.8750; acc: 12452/15747=0.7908
     Instance: 1000; Time: 32.33s; loss: 2264516.3750; acc: 26177/32151=0.8142
     Instance: 1500; Time: 24.32s; loss: 1958151.8750; acc: 39057/48722=0.8016
     Instance: 2000; Time: 19.92s; loss: 2551159.1250; acc: 52718/64735=0.8144
     Instance: 2500; Time: 22.72s; loss: 2041154.7500; acc: 66182/80945=0.8176
     Instance: 3000; Time: 21.34s; loss: 3120712.3750; acc: 78876/96796=0.8149
     Instance: 3500; Time: 20.50s; loss: 2084796.8750; acc: 92454/113422=0.8151
     Instance: 4000; Time: 32.37s; loss: 2529292.3125; acc: 106291/130738=0.8130
     Instance: 4500; Time: 23.54s; loss: 1767164.3125; acc: 119220/146639=0.8130
     Instance: 5000; Time: 22.61s; loss: 2062927.8125; acc: 133104/163549=0.8138
     Instance: 5121; Time: 9.78s; loss: 413305.1250; acc: 136559/167877=0.8134
Epoch: 5 training finished. Time: 250.61s, speed: 20.43st/s,  total loss: 23773341.8125
totalloss: 23773341.8125
Right token =  73787  All token =  82119  acc =  0.8985374882792045
Dev: time: 8.36s, speed: 307.62st/s; acc: 0.8985, p: -1.0000, r: 0.0000, f: -1.0000
Right token =  74902  All token =  83139  acc =  0.9009249569997233
Test: time: 8.36s, speed: 307.53st/s; acc: 0.9009, p: -1.0000, r: 0.0000, f: -1.0000

This is my configuration:

### I/O ###
train_dir=../data/arman/processed/train_bioes.bmes
dev_dir=../data/arman/processed/dev_bioes.bmes
test_dir=../data/arman/processed/test_bioes.bmes
model_dir=../models/ncrfpp_model
word_emb_dir=../data/arman/processed/wor2vec_skipgram300d.txt

#raw_dir=
#decode_dir=
#dset_dir=
#load_model_dir=
#char_emb_dir=

norm_word_emb=False
norm_char_emb=False
number_normalized=True
seg=True
word_emb_dim=300
char_emb_dim=25

###NetworkConfiguration###
use_crf=True
use_char=False
word_seq_feature=LSTM
char_seq_feature=LSTM
#feature=[POS] emb_size=20
#feature=[Cap] emb_size=20
#nbest=1

###TrainingSetting###
status=train
optimizer=SGD
iteration=10
batch_size=50
ave_batch_loss=False

###Hyperparameters###
cnn_layer=4
char_hidden_dim=25
hidden_dim=500
dropout=0.5
lstm_layer=1
bilstm=True
learning_rate=0.1
lr_decay=0.05
momentum=0
l2=1e-8
#gpu
clip=5.0

For some epochs a very small positive F1 was seen. I thought that using different configurations might help, but none of them worked. It would be a great help if you could share your ideas. Thanks

jiesutd commented 4 years ago
  1. I guess your pretrained embedding file has an incorrect format: pretrain word:1, prefect match:0, case_match:0, oov:18204, oov%:0.9999450700357044 shows that only one word was loaded and almost everything is out of vocabulary (see the sketch below).

  2. Your training loss is too large; this makes training unstable and prevents it from converging to good parameters. If it is not caused by the first problem, you need to fine-tune your hyperparameters or try different embeddings.
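For the first point, a quick sanity check of the embedding file could look like this (a rough sketch, assuming a plain-text file with one "word v1 v2 ... v300" line per word; the path is taken from your config):

# Rough check of the pretrained embedding file format.
emb_path = "../data/arman/processed/wor2vec_skipgram300d.txt"

with open(emb_path, encoding="utf-8") as f:
    lines = f.readlines()

first = lines[0].split()
if len(first) == 2:
    # Looks like a gensim/word2vec header line ("vocab_size dim");
    # such a header may make the loader infer the wrong embedding
    # dimension, so strip it before training.
    print("header line found:", lines[0].strip())
    lines = lines[1:]

bad = [i for i, l in enumerate(lines) if len(l.split()) != 301]
print("lines whose token count is not 1 + 300:", len(bad))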