jiesutd / NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Apache License 2.0

f score < 0.01 for conll2003 data #153

Closed: nnakamura3 closed this issue 4 years ago

nnakamura3 commented 4 years ago

Hi, even after 10 epochs, the F-score stays below 0.01 and accuracy stays around 0.83. I checked my data format, and I don't think it has the formatting problems reported in past issues.

Could you give me a hand with this problem?

My config:

### use # to comment out the configure item
### I/O ###
train_dir=dataset/bpe20000/train.BIOES
dev_dir=dataset/bpe20000/valid.BIOES
test_dir=dataset/bpe20000/test.BIOES
model_dir=result/bpe20000.char.BIOES/checkpoint
#word_emb_dir=sample_data/sample.word.emb

#raw_dir=
#decode_dir=
#dset_dir=
#load_model_dir=
#char_emb_dir=

norm_word_emb=False
norm_char_emb=False
number_normalized=True
seg=True
word_emb_dim=50
char_emb_dim=30

###NetworkConfiguration###
use_crf=True
use_char=True
word_seq_feature=LSTM
char_seq_feature=CNN
#feature=[POS] emb_size=20
#feature=[Cap] emb_size=20
#nbest=1

###TrainingSetting###
status=train
optimizer=SGD
iteration=100
batch_size=32
ave_batch_loss=False

###Hyperparameters###
cnn_layer=4
char_hidden_dim=50
hidden_dim=200
dropout=0.5
lstm_layer=1
bilstm=True
learning_rate=0.015
lr_decay=0.05
momentum=0
l2=1e-8
#gpu
#clip=

Test data (head -100):

SOCCER O
- O
JAPAN S-LOC
GET O
LUCKY O
WIN O
, O
CHINA S-PER
IN O
SURPRISE O
DEFEAT O
. O

Nadim B-PER
Ladki E-PER

AL-AIN S-LOC
, O
United B-LOC
Arab I-LOC
Emirates E-LOC
1996-12-06 O
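
Since the question hinges on whether the data format is correct, here is a minimal Python sketch of a BIOES sanity check for two-column files like the sample above. It is not part of NCRF++; the helper names `read_sentences` and `bioes_errors` are made up for illustration, and it assumes whitespace-separated columns with the label in the last column and blank lines between sentences.

```python
from typing import Iterable, List, Tuple


def read_sentences(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Split 'token label' lines into sentences at blank lines."""
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the label.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bioes_errors(labels: List[str]) -> List[str]:
    """Return a description of each BIOES violation in one sentence."""
    errors = []
    open_type = None  # entity type opened by B- and not yet closed by E-
    for i, label in enumerate(labels):
        if label == "O":
            prefix, etype = "O", None
        else:
            prefix, _, etype = label.partition("-")
            if prefix not in {"B", "I", "E", "S"} or not etype:
                errors.append(f"position {i}: malformed label {label!r}")
                open_type = None
                continue
        if prefix in ("I", "E"):
            if open_type != etype:
                errors.append(f"position {i}: {label} without matching B-{etype}")
        elif open_type is not None:
            errors.append(f"position {i}: {label} while B-{open_type} is still open")
        # B- and I- keep a span open; O, S- and E- leave no span open.
        open_type = etype if prefix in ("B", "I") else None
    if open_type is not None:
        errors.append(f"end of sentence: B-{open_type} never closed by E-{open_type}")
    return errors


sample = """SOCCER O
- O
JAPAN S-LOC

Nadim B-PER
Ladki E-PER

United B-LOC
Arab I-LOC
Emirates E-LOC
1996-12-06 O""".splitlines()

for sentence in read_sentences(sample):
    problems = bioes_errors([label for _, label in sentence])
    print(len(sentence), "tokens,", len(problems), "problems")
```

On the excerpt shown in this issue the check finds no violations, which agrees with the reporter's statement that the data format is not the cause.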

Train log:

Seed num: 42
MODEL: train
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Start   Sequence   Laebling   task...
     Tag          scheme: BMES
     Split         token:  |||
     MAX SENTENCE LENGTH: 250
     MAX   WORD   LENGTH: -1
     Number   normalized: True
     Word  alphabet size: 25305
     Char  alphabet size: 78
     Label alphabet size: 18
     Word embedding  dir: None
     Char embedding  dir: None
     Word embedding size: 50
     Char embedding size: 30
     Norm   word     emb: False
     Norm   char     emb: False
     Train  file directory: data/conll2003/en/ner/train.BIOES.txt
     Dev    file directory: data/conll2003/en/ner/valid.BIOES.txt
     Test   file directory: data/conll2003/en/ner/test.BIOES.txt
     Raw    file directory: None
     Dset   file directory: None
     Model  file directory: result/wordbase.char.BIOES/checkpoint
     Loadmodel   directory: None
     Decode file directory: None
     Train instance number: 14041
     Dev   instance number: 3250
     Test  instance number: 3453
     Raw   instance number: 0
     FEATURE num: 0
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model        use_crf: True
     Model word extractor: LSTM
     Model       use_char: True
     Model char extractor: CNN
     Model char_hidden_dim: 50
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 100
     BatchSize: 32
     Average  batch   loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper              lr: 0.015
     Hyper        lr_decay: 0.05
     Hyper         HP_clip: None
     Hyper        momentum: 0.0
     Hyper              l2: 1e-08
     Hyper      hidden_dim: 200
     Hyper         dropout: 0.5
     Hyper      lstm_layer: 1
     Hyper          bilstm: True
     Hyper             GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build sequence labeling network...
use_char:  True
char feature extractor:  CNN
word feature extractor:  LSTM
use crf:  True
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: CNN ...
build CRF...
Epoch: 0/100
 Learning rate is set as: 0.015
Shuffle: first input word list: [16808, 16793, 793, 791, 259]
     Instance: 4000; Time: 21.95s; loss: 89460.7523; acc: 43336.0/58355.0=0.7426
     Instance: 8000; Time: 21.64s; loss: 62598.9025; acc: 87683.0/117335.0=0.7473
     Instance: 12000; Time: 21.70s; loss: 53243.5435; acc: 132374.0/175600.0=0.7538
     Instance: 14041; Time: 10.67s; loss: 27011.2681; acc: 153735.0/203621.0=0.7550
Epoch: 0 training finished. Time: 75.95s, speed: 184.87st/s,  total loss: 232314.4663696289
totalloss: 232314.4663696289
Right token =  42760  All token =  51362  acc =  0.8325220980491413
Dev: time: 4.65s, speed: 707.92st/s; acc: 0.8325, p: 0.4062, r: 0.0022, f: 0.0044
Exceed previous best f score: -10
Save current best model in file: result/wordbase.char.BIOES/checkpoint.0.model
Right token =  38308  All token =  46435  acc =  0.82498115645526
Test: time: 4.09s, speed: 856.69st/s; acc: 0.8250, p: 0.2609, r: 0.0021, f: 0.0042
Epoch: 1/100
 Learning rate is set as: 0.014285714285714285
Shuffle: first input word list: [848, 192, 62, 5572, 6562, 131, 9809, 163, 41, 7800, 5131, 133, 72, 59, 15811, 6342, 6262, 6, 855, 4836, 72, 323, 57, 3834, 144, 13465, 39, 1197, 537, 1562, 303, 57, 2891, 10]
     Instance: 4000; Time: 21.92s; loss: 53488.2458; acc: 44460.0/57848.0=0.7686
     Instance: 8000; Time: 20.83s; loss: 55212.6742; acc: 88519.0/115384.0=0.7672
     Instance: 12000; Time: 21.17s; loss: 54842.6556; acc: 133692.0/173869.0=0.7689
     Instance: 14041; Time: 10.56s; loss: 27046.8551; acc: 157028.0/203621.0=0.7712
Epoch: 1 training finished. Time: 74.48s, speed: 188.52st/s,  total loss: 190590.4307861328
totalloss: 190590.4307861328
Right token =  42762  All token =  51362  acc =  0.8325610373427826
Dev: time: 4.90s, speed: 668.88st/s; acc: 0.8326, p: 0.6667, r: 0.0007, f: 0.0013
Right token =  38327  All token =  46435  acc =  0.8253903305696134
Test: time: 4.45s, speed: 784.95st/s; acc: 0.8254, p: 0.5000, r: 0.0007, f: 0.0014
Epoch: 2/100
 Learning rate is set as: 0.013636363636363634
Shuffle: first input word list: [269, 14]
     Instance: 4000; Time: 21.66s; loss: 55451.4020; acc: 43960.0/57661.0=0.7624
     Instance: 8000; Time: 21.08s; loss: 59527.4275; acc: 87524.0/114649.0=0.7634
     Instance: 12000; Time: 20.77s; loss: 69982.1991; acc: 130951.0/173361.0=0.7554
     Instance: 14041; Time: 10.91s; loss: 29146.8357; acc: 153873.0/203621.0=0.7557
Epoch: 2 training finished. Time: 74.41s, speed: 188.71st/s,  total loss: 214107.8642578125
totalloss: 214107.8642578125
Right token =  42757  All token =  51362  acc =  0.8324636891086795
Dev: time: 4.53s, speed: 724.57st/s; acc: 0.8325, p: 0.1667, r: 0.0002, f: 0.0003
Right token =  38323  All token =  46435  acc =  0.8253041886508022
Test: time: 4.51s, speed: 775.19st/s; acc: 0.8253, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 3/100
 Learning rate is set as: 0.013043478260869566
Shuffle: first input word list: [1919, 72, 1373, 1920, 14]
     Instance: 4000; Time: 21.04s; loss: 47456.8503; acc: 46273.0/58518.0=0.7907
     Instance: 8000; Time: 21.17s; loss: 51338.7653; acc: 90825.0/116563.0=0.7792
     Instance: 12000; Time: 20.69s; loss: 49514.3057; acc: 135771.0/174081.0=0.7799
     Instance: 14041; Time: 10.49s; loss: 26523.3854; acc: 158471.0/203621.0=0.7783
Epoch: 3 training finished. Time: 73.39s, speed: 191.33st/s,  total loss: 174833.306640625
totalloss: 174833.306640625
Right token =  42752  All token =  51362  acc =  0.8323663408745765
Dev: time: 4.38s, speed: 749.43st/s; acc: 0.8324, p: 0.0714, r: 0.0002, f: 0.0003
Right token =  38315  All token =  46435  acc =  0.8251319048131797
Test: time: 3.18s, speed: 1101.28st/s; acc: 0.8251, p: 0.0435, r: 0.0002, f: 0.0004
Epoch: 4/100
 Learning rate is set as: 0.0125
Shuffle: first input word list: [7531, 14]
     Instance: 4000; Time: 21.25s; loss: 45186.5659; acc: 45306.0/57892.0=0.7826
     Instance: 8000; Time: 21.48s; loss: 47879.9988; acc: 90799.0/116218.0=0.7813
     Instance: 12000; Time: 20.97s; loss: 47165.0463; acc: 136038.0/173760.0=0.7829
     Instance: 14041; Time: 10.98s; loss: 25187.9487; acc: 159160.0/203621.0=0.7816
Epoch: 4 training finished. Time: 74.67s, speed: 188.04st/s,  total loss: 165419.5596923828
totalloss: 165419.5596923828
Right token =  42754  All token =  51362  acc =  0.8324052801682178
Dev: time: 3.57s, speed: 921.35st/s; acc: 0.8324, p: 0.2143, r: 0.0005, f: 0.0010
Right token =  38301  All token =  46435  acc =  0.8248304080973403
Test: time: 3.58s, speed: 983.20st/s; acc: 0.8248, p: 0.1515, r: 0.0009, f: 0.0018
Epoch: 5/100
 Learning rate is set as: 0.012
Shuffle: first input word list: [59, 135, 1594, 6966, 35, 259, 94, 2492, 87, 90, 39, 793, 1596, 4009, 10]
     Instance: 4000; Time: 20.87s; loss: 47385.0153; acc: 45626.0/58632.0=0.7782
     Instance: 8000; Time: 21.45s; loss: 44408.2977; acc: 90903.0/115930.0=0.7841
     Instance: 12000; Time: 20.68s; loss: 46127.7147; acc: 136152.0/173487.0=0.7848
     Instance: 14041; Time: 10.89s; loss: 24793.5476; acc: 159487.0/203621.0=0.7833
Epoch: 5 training finished. Time: 73.89s, speed: 190.03st/s,  total loss: 162714.5753173828
totalloss: 162714.5753173828
Right token =  42760  All token =  51362  acc =  0.8325220980491413
Dev: time: 4.05s, speed: 813.70st/s; acc: 0.8325, p: 0.5000, r: 0.0003, f: 0.0007
Right token =  38331  All token =  46435  acc =  0.8254764724884247
Test: time: 3.78s, speed: 923.25st/s; acc: 0.8255, p: 0.7273, r: 0.0014, f: 0.0028
Epoch: 6/100
 Learning rate is set as: 0.011538461538461537
Shuffle: first input word list: [2822, 259, 259, 2820, 259, 1006]
     Instance: 4000; Time: 21.46s; loss: 45836.3317; acc: 45602.0/57898.0=0.7876
     Instance: 8000; Time: 20.92s; loss: 40953.8358; acc: 90251.0/114021.0=0.7915
     Instance: 12000; Time: 20.88s; loss: 50871.9891; acc: 137183.0/173830.0=0.7892
     Instance: 14041; Time: 10.71s; loss: 25519.3202; acc: 160047.0/203621.0=0.7860
Epoch: 6 training finished. Time: 73.96s, speed: 189.85st/s,  total loss: 163181.47680664062
totalloss: 163181.47680664062
Right token =  42759  All token =  51362  acc =  0.8325026284023208
Dev: time: 4.37s, speed: 753.98st/s; acc: 0.8325, p: 0.4000, r: 0.0003, f: 0.0007
Right token =  38329  All token =  46435  acc =  0.825433401529019
Test: time: 3.93s, speed: 887.74st/s; acc: 0.8254, p: 0.7500, r: 0.0011, f: 0.0021
Epoch: 7/100
 Learning rate is set as: 0.01111111111111111
Shuffle: first input word list: [4516, 259, 259, 657, 657, 513, 513, 259]
     Instance: 4000; Time: 21.09s; loss: 43271.3885; acc: 47082.0/58467.0=0.8053
     Instance: 8000; Time: 21.53s; loss: 45434.3374; acc: 92604.0/116288.0=0.7963
     Instance: 12000; Time: 20.87s; loss: 44401.8308; acc: 138572.0/174221.0=0.7954
     Instance: 14041; Time: 10.73s; loss: 20354.4045; acc: 162068.0/203621.0=0.7959
Epoch: 7 training finished. Time: 74.23s, speed: 189.15st/s,  total loss: 153461.96130371094
totalloss: 153461.96130371094
Right token =  42754  All token =  51362  acc =  0.8324052801682178
Dev: time: 3.75s, speed: 879.98st/s; acc: 0.8324, p: 0.2000, r: 0.0005, f: 0.0010
Right token =  38318  All token =  46435  acc =  0.8251965112522881
Test: time: 3.95s, speed: 883.15st/s; acc: 0.8252, p: 0.4286, r: 0.0016, f: 0.0032
Epoch: 8/100
 Learning rate is set as: 0.010714285714285714
Shuffle: first input word list: [83, 18, 474, 237, 11590, 484, 237, 41, 1338, 250, 57, 259, 260, 3554, 3555, 203, 21, 57, 5553, 6, 1181, 363, 8919, 10]
     Instance: 4000; Time: 20.31s; loss: 40530.5502; acc: 47295.0/58444.0=0.8092
     Instance: 8000; Time: 21.16s; loss: 43322.9246; acc: 93888.0/117047.0=0.8021
     Instance: 12000; Time: 22.01s; loss: 43372.3680; acc: 139805.0/174641.0=0.8005
     Instance: 14041; Time: 10.78s; loss: 22910.1526; acc: 162543.0/203621.0=0.7983
Epoch: 8 training finished. Time: 74.26s, speed: 189.08st/s,  total loss: 150135.99536132812
totalloss: 150135.99536132812
Right token =  42751  All token =  51362  acc =  0.832346871227756
Dev: time: 4.28s, speed: 770.60st/s; acc: 0.8323, p: 0.0000, r: 0.0000, f: -1.0000
Right token =  38315  All token =  46435  acc =  0.8251319048131797
Test: time: 4.37s, speed: 802.09st/s; acc: 0.8251, p: 0.0526, r: 0.0002, f: 0.0004
Epoch: 9/100
 Learning rate is set as: 0.010344827586206896
Shuffle: first input word list: [1870, 6715, 6716, 131, 6717, 133, 579]
     Instance: 4000; Time: 20.10s; loss: 42776.5460; acc: 46757.0/58147.0=0.8041
     Instance: 8000; Time: 21.18s; loss: 45972.1129; acc: 92088.0/115892.0=0.7946
     Instance: 12000; Time: 21.80s; loss: 39922.2743; acc: 139043.0/173857.0=0.7998
     Instance: 14041; Time: 10.79s; loss: 20447.1543; acc: 163247.0/203621.0=0.8017
Epoch: 9 training finished. Time: 73.87s, speed: 190.07st/s,  total loss: 149118.08752441406
totalloss: 149118.08752441406
Right token =  42754  All token =  51362  acc =  0.8324052801682178
Dev: time: 4.08s, speed: 807.30st/s; acc: 0.8324, p: 0.0000, r: 0.0000, f: -1.0000
Right token =  38322  All token =  46435  acc =  0.8252826531710994
Test: time: 4.20s, speed: 829.47st/s; acc: 0.8253, p: 0.0909, r: 0.0002, f: 0.0004
Epoch: 10/100
 Learning rate is set as: 0.01
Shuffle: first input word list: [2567, 39, 513, 1858, 6605]
     Instance: 4000; Time: 20.74s; loss: 44515.7240; acc: 45471.0/57824.0=0.7864
     Instance: 8000; Time: 21.38s; loss: 43261.1500; acc: 91942.0/116208.0=0.7912
     Instance: 12000; Time: 20.97s; loss: 41170.6157; acc: 138384.0/173883.0=0.7958
     Instance: 14041; Time: 10.84s; loss: 19026.5566; acc: 162906.0/203621.0=0.8000
Epoch: 10 training finished. Time: 73.93s, speed: 189.93st/s,  total loss: 147974.04638671875
totalloss: 147974.04638671875
Right token =  42759  All token =  51362  acc =  0.8325026284023208
Dev: time: 4.90s, speed: 668.88st/s; acc: 0.8325, p: 0.5000, r: 0.0003, f: 0.0007
Right token =  38329  All token =  46435  acc =  0.825433401529019
Test: time: 3.81s, speed: 916.40st/s; acc: 0.8254, p: 0.7500, r: 0.0011, f: 0.0021
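
Two patterns in the log above can be checked with a little arithmetic. The sketch below is plain Python, not NCRF++ code, and the function names are invented for illustration. It assumes the learning rate follows `lr / (1 + lr_decay * epoch)`, which matches the logged values, and that `f` is the usual harmonic mean of precision and recall, with -1 printed when both are zero, as the log suggests.

```python
import math


def decayed_lr(init_lr: float, decay: float, epoch: int) -> float:
    """Inverse-time decay that reproduces the 'Learning rate is set as' lines."""
    return init_lr / (1 + decay * epoch)


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; -1.0 when both are zero (as in the log)."""
    if precision + recall <= 0:
        return -1.0
    return 2 * precision * recall / (precision + recall)


# Learning rate at epochs 1, 2 and 10 with lr=0.015 and lr_decay=0.05.
for epoch in (1, 2, 10):
    print(epoch, decayed_lr(0.015, 0.05, epoch))

# Epoch 0 dev scores from the log: p=0.4062, r=0.0022.
print(round(f_score(0.4062, 0.0022), 4))
```

Because recall is close to zero in every epoch, the F-score stays close to zero no matter how high precision gets. Together with the nearly constant accuracy of about 0.83, this is what one would expect from a model that labels almost every token `O`, although the log alone does not prove it.
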
jiesutd commented 4 years ago

That's weird. It seems the model didn't train well. Have you tried setting batch_size=10?

nnakamura3 commented 4 years ago

Oh, I forgot to change batch_size. I tried batch_size=10 and the model now trains properly. Thank you for your advice!
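
For reference, the fix amounts to changing one line of the config from `batch_size=32` to `batch_size=10`. The thread does not explain why the smaller batch helps. Below is a minimal Python sketch of applying such an override to a `key=value` config with `#` comments, like the one posted above. `override_config` is a hypothetical helper, not an NCRF++ function.

```python
from typing import Dict, List


def override_config(lines: List[str], overrides: Dict[str, str]) -> List[str]:
    """Return the config lines with the given settings replaced or appended."""
    remaining = dict(overrides)
    result = []
    for line in lines:
        stripped = line.strip()
        # Lines starting with '#' are comments in this config format.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                result.append(f"{key}={remaining.pop(key)}")
                continue
        result.append(line)
    # Append any requested setting that was not already present.
    result.extend(f"{key}={value}" for key, value in remaining.items())
    return result


config = ["status=train", "optimizer=SGD", "iteration=100", "batch_size=32", "#clip="]
print("\n".join(override_config(config, {"batch_size": "10"})))
```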