Closed precision2intelligence closed 6 years ago
Please provide me with your log and sample data. Thanks.
Please find them in your e-mail! Thank you!
I checked your log file. There are three main problems:
1. Based on your sample data, you are using the Chinese character-based model. In this case, the input word embeddings should be character embeddings. You can also set use_char=False, as your basic unit is already characters.
2. Your input word embeddings (which should be your character embeddings) have a very large OOV rate (>99%), which will heavily hurt system performance.
3. Your dev/test datasets are too small (100~200 sentences). It is not surprising that you got F1 = -1 with the previous incorrect settings.
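For reference, in an NCRF++-style configuration file the character-based setup above corresponds to something like the following. This is only a sketch: the paths and embedding file name are placeholders, and the key names follow the demo config shipped with NCRF++, so double-check them against your version.

```ini
### Hypothetical demo.train.config for a character-based Chinese model.
### One character per token; the "word" embeddings are character embeddings.
train_dir=data/train.char.bmes
dev_dir=data/dev.char.bmes
test_dir=data/test.char.bmes
model_dir=model/saved
word_emb_dir=data/char.vec50.txt
word_emb_dim=50
# The basic unit is already a character, so no extra char-level layer:
use_char=False
use_crf=True
word_seq_feature=LSTM
```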
The word embedding is trained by first segmenting the corpus into words, e.g. "长江大桥" as one token, while the char embedding treats it as "长", "江", "大", "桥". So the word embeddings are not needed in this case?
I have switched to the character embeddings as you said, but the code can only find English characters. I printed the embedding entries that match the corpus, and only English characters are matched. However, both the dataset and the embedding file are in Chinese, which leads to the large OOV rate. Is this tool only suitable for English corpora? Here is the log file. The embedding file is the same one we used with your Lattice model, where it works well.
Seed num: 42
MODEL: train
Load pretrained word embedding, norm: False, dir: /home//newsingle50qing.txt
k l n o p r s t u v w a b c y d z e f g q h ~ i ? j
Embedding:
pretrain word:1282, prefect match:26, case_match:0, oov:1950, oov%:0.986342943854
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Tag scheme: BMES
MAX SENTENCE LENGTH: -1
MAX WORD LENGTH: -1
Number normalized: True
Word alphabet size: 1977
Char alphabet size: 100
Label alphabet size: 27
Word embedding dir: /home//newsingle50qing.txt
Char embedding dir: None
Word embedding size: 50
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: /home//train.txt
Dev file directory: /home//dev.txt
Test file directory: /home//test.txt
Raw file directory: None
Dset file directory: /home//model/
Model file directory: /home/*/model/
Loadmodel directory: None
Decode file directory: None
Train instance number: 20519
Dev instance number: 2125
Test instance number: 2320
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: True
Model word extractor: LSTM
Model use_char: False
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 50
BatchSize: 16
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.015
Hyper lr_decay: 0.05
Hyper HP_clip: None
Hyper momentum: 0.0
Hyper l2: 1e-08
Hyper hidden_dim: 200
Hyper dropout: 0.5
Hyper lstm_layer: 1
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build network...
use_char: False
word feature extractor: LSTM
use crf: True
build word sequence feature extractor: LSTM...
build word representation...
build CRF...
Epoch: 0/50
Learning rate is setted as: 0.015
Instance: 2000; Time: 36.69s; loss: 28537.5034; acc: 50014.0/51917.0=0.9633
Instance: 4000; Time: 38.72s; loss: 8965.4812; acc: 102685.0/105929.0=0.9694
It can be used for Chinese. My Lattice LSTM has several embeddings; which one did you use? If you used the character embeddings, then I guess there may be a character-encoding mismatch in your train/dev/test data.
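The encoding-mismatch hypothesis can be checked directly: if the embedding file's bytes are decoded with the wrong codec, only ASCII tokens survive intact, which matches the ">99% OOV, only English characters matched" symptom above. A minimal sketch with toy in-memory vocabularies (with real files, read both with an explicit encoding= argument and compare):

```python
# Sketch: diagnose an encoding mismatch between embeddings and corpus.
def oov_rate(corpus_vocab, emb_vocab):
    """Fraction of corpus types absent from the embedding vocabulary."""
    missing = [w for w in corpus_vocab if w not in emb_vocab]
    return len(missing) / len(corpus_vocab)

corpus = {"长", "江", "大", "桥"}
emb = {"长", "江", "大", "桥", "a", "b"}
print(oov_rate(corpus, emb))   # 0.0 -- encodings agree

# If UTF-8 bytes are mis-decoded as GBK, ASCII keys like "a" survive
# unchanged, but every Chinese key is mangled and becomes OOV:
mangled = {w.encode("utf-8").decode("gbk", "replace") for w in emb}
print(oov_rate(corpus, mangled))   # 1.0 -- all Chinese characters lost
```

If the OOV rate collapses to near zero once both files are read with the same codec, the data needs re-encoding rather than any model change.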
I have followed your comments, but the F1 score is still -1. Please help us find the problem. The log file is as follows.
Seed num: 42
MODEL: train
Load pretrained word embedding, norm: False, dir: /home/*/newsingle100qing.txt
Embedding:
pretrain word:1282, prefect match:1282, case_match:0, oov:694, oov%:0.351036924633
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Tag scheme: BMES
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Word alphabet size: 1977
Char alphabet size: 100
Label alphabet size: 27
Word embedding dir: /home/*/newsingle100qing.txt
Char embedding dir: None
Word embedding size: 100
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: /home/*/train.txt
Dev file directory: /home/*/dev.txt
Test file directory: /home/*/test.txt
Raw file directory: None
Dset file directory: /home/*/model/
Model file directory: /home/*/model
Loadmodel directory: None
Decode file directory: None
Train instance number: 20390
Dev instance number: 2121
Test instance number: 2310
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: True
Model word extractor: LSTM
Model use_char: False
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: Adam
Iteration: 50
BatchSize: 16
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.015
Hyper lr_decay: 0.05
Hyper HP_clip: None
Hyper momentum: 0.0
Hyper l2: 1e-08
Hyper hidden_dim: 200
Hyper dropout: 0.5
Hyper lstm_layer: 1
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build network...
use_char: False
word feature extractor: LSTM
use crf: True
build word sequence feature extractor: LSTM...
build word representation...
build CRF...
Epoch: 0/50
Instance: 2000; Time: 140.87s; loss: 11691.8186; acc: 45882.0/47404.0=0.9679
Instance: 4000; Time: 140.68s; loss: 7558.7781; acc: 88926.0/91768.0=0.9690
Instance: 6000; Time: 128.61s; loss: 5567.9442; acc: 135566.0/139719.0=0.9703
Instance: 8000; Time: 125.62s; loss: 4629.0814; acc: 181789.0/187174.0=0.9712
Instance: 10000; Time: 120.99s; loss: 4022.0119; acc: 230511.0/237038.0=0.9725
Instance: 12000; Time: 153.70s; loss: 4401.6414; acc: 278995.0/286922.0=0.9724
Instance: 14000; Time: 133.53s; loss: 4090.0612; acc: 323919.0/333180.0=0.9722
Instance: 16000; Time: 152.85s; loss: 4675.8454; acc: 372036.0/382784.0=0.9719
Instance: 18000; Time: 123.16s; loss: 4129.2788; acc: 418833.0/430928.0=0.9719
Instance: 20000; Time: 121.94s; loss: 3834.2694; acc: 466650.0/480024.0=0.9721
Instance: 20390; Time: 30.99s; loss: 867.0348; acc: 476172.0/489836.0=0.9721
Epoch: 0 training finished. Time: 1372.94s, speed: 14.85st/s, total loss: 55467.7651672
totalloss: 55467.7651672
gold_num = 424 pred_num = 0 right_num = 0
Dev: time: 35.98s, speed: 59.17st/s; acc: 0.9729, p: -1.0000, r: 0.0000, f: -1.0000
Exceed previous best f score: -10
Save current best model in file: /home/yangqiuxia/WCNNNCRFpp/model.0.model
gold_num = 472 pred_num = 0 right_num = 0
Test: time: 43.23s, speed: 53.74st/s; acc: 0.9767, p: -1.0000, r: 0.0000, f: -1.0000
You can see that the token accuracy has been largely improved, from ~80% to 97%. Your result is only from the first iteration; run more iterations to get a better result.
It is also strange that your dev/test datasets have more than 2000 sentences but contain only ~400 entities. Your entities are very sparse, which makes them difficult for the system to identify.
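As an aside, the -1 values in the logs are a sentinel rather than a computation error: when no entity spans are predicted (pred_num = 0), precision would divide by zero, so span-level scorers of this kind report -1 instead. A minimal reconstruction of that convention (my sketch, not the exact NCRF++ evaluation code):

```python
def prf(gold_num, pred_num, right_num):
    """Span-level precision/recall/F1 with -1 as an 'undefined' sentinel."""
    p = right_num / pred_num if pred_num > 0 else -1
    r = right_num / gold_num if gold_num > 0 else -1
    f = 2 * p * r / (p + r) if p > 0 and r > 0 else -1
    return p, r, f

# The dev line above: gold_num = 424, pred_num = 0, right_num = 0
print(prf(424, 0, 0))   # (-1, 0.0, -1)
```

So an F of -1 simply means the model predicted no (correct) entities at all, even while token accuracy looks high because the "O" tag dominates.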
We waited for the results after enough epochs, and the phenomenon still exists. We used the same dataset with your Lattice model and it worked well. Besides, the dataset also works well on other DNN models, so I have no idea what is wrong. Is there any clue?
Then it may be an unbalanced-label problem. As your dataset has very few entities, the model will tend to ignore the rare labels. You can find some solutions for unbalanced data, but none of them are perfect; it is a typical problem in real applications.
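One quick way to quantify the imbalance before trying any remedy is to count the tag distribution of the training file. A small sketch for CoNLL-style "token tag" lines (the sample data here is invented for illustration):

```python
from collections import Counter

def tag_distribution(lines):
    """Count tags in CoNLL-style 'token tag' lines (blank line = sentence break)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            counts[parts[-1]] += 1
    return counts

# Toy character-level BMES example:
sample = ["我 O", "在 O", "长 B-LOC", "江 M-LOC", "大 M-LOC", "桥 E-LOC", "", "的 O"]
dist = tag_distribution(sample)
print(dist["O"] / sum(dist.values()))   # ~0.43 here; far higher in the logs above
```

If the "O" ratio is around 0.97, as the token accuracies in the logs suggest, a model that predicts no entities at all already looks accurate at the token level.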
It may not be the unbalanced-label problem. We tried another dataset, but the phenomenon still exists. Can you help us? Here is the log file:
Seed num: 42
MODEL: train
Load pretrained word embedding, norm: False, dir: /home//.model.txt
Embedding:
pretrain word:331830, prefect match:3730, case_match:0, oov:673, oov%:0.152815622162
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Tag scheme: BIO
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Word alphabet size: 4404
Char alphabet size: 105
Label alphabet size: 8
Word embedding dir: /home//model.txt
Char embedding dir: None
Word embedding size: 100
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: /home//my.train
Dev file directory: /home//my.dev
Test file directory: /home//my.test
Raw file directory: None
Dset file directory: /home//model/
Model file directory: /home/*/model
Loadmodel directory: None
Decode file directory: None
Train instance number: 16628
Dev instance number: 4208
Test instance number: 4628
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: True
Model word extractor: LSTM
Model use_char: False
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 50
BatchSize: 16
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.015
Hyper lr_decay: 0.05
Hyper HP_clip: None
Hyper momentum: 0.0
Hyper l2: 1e-08
Hyper hidden_dim: 200
Hyper dropout: 0.5
Hyper lstm_layer: 1
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build network...
use_char: False
word feature extractor: LSTM
use crf: True
build word sequence feature extractor: LSTM...
build word representation...
build CRF...
Epoch: 0/50
Learning rate is setted as: 0.015
Instance: 2000; Time: 62.56s; loss: 1508746.1278; acc: 73698.0/93881.0=0.7850
Instance: 4000; Time: 55.43s; loss: 1649292.3662; acc: 148086.0/187737.0=0.7888
Instance: 6000; Time: 46.77s; loss: 1349588.6406; acc: 221823.0/280404.0=0.7911
Instance: 8000; Time: 50.71s; loss: 1488015.7256; acc: 294780.0/373006.0=0.7903
Instance: 10000; Time: 51.54s; loss: 1311897.2812; acc: 369806.0/467554.0=0.7909
Instance: 12000; Time: 46.40s; loss: 1151453.0967; acc: 443310.0/559870.0=0.7918
Instance: 14000; Time: 44.21s; loss: 1001844.0859; acc: 518360.0/653931.0=0.7927
Instance: 16000; Time: 46.27s; loss: 1094421.0596; acc: 589905.0/744602.0=0.7922
Instance: 16628; Time: 13.23s; loss: 282753.6875; acc: 612711.0/773637.0=0.7920
Epoch: 0 training finished. Time: 417.13s, speed: 39.86st/s, total loss: 10838012.0712
totalloss: 10838012.0712
gold_num = 6540 pred_num = 0 right_num = 0
Dev: time: 34.64s, speed: 123.72st/s; acc: 0.8918, p: -1.0000, r: 0.0000, f: -1.0000
Exceed previous best f score: -10
Save current best model in file: /home/yangqiuxia/WCNNNCRFpp/model.0.model
gold_num = 7455 pred_num = 4 right_num = 0
Test: time: 35.72s, speed: 130.93st/s; acc: 0.8866, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 1/50
Learning rate is setted as: 0.0142857142857
Instance: 2000; Time: 49.18s; loss: 1157811.4463; acc: 74168.0/94638.0=0.7837
Instance: 4000; Time: 43.51s; loss: 797654.6699; acc: 147283.0/185909.0=0.7922
Instance: 6000; Time: 44.03s; loss: 950858.7529; acc: 221901.0/280555.0=0.7909
Instance: 8000; Time: 42.80s; loss: 985196.9434; acc: 294712.0/373338.0=0.7894
Instance: 10000; Time: 41.63s; loss: 820030.0908; acc: 368879.0/466311.0=0.7911
Instance: 12000; Time: 42.12s; loss: 907943.2744; acc: 442761.0/558860.0=0.7923
Instance: 14000; Time: 45.02s; loss: 866118.1777; acc: 516136.0/651572.0=0.7921
Instance: 16000; Time: 39.56s; loss: 874607.4980; acc: 590281.0/744204.0=0.7932
Instance: 16628; Time: 13.17s; loss: 321950.6758; acc: 613311.0/773637.0=0.7928
Epoch: 1 training finished. Time: 361.01s, speed: 46.06st/s, total loss: 7682171.5293
totalloss: 7682171.5293
gold_num = 6540 pred_num = 7 right_num = 0
Dev: time: 32.92s, speed: 130.17st/s; acc: 0.8918, p: 0.0000, r: 0.0000, f: -1.0000
gold_num = 7455 pred_num = 12 right_num = 0
Test: time: 38.22s, speed: 123.80st/s; acc: 0.8865, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 2/50
Learning rate is setted as: 0.0136363636364
Instance: 2000; Time: 62.19s; loss: 802904.1162; acc: 75197.0/94685.0=0.7942
Instance: 4000; Time: 61.69s; loss: 797079.1963; acc: 150347.0/188708.0=0.7967
Instance: 6000; Time: 44.54s; loss: 780471.0654; acc: 224016.0/281337.0=0.7963
Instance: 8000; Time: 48.63s; loss: 797713.7139; acc: 296610.0/374062.0=0.7929
Instance: 10000; Time: 57.54s; loss: 780577.4780; acc: 369560.0/466573.0=0.7921
Instance: 12000; Time: 54.08s; loss: 855878.4482; acc: 443217.0/560055.0=0.7914
Instance: 14000; Time: 45.68s; loss: 593563.6289; acc: 517002.0/651865.0=0.7931
Instance: 16000; Time: 41.91s; loss: 782002.7080; acc: 590147.0/744541.0=0.7926
Instance: 16628; Time: 13.48s; loss: 205705.3845; acc: 613215.0/773637.0=0.7926
Epoch: 2 training finished. Time: 429.74s, speed: 38.69st/s, total loss: 6395895.7395
totalloss: 6395895.7395
gold_num = 6540 pred_num = 17 right_num = 0
Dev: time: 43.97s, speed: 96.40st/s; acc: 0.8911, p: 0.0000, r: 0.0000, f: -1.0000
gold_num = 7455 pred_num = 25 right_num = 0
Test: time: 48.79s, speed: 96.24st/s; acc: 0.8849, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 3/50
Learning rate is setted as: 0.0130434782609
Instance: 2000; Time: 59.16s; loss: 651076.3457; acc: 73847.0/93889.0=0.7865
Instance: 4000; Time: 59.52s; loss: 645221.7754; acc: 146804.0/185729.0=0.7904
Instance: 6000; Time: 54.23s; loss: 580582.8809; acc: 222166.0/280522.0=0.7920
Instance: 8000; Time: 58.80s; loss: 550626.7510; acc: 294617.0/371781.0=0.7924
Instance: 10000; Time: 48.71s; loss: 534637.1660; acc: 367575.0/462536.0=0.7947
Instance: 12000; Time: 53.63s; loss: 589357.3496; acc: 441536.0/555878.0=0.7943
Instance: 14000; Time: 47.10s; loss: 637078.3643; acc: 517582.0/650469.0=0.7957
Instance: 16000; Time: 52.80s; loss: 651482.7441; acc: 590662.0/743960.0=0.7939
Instance: 16628; Time: 19.69s; loss: 192460.6211; acc: 614139.0/773637.0=0.7938
Epoch: 3 training finished. Time: 453.64s, speed: 36.65st/s, total loss: 5032523.99805
totalloss: 5032523.99805
gold_num = 6540 pred_num = 0 right_num = 0
Dev: time: 40.17s, speed: 106.54st/s; acc: 0.8918, p: -1.0000, r: 0.0000, f: -1.0000
gold_num = 7455 pred_num = 0 right_num = 0
Test: time: 46.26s, speed: 100.89st/s; acc: 0.8866, p: -1.0000, r: 0.0000, f: -1.0000
Epoch: 4/50
Learning rate is setted as: 0.0125
Your loss exploded (loss: 1508746.1278). You can first try setting ave_batch_loss=True.
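The intuition behind that suggestion, sketched independently of the NCRF++ internals: summing token losses over a batch makes the gradient magnitude scale with batch size and sentence length, so a fixed SGD learning rate can overshoot; averaging over the batch keeps the update magnitude roughly constant. A toy illustration:

```python
def batch_loss(token_losses, average=False):
    """Sum per-token losses over a batch; optionally divide by batch size
    (the effect of ave_batch_loss=True, sketched with toy numbers)."""
    total = sum(sum(sent) for sent in token_losses)
    return total / len(token_losses) if average else total

# 16 sentences of 250 tokens each, 2.0 loss per token:
batch = [[2.0] * 250 for _ in range(16)]
print(batch_loss(batch))                 # 8000.0 -- grows with batch and length
print(batch_loss(batch, average=True))   # 500.0  -- stable magnitude for SGD
```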
The same issue as #22. We use our own dataset to train the NER model. The tag scheme is BIOES (the only difference is that we use "M-" instead of "I-"). The data have been tested with your Lattice LSTM model and give accurate p/r/f1 values, so I am confused: why is our F1 score -1 and pred_num = 0 with this model?