jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

Hello, the total loss is very large; what could be the cause? #103

Closed TianlinZhang668 closed 4 years ago

TianlinZhang668 commented 4 years ago

Hello, I trained on the bundled weibo.conll data and aligned my parameters with #84, but it fails to converge even after a long time, and the loss stays in the hundreds of thousands. Do you know which parameter is probably not set correctly? Thanks. The log is below:

```
CuDNN: True
GPU available: True
Status: train
Seg: True
Train file: data/weiboNER.conll.train.txt
Dev file: data/weiboNER.conll.dev.txt
Test file: data/weiboNER.conll.test.txt
Raw file: None
Char emb: data/gigaword_chn.all.a2b.uni.ite50.vec
Bichar emb: None
Gaz file: data/ctb.50d.vec
Model saved to: data/model/weibo.
Load gaz file: data/ctb.50d.vec total size: 704368
gaz alphabet size: 10798
gaz alphabet size: 12235
gaz alphabet size: 13671
build word pretrain emb...
Embedding: pretrain word:11327, prefect match:3281, case_match:0, oov:79, oov%:0.023504909253198453
build biword pretrain emb...
Embedding: pretrain word:0, prefect match:0, case_match:0, oov:42651, oov%:0.999976554440589
build gaz pretrain emb...
Embedding: pretrain word:704368, prefect match:13669, case_match:0, oov:1, oov%:7.31475385853266e-05
Training model...
DATA SUMMARY START:
     Tag scheme: BIO
     MAX SENTENCE LENGTH: 250
     MAX WORD LENGTH: -1
     Number normalized: True
     Use bigram: False
     Word alphabet size: 3361
     Biword alphabet size: 42652
     Char alphabet size: 3358
     Gaz alphabet size: 13671
     Label alphabet size: 16
     Word embedding size: 50
     Biword embedding size: 50
     Char embedding size: 30
     Gaz embedding size: 50
     Norm word emb: True
     Norm biword emb: True
     Norm gaz emb: False
     Norm gaz dropout: 0.1
     Train instance number: 1350
     Dev instance number: 270
     Test instance number: 270
     Raw instance number: 0
     Hyperpara iteration: 100
     Hyperpara batch size: 1
     Hyperpara lr: 0.015
     Hyperpara lr_decay: 0.05
     Hyperpara HP_clip: 5.0
     Hyperpara momentum: 0
     Hyperpara hidden_dim: 200
     Hyperpara dropout: 0.5
     Hyperpara lstm_layer: 1
     Hyperpara bilstm: True
     Hyperpara GPU: True
     Hyperpara use_gaz: True
     Hyperpara fix gaz emb: False
     Hyperpara use_char: False
DATA SUMMARY END.
Data setting saved to file: ResumeNER/save.dset
build batched lstmcrf...
build batched bilstm...
build LatticeLSTM... forward , Fix emb: False gaz drop: 0.1
/home/ztl/PycharmProjects/latticeLSTM/model/latticelstm.py:104: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
  init.orthogonal(self.weight_ih.data)
load pretrain word emb... (13671, 50)
/home/ztl/PycharmProjects/latticeLSTM/model/latticelstm.py:105: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
  init.orthogonal(self.alpha_weight_ih.data)
/home/ztl/PycharmProjects/latticeLSTM/model/latticelstm.py:117: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(self.bias.data, val=0)
/home/ztl/PycharmProjects/latticeLSTM/model/latticelstm.py:118: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(self.alpha_bias.data, val=0)
/home/ztl/PycharmProjects/latticeLSTM/model/latticelstm.py:37: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
  init.orthogonal(self.weight_ih.data)
/home/ztl/PycharmProjects/latticeLSTM/model/latticelstm.py:43: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(self.bias.data, val=0)
build LatticeLSTM... backward , Fix emb: False gaz drop: 0.1
load pretrain word emb... (13671, 50)
build batched crf...
finished built model.
Epoch: 0/100
Learning rate is setted as: 0.015
     Instance: 1350; Time: 152.99s; loss: 4618066.5410; acc: 57990.0/73780.0=0.7860
Epoch: 0 training finished. Time: 152.99s, speed: 8.82st/s, total loss: 4618066.541015625
/home/ztl/PycharmProjects/latticeLSTM/main.py:218: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  char_seq_tensor = autograd.Variable(torch.zeros((batch_size, max_seq_len, max_word_len)), volatile = volatile_flag).long()
/home/ztl/PycharmProjects/latticeLSTM/model/latticelstm.py:261: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  word_var = autograd.Variable(torch.LongTensor(skip_input[t][0]), volatile = volatile_flag)
gold_num = 301 pred_num = 9 right_num = 0
Dev: time: 13.57s, speed: 19.92st/s; acc: 0.9338, p: 0.0000, r: 0.0000, f: -1.0000
gold_num = 310 pred_num = 16 right_num = 0
Test: time: 13.97s, speed: 19.35st/s; acc: 0.9361, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 1/100
Learning rate is setted as: 0.014249999999999999
     Instance: 1350; Time: 154.49s; loss: 2459108.2500; acc: 67563.0/73780.0=0.9157
Epoch: 1 training finished. Time: 154.49s, speed: 8.74st/s, total loss: 2459108.25
gold_num = 301 pred_num = 26 right_num = 0
Dev: time: 13.57s, speed: 19.92st/s; acc: 0.9338, p: 0.0000, r: 0.0000, f: -1.0000
gold_num = 310 pred_num = 26 right_num = 0
Test: time: 13.86s, speed: 19.50st/s; acc: 0.9371, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 2/100
Learning rate is setted as: 0.0135375
     Instance: 1350; Time: 153.91s; loss: 1731192.8750; acc: 67235.0/73780.0=0.9113
Epoch: 2 training finished. Time: 153.91s, speed: 8.77st/s, total loss: 1731192.875
gold_num = 301 pred_num = 23 right_num = 2
Dev: time: 13.63s, speed: 19.83st/s; acc: 0.9378, p: 0.0870, r: 0.0066, f: 0.0123
Exceed previous best f score: -1
gold_num = 310 pred_num = 19 right_num = 0
Test: time: 13.88s, speed: 19.48st/s; acc: 0.9420, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 3/100
Learning rate is setted as: 0.012860624999999997
     Instance: 1350; Time: 154.79s; loss: 1379898.4375; acc: 67332.0/73780.0=0.9126
Epoch: 3 training finished. Time: 154.79s, speed: 8.72st/s, total loss: 1379898.4375
gold_num = 301 pred_num = 14 right_num = 0
Dev: time: 13.75s, speed: 19.65st/s; acc: 0.9397, p: 0.0000, r: 0.0000, f: -1.0000
gold_num = 310 pred_num = 12 right_num = 0
Test: time: 14.25s, speed: 18.97st/s; acc: 0.9418, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 4/100
Learning rate is setted as: 0.012217593749999998
     Instance: 1350; Time: 152.44s; loss: 1097173.6875; acc: 67249.0/73780.0=0.9115
Epoch: 4 training finished. Time: 152.44s, speed: 8.86st/s, total loss: 1097173.6875
gold_num = 301 pred_num = 27 right_num = 1
Dev: time: 13.43s, speed: 20.13st/s; acc: 0.9393, p: 0.0370, r: 0.0033, f: 0.0061
gold_num = 310 pred_num = 23 right_num = 0
Test: time: 13.85s, speed: 19.52st/s; acc: 0.9411, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 5/100
Learning rate is setted as: 0.011606714062499995
     Instance: 1350; Time: 153.71s; loss: 928970.1250; acc: 67332.0/73780.0=0.9126
Epoch: 5 training finished. Time: 153.71s, speed: 8.78st/s, total loss: 928970.125
gold_num = 301 pred_num = 32 right_num = 1
Dev: time: 13.67s, speed: 19.77st/s; acc: 0.9396, p: 0.0312, r: 0.0033, f: 0.0060
gold_num = 310 pred_num = 24 right_num = 0
Test: time: 13.99s, speed: 19.31st/s; acc: 0.9411, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 6/100
Learning rate is setted as: 0.011026378359374997
     Instance: 1350; Time: 155.16s; loss: 815103.9062; acc: 67232.0/73780.0=0.9112
Epoch: 6 training finished. Time: 155.16s, speed: 8.70st/s, total loss: 815103.90625
gold_num = 301 pred_num = 22 right_num = 1
Dev: time: 13.47s, speed: 20.06st/s; acc: 0.9396, p: 0.0455, r: 0.0033, f: 0.0062
gold_num = 310 pred_num = 10 right_num = 0
Test: time: 13.82s, speed: 19.55st/s; acc: 0.9430, p: 0.0000, r: 0.0000, f: -1.0000
```

jiesutd commented 4 years ago
  1. You could try converting the tag scheme from BIO to BIOES. The conversion script is here: https://github.com/jiesutd/NCRFpp/blob/master/utils/tagSchemeConverter.py (a rough sketch of what that conversion does is shown after this list).

  2. There are many warnings in the log. I suspect the gradient explosion is related to the PyTorch version; you could try running with the recommended PyTorch version.
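For point 1, the actual converter is the NCRFpp script linked above; the snippet below is only a minimal sketch of the BIO → BIOES mapping such a conversion performs (the function name and example tags are made up for illustration):

```python
def bio_to_bioes(tags):
    """Convert one sentence's BIO tags to BIOES (sketch, not the NCRFpp script)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == 'O':
            bioes.append('O')
            continue
        prefix, label = tag.split('-', 1)
        # Does the same entity continue at the next position?
        next_is_inside = i + 1 < len(tags) and tags[i + 1] == 'I-' + label
        if prefix == 'B':
            bioes.append(('B-' if next_is_inside else 'S-') + label)
        else:  # prefix == 'I'
            bioes.append(('I-' if next_is_inside else 'E-') + label)
    return bioes

# Example: a two-token PER entity plus a single-token LOC entity.
print(bio_to_bioes(['B-PER', 'I-PER', 'O', 'B-LOC']))
# -> ['B-PER', 'E-PER', 'O', 'S-LOC']
```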
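For point 2, the warnings in the log come from API changes in newer PyTorch releases (the repo targets the older version listed in its README, and on that version the warnings do not appear). If you stay on a newer PyTorch instead of downgrading, the deprecated calls would need roughly the replacements sketched below; this is only an illustration of the renamed APIs, not a patch to latticelstm.py, and the tensor names and sizes are placeholders:

```python
import torch
import torch.nn.init as init

# In-place init functions were renamed with a trailing underscore:
weight = torch.empty(4, 4)
bias = torch.empty(4)
init.orthogonal_(weight)   # was: init.orthogonal(weight)
init.constant_(bias, 0)    # was: init.constant(bias, val=0)

# `volatile=True` on autograd.Variable was removed; evaluation-time code
# is wrapped in torch.no_grad() instead:
batch_size, max_seq_len, max_word_len = 1, 250, 1  # placeholder sizes
with torch.no_grad():
    char_seq_tensor = torch.zeros(batch_size, max_seq_len, max_word_len).long()
```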