jiesutd / NCRFpp

NCRF++, a neural sequence labeling toolkit. Easy to use for any sequence labeling task (e.g. NER, POS tagging, segmentation). It includes character LSTM/CNN, word LSTM/CNN, and softmax/CRF components.

Loss explosion problem #107

Closed · lhjner closed this 5 years ago

lhjner commented 5 years ago

Hello, and thank you very much for sharing the code. I ran into a problem while using it and hope you can help. I am using the CoNLL-2002 Spanish dataset, with the fastText Wikipedia embeddings downloaded from MUSE as word vectors. When running CNN+LSTM+CRF the gradients explode, but CLSTM+LSTM+CRF and GRU+LSTM+CRF run without any problem. Could you explain why this happens and what causes it? Many thanks!

```
Seed num: 42
MODEL: train
Find feature: NP]
Load pretrained word embedding, norm: False, dir: sample_data/wiki.es.vec
Embedding: pretrain word:983630, prefect match:18012, case_match:8758, oov:1742, oov%:0.0610949391505629
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Start Sequence Laebling task...
     Tag scheme: BIO
     Split token: |||
     MAX SENTENCE LENGTH: 250
     MAX WORD LENGTH: -1
     Number normalized: True
     Word alphabet size: 28513
     Char alphabet size: 84
     Label alphabet size: 10
     Word embedding dir: sample_data/wiki.es.vec
     Char embedding dir: None
     Word embedding size: 300
     Char embedding size: 30
     Norm word emb: False
     Norm char emb: False
     Train file directory: sample_data/esp.train
     Dev file directory: sample_data/esp.testa
     Test file directory: sample_data/esp.testb
     Raw file directory: None
     Dset file directory: None
     Model file directory: sample_data/lstmcrf
     Loadmodel directory: None
     Decode file directory: None
     Train instance number: 8320
     Dev instance number: 1915
     Test instance number: 1517
     Raw instance number: 0
     FEATURE num: 1
         Fe: NP] alphabet size: 62
         Fe: NP] embedding dir: None
         Fe: NP] embedding size: 20
         Fe: NP] norm emb: False
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model use_crf: True
     Model word extractor: LSTM
     Model use_char: True
     Model char extractor: CNN
     Model char_hidden_dim: 50
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 100
     BatchSize: 10
     Average batch loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper lr: 0.015
     Hyper lr_decay: 0.05
     Hyper HP_clip: None
     Hyper momentum: 0.0
     Hyper l2: 1e-08
     Hyper hidden_dim: 200
     Hyper dropout: 0.5
     Hyper lstm_layer: 1
     Hyper bilstm: True
     Hyper GPU: False
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build sequence labeling network...
use_char: True
char feature extractor: CNN
word feature extractor: LSTM
use crf: True
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: CNN ...
build CRF...
```
```
Epoch: 0/100
 Learning rate is set as: 0.015
Shuffle: first input word list: [112, 122, 37, 86, 39, 29, 123, 23, 21, 124, 6, 86, 39, 37, 56, 125, 21, 126, 127, 6, 128, 129, 37, 21, 130, 131, 132, 133, 134, 86, 135, 68, 136, 137, 65, 39, 51, 10]
     Instance: 500; Time: 17.48s; loss: 66744.3432; acc: 12547.0/16152.0=0.7768
     Instance: 1000; Time: 17.76s; loss: 36117.9189; acc: 25307.0/32388.0=0.7814
     Instance: 1500; Time: 18.82s; loss: 33063.2673; acc: 38117.0/48536.0=0.7853
     Instance: 2000; Time: 17.73s; loss: 22335.1410; acc: 49616.0/63012.0=0.7874
     Instance: 2500; Time: 18.96s; loss: 23658.9066; acc: 62782.0/79203.0=0.7927
     Instance: 3000; Time: 18.44s; loss: 23325.6564; acc: 75591.0/95059.0=0.7952
     Instance: 3500; Time: 18.57s; loss: 25712.5104; acc: 87833.0/110454.0=0.7952
     Instance: 4000; Time: 19.91s; loss: 22649.9701; acc: 100420.0/126189.0=0.7958
     Instance: 4500; Time: 19.22s; loss: 24327.3322; acc: 112567.0/141422.0=0.7960
     Instance: 5000; Time: 19.85s; loss: 26658.7230; acc: 124379.0/156725.0=0.7936
     Instance: 5500; Time: 18.37s; loss: 25029.7343; acc: 136149.0/172008.0=0.7915
     Instance: 6000; Time: 18.65s; loss: 29637.6336; acc: 149560.0/188728.0=0.7925
     Instance: 6500; Time: 19.05s; loss: 24267.2758; acc: 161762.0/204421.0=0.7913
     Instance: 7000; Time: 19.61s; loss: 22461.7405; acc: 175534.0/221536.0=0.7923
     Instance: 7500; Time: 18.50s; loss: 22996.7426; acc: 187587.0/237142.0=0.7910
     Instance: 8000; Time: 18.71s; loss: 21329.9551; acc: 199753.0/252760.0=0.7903
     Instance: 8320; Time: 12.32s; loss: 12823.4778; acc: 208061.0/262902.0=0.7914
Epoch: 0 training finished. Time: 311.95s, speed: 26.67st/s, total loss: 463140.3287963867
totalloss: 463140.3287963867
Right token = 10789 All token = 52923 acc = 0.20386221491601006
Dev: time: 14.95s, speed: 128.68st/s; acc: 0.2039, p: 0.0070, r: 0.0097, f: 0.0081
Exceed previous best f score: -10
Save current best model in file: sample_data/lstmcrf.0.model
Right token = 9090 All token = 51533 acc = 0.17639182659654978
Test: time: 13.96s, speed: 110.87st/s; acc: 0.1764, p: 0.0035, r: 0.0059, f: 0.0044
Epoch: 1/100
 Learning rate is set as: 0.014285714285714285
Shuffle: first input word list: [4192, 33, 41, 5335, 102, 3068, 6, 41, 16619, 8961, 16625, 33, 8284, 157, 7171, 23, 8823, 157, 3, 850, 235, 304, 5, 62, 12097, 109, 7593, 235, 39, 157, 8307, 23, 21, 900, 157, 3, 850, 235, 304, 5, 6, 1404, 23, 361, 23, 21, 16399, 3, 850, 235, 304, 5, 62, 37, 114, 16626, 261, 16627, 3, 850, 235, 304, 5, 6, 16628, 39, 16623, 10]
     Instance: 500; Time: 20.07s; loss: 22066.5989; acc: 12368.0/15718.0=0.7869
     Instance: 1000; Time: 19.20s; loss: 15910.0819; acc: 25232.0/31352.0=0.8048
     Instance: 1500; Time: 19.42s; loss: 16232.9181; acc: 37775.0/46995.0=0.8038
     Instance: 2000; Time: 19.53s; loss: 14678.2134; acc: 51808.0/63706.0=0.8132
     Instance: 2500; Time: 19.44s; loss: 15756.1388; acc: 64652.0/79451.0=0.8137
     Instance: 3000; Time: 19.07s; loss: 18944.6857; acc: 77324.0/95315.0=0.8112
     Instance: 3500; Time: 18.91s; loss: 15556.8204; acc: 89668.0/110505.0=0.8114
     Instance: 4000; Time: 18.46s; loss: 18086.7941; acc: 101804.0/125771.0=0.8094
     Instance: 4500; Time: 19.44s; loss: 16134.2563; acc: 114591.0/141392.0=0.8104
     Instance: 5000; Time: 19.51s; loss: 19310.2681; acc: 126960.0/157040.0=0.8085
     Instance: 5500; Time: 19.66s; loss: 20971.3447; acc: 139678.0/173303.0=0.8060
     Instance: 6000; Time: 19.37s; loss: 16266.7186; acc: 152363.0/188912.0=0.8065
     Instance: 6500; Time: 19.58s; loss: 21224.6567; acc: 164716.0/204945.0=0.8037
     Instance: 7000; Time: 20.10s; loss: 14005.5008; acc: 177883.0/220961.0=0.8050
     Instance: 7500; Time: 19.66s; loss: 16057.6001; acc: 190685.0/236599.0=0.8059
     Instance: 8000; Time: 20.49s; loss: 14424.4894; acc: 203964.0/252622.0=0.8074
     Instance: 8320; Time: 13.11s; loss: 9627.8608; acc: 212431.0/262902.0=0.8080
Epoch: 1 training finished. Time: 325.03s, speed: 25.60st/s, total loss: 285254.94677734375
totalloss: 285254.94677734375
Right token = 45356 All token = 52923 acc = 0.8570186875271621
Dev: time: 16.07s, speed: 119.56st/s; acc: 0.8570, p: -1.0000, r: 0.0000, f: -1.0000
Right token = 45355 All token = 51533 acc = 0.8801156540469214
Test: time: 14.67s, speed: 103.71st/s; acc: 0.8801, p: -1.0000, r: 0.0000, f: -1.0000
Epoch: 2/100
 Learning rate is set as: 0.013636363636363634
Shuffle: first input word list: [4629, 4641, 4642, 33, 41, 3264, 23, 50, 29, 23, 2373, 4643, 37, 39, 3269, 2392, 4644, 834, 109, 3449, 86, 4645, 168, 109, 2036, 3280, 23, 1215, 6, 70, 37, 6, 913, 39, 4639, 4640, 6, 3487, 50, 4646, 4647, 23, 21, 4484, 15, 4648, 10]
     Instance: 500; Time: 19.82s; loss: 20885.3103; acc: 12893.0/16169.0=0.7974
     Instance: 1000; Time: 20.22s; loss: 18628.8053; acc: 26086.0/32552.0=0.8014
     Instance: 1500; Time: 19.46s; loss: 15202.7120; acc: 38418.0/47930.0=0.8015
     Instance: 2000; Time: 19.52s; loss: 18382.5779; acc: 50024.0/62925.0=0.7950
     Instance: 2500; Time: 18.99s; loss: 15396.2651; acc: 62468.0/78217.0=0.7986
     Instance: 3000; Time: 19.76s; loss: 19577.7411; acc: 74929.0/94184.0=0.7956
     Instance: 3500; Time: 19.24s; loss: 18076.7267; acc: 87007.0/109485.0=0.7947
     Instance: 4000; Time: 19.66s; loss: 16454.8464; acc: 99954.0/125322.0=0.7976
     Instance: 4500; Time: 19.88s; loss: 16385.4592; acc: 112478.0/140793.0=0.7989
     Instance: 5000; Time: 20.46s; loss: 14677.9603; acc: 125713.0/157081.0=0.8003
     Instance: 5500; Time: 20.97s; loss: 21058.0332; acc: 139035.0/173825.0=0.7999
     Instance: 6000; Time: 19.35s; loss: 11340.9529; acc: 152017.0/188874.0=0.8049
     Instance: 6500; Time: 20.22s; loss: 16436.0164; acc: 164729.0/204922.0=0.8039
     Instance: 7000; Time: 20.76s; loss: 17523.4183; acc: 177801.0/221258.0=0.8036
     Instance: 7500; Time: 20.60s; loss: 14272.2194; acc: 191142.0/237650.0=0.8043
     Instance: 8000; Time: 20.56s; loss: 12585.5363; acc: 203910.0/253135.0=0.8055
     Instance: 8320; Time: 12.77s; loss: 8050.8893; acc: 211887.0/262902.0=0.8060
Epoch: 2 training finished. Time: 332.23s, speed: 25.04st/s, total loss: 274935.4700317383
totalloss: 274935.4700317383
Right token = 45356 All token = 52923 acc = 0.8570186875271621
Dev: time: 16.64s, speed: 115.44st/s; acc: 0.8570, p: -1.0000, r: 0.0000, f: -1.0000
Right token = 45355 All token = 51533 acc = 0.8801156540469214
Test: time: 15.29s, speed: 99.48st/s; acc: 0.8801, p: -1.0000, r: 0.0000, f: -1.0000
```

jiesutd commented 5 years ago

Training with CNN components tends to be somewhat less stable. I have observed a similar phenomenon on Chinese tasks. You can try the following, in order of priority (a config sketch applying all four follows the list):

  1. It may be related to how the embeddings are initialized; try `norm_word_emb=True` to normalize the pretrained embeddings first.
  2. Use a smaller learning rate.
  3. Set `ave_batch_loss=True`.
  4. Try a different optimizer, e.g. AdaGrad or Adam.
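
A minimal sketch of how these four suggestions might look in an NCRF++ training config, assuming the key names used in the repository's demo.train.config (`norm_word_emb`, `learning_rate`, `ave_batch_loss`, `optimizer`); the learning-rate value below is illustrative, not a tuned recommendation:

```
# suggestion 1: L2-normalize the pretrained word embeddings at load time
norm_word_emb=True
# suggestion 2: a smaller step size than the 0.015 shown in the log above
learning_rate=0.005
# suggestion 3: average the batch loss instead of summing it
ave_batch_loss=True
# suggestion 4: an adaptive optimizer instead of plain SGD (AdaGrad is another option)
optimizer=Adam
```

And a short Python sketch of the kind of normalization `norm_word_emb=True` performs, scaling each pretrained vector to unit L2 norm before it enters the embedding table; `norm_to_unit` is a hypothetical helper for illustration, not the toolkit's own function:

```python
import numpy as np

def norm_to_unit(vec):
    # Scale one pretrained embedding vector to unit L2 norm, so no
    # single word vector starts with an unusually large magnitude.
    return vec / np.sqrt(np.sum(np.square(vec)))

# Usage sketch: a 300-dim vector, matching the fastText embeddings above.
vec = np.random.randn(300).astype(np.float32)
print(np.linalg.norm(norm_to_unit(vec)))  # ~1.0
```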
lhjner commented 5 years ago

Thank you very much for the patient reply; I will rerun the experiments following your suggestions. Thanks again!