jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

F1 on the Weibo dataset is lower than the value reported in the paper #57

Closed: yqqqqqq closed this issue 5 years ago

yqqqqqq commented 5 years ago

Weibo log:

CuDNN: True
GPU available: True
Status: train
Seg: True
Train file: WeiboNER/train.ne.bmes
Dev file: WeiboNER/dev.ne.bmes
Test file: WeiboNER/test.ne.bmes
Raw file: None
Char emb: data/gigaword_chn.all.a2b.uni.ite50.vec
Bichar emb: None
Gaz file: data/ctb.50d.vec
Model saved to: WeiboNER/saved_model.all
Load gaz file: data/ctb.50d.vec total size: 704368
gaz alphabet size: 10798
gaz alphabet size: 12235
gaz alphabet size: 13671
build word pretrain emb...
Embedding: pretrain word:11327, perfect match:3281, case_match:0, oov:75, oov%:0.0223413762288
build biword pretrain emb...
Embedding: pretrain word:0, perfect match:0, case_match:0, oov:42646, oov%:0.999976551692
build gaz pretrain emb...
Embedding: pretrain word:704368, perfect match:13669, case_match:0, oov:1, oov%:7.31475385853e-05
Training model...
DATA SUMMARY START:
    Tag scheme: BMES
    MAX SENTENCE LENGTH: 250
    MAX WORD LENGTH: -1
    Number normalized: True
    Use bigram: False
    Word alphabet size: 3357
    Biword alphabet size: 42647
    Char alphabet size: 3357
    Gaz alphabet size: 13671
    Label alphabet size: 16
    Word embedding size: 50
    Biword embedding size: 50
    Char embedding size: 30
    Gaz embedding size: 50
    Norm word emb: True
    Norm biword emb: True
    Norm gaz emb: False
    Norm gaz dropout: 0.5
    Train instance number: 1350
    Dev instance number: 270
    Test instance number: 270
    Raw instance number: 0
    Hyperpara iteration: 100
    Hyperpara batch size: 1
    Hyperpara lr: 0.015
    Hyperpara lr_decay: 0.05
    Hyperpara HP_clip: 5.0
    Hyperpara momentum: 0
    Hyperpara hidden_dim: 200
    Hyperpara dropout: 0.5
    Hyperpara lstm_layer: 1
    Hyperpara bilstm: True
    Hyperpara GPU: True
    Hyperpara use_gaz: True
    Hyperpara fix gaz emb: False
    Hyperpara use_char: False
DATA SUMMARY END.
Data setting saved to file: WeiboNER/saved_model.all.dset
build batched lstmcrf...
build batched bilstm...
build LatticeLSTM... forward , Fix emb: False gaz drop: 0.5
load pretrain word emb... (13671, 50)
... ... ...
Epoch: 98 training finished. Time: 379.70s, speed: 3.56st/s, total loss: 638.227500916
gold_num = 169 pred_num = 138 right_num = 83
Dev: time: 23.04s, speed: 11.73st/s; acc: 0.9752, p: 0.6014, r: 0.4911, f: 0.5407
gold_num = 216 pred_num = 140 right_num = 94
Test: time: 23.75s, speed: 11.38st/s; acc: 0.9706, p: 0.6714, r: 0.4352, f: 0.5281
Epoch: 99/100
Learning rate is setted as: 9.34820403211e-05
Instance: 500; Time: 138.24s; loss: 222.6014; acc: 26713.0/26857.0=0.9946
Instance: 1000; Time: 143.49s; loss: 236.1890; acc: 54543.0/54865.0=0.9941
Instance: 1350; Time: 98.02s; loss: 162.5528; acc: 73351.0/73778.0=0.9942
Epoch: 99 training finished. Time: 379.75s, speed: 3.55st/s, total loss: 621.343292236
gold_num = 169 pred_num = 139 right_num = 83
Dev: time: 23.04s, speed: 11.73st/s; acc: 0.9748, p: 0.5971, r: 0.4911, f: 0.5390
gold_num = 216 pred_num = 142 right_num = 94
Test: time: 23.48s, speed: 11.51st/s; acc: 0.9705, p: 0.6620, r: 0.4352, f: 0.5251

Where is the problem? My overall (NE + NM) result also does not reach the 58.79% reported in the paper; I only get around 56.

jiesutd commented 5 years ago

Your log file is not complete. The best iteration is not the last epoch but the epoch with the best development performance.
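
(When scanning a long log like the one above, a small helper can pick out the epoch with the best dev F1; the saved model file name ends with that epoch number. The function below is not part of this repository, it is a rough sketch that only assumes the log format shown above.)

```python
import re

def best_dev_epoch(log_path):
    """Return (epoch, dev_f1) for the epoch with the highest dev F1 in a training log."""
    best_epoch, best_f, epoch = -1, -1.0, -1
    with open(log_path) as f:
        for line in f:
            m = re.search(r"Epoch: (\d+) training finished", line)
            if m:
                epoch = int(m.group(1))
            m = re.search(r"Dev:.*f: ([0-9.]+)", line)
            if m and float(m.group(1)) > best_f:
                best_f, best_epoch = float(m.group(1)), epoch
    return best_epoch, best_f

# Example (hypothetical log file name):
# epoch, f = best_dev_epoch("weibo_train.log")
```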

yqqqqqq commented 5 years ago

Doesn't the code save the model from the epoch that produces the best F1? For example, the last saved model is saved_model.43.model; does that mean epoch 43 has the best development performance?

jiesutd commented 5 years ago

Yes, so you need to look at the epoch with the best development performance rather than the last epoch.

yqqqqqq commented 5 years ago

I know. I saw that the epoch with the best development performance was epoch 34, so I ran sh run_mainweibo.sh (status is decode), but I only got an F1 of 51.77, while your paper reports 53.04. The results are as follows:

GPU available: True
Status: decode
Seg: True
Train file: data/conll03/train.bmes
Dev file: data/conll03/dev.bmes
Test file: data/conll03/test.bmes
Raw file: ./WeiboNER/test.ne.bmes
Char emb: data/gigaword_chn.all.a2b.uni.ite50.vec
Bichar emb: None
Gaz file: data/ctb.50d.vec
Data setting loaded from file: ./WeiboNER/saved_model.ne.dset
DATA SUMMARY START:
    Tag scheme: BMES
    MAX SENTENCE LENGTH: 250
    MAX WORD LENGTH: -1
    Number normalized: True
    Use bigram: False
    Word alphabet size: 3357
    Biword alphabet size: 42647
    Char alphabet size: 3357
    Gaz alphabet size: 13671
    Label alphabet size: 16
    Word embedding size: 50
    Biword embedding size: 50
    Char embedding size: 30
    Gaz embedding size: 50
    Norm word emb: True
    Norm biword emb: True
    Norm gaz emb: False
    Norm gaz dropout: 0.5
    Train instance number: 0
    Dev instance number: 0
    Test instance number: 0
    Raw instance number: 0
    Hyperpara iteration: 100
    Hyperpara batch size: 1
    Hyperpara lr: 0.015
    Hyperpara lr_decay: 0.05
    Hyperpara HP_clip: 5.0
    Hyperpara momentum: 0
    Hyperpara hidden_dim: 200
    Hyperpara dropout: 0.5
    Hyperpara lstm_layer: 1
    Hyperpara bilstm: True
    Hyperpara GPU: True
    Hyperpara use_gaz: True
    Hyperpara fix gaz emb: False
    Hyperpara use_char: False
DATA SUMMARY END.
Load Model from file: ./WeiboNER/saved_model.ne.34.model
build batched lstmcrf...
build batched bilstm...
build LatticeLSTM... forward , Fix emb: False gaz drop: 0.5
load pretrain word emb... (13671, 50)
build LatticeLSTM... backward , Fix emb: False gaz drop: 0.5
load pretrain word emb... (13671, 50)
build batched crf...
Decode raw data ...
gold_num = 216 pred_num = 151 right_num = 95
raw: time:23.52s, speed:11.49st/s; acc: 0.9691, p: 0.6291, r: 0.4398, f: 0.5177
Predict raw result has been written into file. ./WeiboNER/raw_wb.ne.out

jiesutd commented 5 years ago

Then you can try setting 'data.norm_word_emb' to False or using a different random seed. The Weibo dataset is very small, which makes the results unstable.
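
(A minimal sketch of those two changes, assuming `data` is the Data object built in main.py before the embeddings are loaded; the helper name and the seed value below are arbitrary, only `norm_word_emb` comes from the setting mentioned above.)

```python
import random
import numpy as np
import torch


def make_run_reproducible(data, seed=42):
    """Apply the two suggestions above to a LatticeLSTM Data object."""
    # 1. Do not normalize the pretrained character embeddings.
    data.norm_word_emb = False

    # 2. Fix (or vary) the random seed; on a dataset this small the final
    #    F1 moves noticeably from seed to seed.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```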

yqqqqqq commented 5 years ago

OK, thank you so much!

yqqqqqq commented 5 years ago

Sorry to bother you again. Could you tell me how to modify the code to reproduce the result of the char baseline + bichar + softword model? Thanks!
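
(For reference, that baseline just concatenates three embeddings per character, char, bigram, and a segmentation softword (BMES) label, and feeds them to a BiLSTM-CRF. The sketch below is illustrative only, not the repository's code: the class, vocabulary sizes, and dimensions are placeholders, and the softword ids would have to come from an external word segmenter.)

```python
import torch
import torch.nn as nn


class CharBicharSoftwordEncoder(nn.Module):
    """Illustrative input layer: char + bichar + softword embeddings before a BiLSTM."""

    def __init__(self, char_vocab=3357, bichar_vocab=42647, softword_vocab=5,
                 char_dim=50, bichar_dim=50, softword_dim=20, hidden_dim=200):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.bichar_emb = nn.Embedding(bichar_vocab, bichar_dim)
        # B/M/E/S segmentation tags plus padding.
        self.softword_emb = nn.Embedding(softword_vocab, softword_dim)
        self.lstm = nn.LSTM(char_dim + bichar_dim + softword_dim,
                            hidden_dim // 2, batch_first=True, bidirectional=True)

    def forward(self, chars, bichars, softwords):
        # Each input is a (batch, seq_len) tensor of ids; the output would
        # normally be passed on to a CRF layer for tagging.
        x = torch.cat([self.char_emb(chars),
                       self.bichar_emb(bichars),
                       self.softword_emb(softwords)], dim=-1)
        out, _ = self.lstm(x)
        return out
```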

Study-ym commented 4 years ago

Is 58.79 the best result? Why did my run reach 59.21?