ljch2018 closed 6 years ago
python main.py --status train \
    --train ./Weibo/weiboNER_2nd_conll.train.bio \
    --dev ./Weibo/weiboNER_2nd_conll.dev.bio \
    --test ./Weibo/weiboNER_2nd_conll.test.bio \
    --savemodel ./Weibo/model \
Train file: ./Weibo/weiboNER_2nd_conll.train.bio
Dev file: ./Weibo/weiboNER_2nd_conll.dev.bio
Test file: ./Weibo/weiboNER_2nd_conll.test.bio
Raw file: None
Char emb: data/gigaword_chn.all.a2b.uni.ite50.vec
Bichar emb: None
Gaz file: data/ctb.50d.vec
Model saved to: ./Weibo/model
Load gaz file: data/ctb.50d.vec total size: 704368
gaz alphabet size: 10798
gaz alphabet size: 12235
gaz alphabet size: 13671
build word pretrain emb...
Embedding: pretrain word:11327, prefect match:3281, case_match:0, oov:75, oov%:0.0223413762288
build biword pretrain emb...
Embedding: pretrain word:0, prefect match:0, case_match:0, oov:42646, oov%:0.999976551692
build gaz pretrain emb...
Embedding: pretrain word:704368, prefect match:13669, case_match:0, oov:1, oov%:7.31475385853e-05
Training model...
DATA SUMMARY START:
Tag scheme: BIO
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Use bigram: False
Word alphabet size: 3357
Biword alphabet size: 42647
Char alphabet size: 3357
Gaz alphabet size: 13671
Label alphabet size: 18
Word embedding size: 50
Biword embedding size: 50
Char embedding size: 30
Gaz embedding size: 50
Norm word emb: True
Norm biword emb: True
Norm gaz emb: False
Norm gaz dropout: 0.5
Train instance number: 1350
Dev instance number: 270
Test instance number: 270
Raw instance number: 0
Hyperpara iteration: 100
Hyperpara batch size: 1
Hyperpara lr: 0.015
Hyperpara lr_decay: 0.05
Hyperpara HP_clip: 5.0
Hyperpara momentum: 0
Hyperpara hidden_dim: 200
Hyperpara dropout: 0.5
Hyperpara lstm_layer: 1
Hyperpara bilstm: True
Hyperpara GPU: True
Hyperpara use_gaz: True
Hyperpara fix gaz emb: False
Hyperpara use_char: False
DATA SUMMARY END.
Data setting saved to file: ./Weibo/model.dset
@jiesutd Could you tell me where the problem might be? Thanks a lot!
You could convert the data to BIOES format first. All the results in my paper were obtained with BIOES-formatted input.
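For readers who want to try this suggestion, here is a minimal sketch of a BIO to BIOES conversion for one sentence's tag sequence. The function name `bio_to_bioes` is hypothetical and not part of the repository, and the sketch assumes well-formed BIO input (every entity starts with `B-`). The exact tag letters the training script expects are not stated in this thread, so check them before relying on this.

```python
def bio_to_bioes(tags):
    """Convert one sentence's BIO tags to BIOES (hypothetical helper)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        # Split only on the first hyphen so labels like "PER.NAM" stay intact.
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            bioes.append(("B-" if continues else "S-") + label)
        else:  # prefix == "I"
            bioes.append(("I-" if continues else "E-") + label)
    return bioes


if __name__ == "__main__":
    print(bio_to_bioes(["B-PER.NAM", "I-PER.NAM", "O", "B-LOC.NAM"]))
```

A single-token entity becomes `S-`, and the last token of a multi-token entity becomes `E-`; everything else is left unchanged.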
@jiesutd Thanks for the advice, I'll give that a try first.
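In this dataset, is the digit after each character a position index (for example: 赵0 B-PER.NAM)? I think this position index has to be removed, otherwise the tokens won't match our pretrained character embeddings, right?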
@HaimianYu Yes, the position information needs to be stripped out.
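A minimal sketch of that cleanup, for anyone hitting the same mismatch. It assumes each data line is `<character><position digits> <tag>` separated by whitespace, as in the 赵0 B-PER.NAM example above, and that sentences are separated by blank lines. The function name `strip_position` is hypothetical, and the output is joined with a single space, which may differ from the separator in the original file.

```python
def strip_position(line):
    """Drop the trailing position digits from the token column (hypothetical helper)."""
    parts = line.split()
    if len(parts) != 2:
        # Blank sentence separators and unexpected lines are passed through.
        return line.rstrip("\n")
    token, tag = parts
    # Keep the first character untouched so a digit character such as "1"
    # in "10" survives; only the digits after it are treated as the index.
    token = token[0] + token[1:].rstrip("0123456789")
    return token + " " + tag


if __name__ == "__main__":
    for raw in ["赵0 B-PER.NAM", "10 O", ""]:
        print(repr(strip_position(raw)))
```

Applying this to every line of the train, dev, and test files before training should leave bare characters in the token column, which is what a character embedding lookup would need.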