jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

biword embedding? #36

Closed. liuwei1206 closed this issue 6 years ago.

liuwei1206 commented 6 years ago

Hi, could you share the pretrained biword embedding with me?

jiesutd commented 6 years ago

gigaword_chn.all.a2b.bi.ite50.vec in this folder: https://pan.baidu.com/s/1pLO6T9D#list/path=%2F

liuwei1206 commented 6 years ago

Thank you very much! I believe that people who are helpful are always lucky!

liuwei1206 commented 6 years ago

Hi, how do you define the highest score on the test set? Is it simply the highest test-set score across all epochs, or the test-set score at the epoch with the highest dev-set score?

jiesutd commented 6 years ago

@liuwei1206 definitely the dev set.
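For readers unfamiliar with this convention: the reported test score is taken from the epoch that achieved the best dev score, not the best test score overall. A minimal sketch of that selection rule (the per-epoch scores below are hypothetical, not results from the paper):

```python
# Report the test F1 from the epoch with the best dev F1 (standard
# model-selection protocol), rather than the best test F1 overall.
# These per-epoch scores are hypothetical illustrations.
dev_f1 = [0.890, 0.912, 0.905, 0.918, 0.915]
test_f1 = [0.885, 0.908, 0.911, 0.907, 0.913]

# Pick the epoch that maximizes the dev score...
best_epoch = max(range(len(dev_f1)), key=lambda e: dev_f1[e])
# ...and report the test score from that same epoch.
reported = test_f1[best_epoch]

print(best_epoch, reported)  # epoch 3 has the best dev score
```

Note that the best test score overall (0.913 at epoch 4) is higher than the reported one; selecting on the test set directly would overstate performance.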

gloria0108 commented 4 years ago

gigaword_chn.all.a2b.bi.ite50.vec in this folder: https://pan.baidu.com/s/1pLO6T9D#list/path=%2F

Hello! In the char baseline + bichar setting of your paper, is the bigram (bichar) embedding you used the pretrained gigaword_chn.all.a2b.bi.ite50.vec? And in the word baseline of the paper, is the bigram (biword) embedding pretrained or randomly initialized?

jiesutd commented 4 years ago

@gloria0108 Pretrained

gloria0108 commented 4 years ago

@gloria0108 Pretrained

In the word baseline, is the pretrained bigram embedding also gigaword_chn.all.a2b.bi.ite50.vec? Or did you use a different pretrained embedding? If so, could you share it? gigaword_chn.all.a2b.bi.ite50.vec contains bichars (the concatenation of the current character and the next character, i.e., two characters), but the biwords used in the word baseline (the concatenation of the current word and the next word) are usually longer than two characters. If the bichar embedding above were used, wouldn't many biwords be missing from it? Thanks for your reply!

jiesutd commented 4 years ago

@gloria0108 Wait, my word baseline does not use biwords. It still uses bichars, with the same pretrained bichar embeddings. There is no need for word bigrams at the word level.

gloria0108 commented 4 years ago

@gloria0108 Wait, my word baseline does not use biwords. It still uses bichars, with the same pretrained bichar embeddings. There is no need for word bigrams at the word level.

Understood, thanks for the reply. I ask because of line 152 of functions.py: biword = word + in_lines[idx+1].strip().split()[0].decode('utf-8') https://github.com/jiesutd/LatticeLSTM/blob/24d17f4270f11d2f75046789d8b67eaa2b907dce/utils/functions.py#L152 It looks like, in the word baseline, biword is the concatenation of the current word and the next word. Or am I misunderstanding?

jiesutd commented 4 years ago

@gloria0108 Sorry, the input here is actually a char sequence, so that line actually builds char bigrams. This code was adapted from my earlier code, so many variable names were never updated, which can be confusing at times.
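So, despite the `biword` variable name, the line builds character bigrams: each `word` in that loop is a single character, and the bigram is the current character concatenated with the next one. A minimal sketch of that construction, using a hypothetical sentence (the real code reads tokens from a CoNLL-style file, and the exact padding-token name below is illustrative, not taken from the repo):

```python
# Build char bigrams from a character sequence, mirroring the `biword` line in
# utils/functions.py: each "word" is a single character, and the bigram is the
# current character plus the next one. A null sentinel closes the sequence
# (the name NULLKEY here is an illustrative assumption).
NULLKEY = "-null-"

chars = list("南京市长江大桥")  # hypothetical example sentence, 7 characters
bichars = [
    chars[i] + (chars[i + 1] if i + 1 < len(chars) else NULLKEY)
    for i in range(len(chars))
]
print(bichars)  # ['南京', '京市', '市长', '长江', '江大', '大桥', '桥-null-']
```

These two-character strings are exactly what gigaword_chn.all.a2b.bi.ite50.vec indexes, which is why the same pretrained bichar embeddings serve both the char and word baselines.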

gloria0108 commented 4 years ago

@gloria0108 Sorry, the input here is actually a char sequence, so that line actually builds char bigrams. This code was adapted from my earlier code, so many variable names were never updated, which can be confusing at times.

Got it, thank you!