jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

biword embedding? #36

Closed. liuwei1206 closed this issue 6 years ago.

liuwei1206 commented 6 years ago

Hi, could you share the pretrained biword embedding with me?

jiesutd commented 6 years ago

gigaword_chn.all.a2b.bi.ite50.vec in this folder: https://pan.baidu.com/s/1pLO6T9D#list/path=%2F

liuwei1206 commented 6 years ago

Thank you very much! I believe that people who are helpful are always lucky!

liuwei1206 commented 6 years ago

Hi, how do you define the highest score on the test set? Is it simply the highest test-set score across all epochs, or the test-set score at the epoch with the highest dev-set score?

jiesutd commented 6 years ago

@liuwei1206 definitely the dev set.
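For readers unfamiliar with this convention: the reported test score is taken from the epoch that achieved the best dev score, not the best test score overall. A minimal sketch of that selection rule (the per-epoch scores below are hypothetical, not results from the paper):

```python
# Report the test F1 from the epoch with the best dev F1 (standard
# model-selection protocol), rather than the best test F1 overall.
# These per-epoch scores are hypothetical illustrations.
dev_f1 = [0.890, 0.912, 0.905, 0.918, 0.915]
test_f1 = [0.885, 0.908, 0.911, 0.907, 0.913]

# Pick the epoch that maximizes the dev score...
best_epoch = max(range(len(dev_f1)), key=lambda e: dev_f1[e])
# ...and report the test score from that same epoch.
reported = test_f1[best_epoch]

print(best_epoch, reported)  # epoch 3 has the best dev score
```

Note that the best test score overall (0.913 at epoch 4) is higher than the reported one; selecting on the test set directly would overstate performance.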

gloria0108 commented 4 years ago

gigaword_chn.all.a2b.bi.ite50.vec in this folder: https://pan.baidu.com/s/1pLO6T9D#list/path=%2F

Hello! In the char baseline + bichar setting of your paper, is the bigram (bichar) embedding you used the pretrained gigaword_chn.all.a2b.bi.ite50.vec? And in the word baseline of the paper, is the bigram (biword) embedding pretrained or randomly initialized?

jiesutd commented 4 years ago

@gloria0108 Pretrained

gloria0108 commented 4 years ago

@gloria0108 Pretrained

In the word baseline, is the pretrained bigram embedding also gigaword_chn.all.a2b.bi.ite50.vec? Or did you use a different pretrained embedding? If so, could you share it? gigaword_chn.all.a2b.bi.ite50.vec contains bichars (the concatenation of the current character and the next character, i.e., two characters), but the biwords used in the word baseline (the concatenation of the current word and the next word) are usually longer than two characters. If the bichar embedding above were used, wouldn't many biwords be missing from it? Thanks for your reply!

jiesutd commented 4 years ago

@gloria0108 Wait, my word baseline does not use biwords. It still uses bichars, with the same pretrained bichar embeddings. There is no need for word bigrams at the word level.

gloria0108 commented 4 years ago

@gloria0108 Wait, my word baseline does not use biwords. It still uses bichars, with the same pretrained bichar embeddings. There is no need for word bigrams at the word level.

Understood, thanks for the reply. I ask because of line 152 of functions.py: biword = word + in_lines[idx+1].strip().split()[0].decode('utf-8') https://github.com/jiesutd/LatticeLSTM/blob/24d17f4270f11d2f75046789d8b67eaa2b907dce/utils/functions.py#L152 It looks like, in the word baseline, biword is the concatenation of the current word and the next word. Or am I misunderstanding?

jiesutd commented 4 years ago

@gloria0108 Sorry, the input here is actually a char sequence, so that line actually builds char bigrams. This code was adapted from my earlier code, so many variable names were never updated, which can be confusing at times.
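So, despite the `biword` variable name, the line builds character bigrams: each `word` in that loop is a single character, and the bigram is the current character concatenated with the next one. A minimal sketch of that construction, using a hypothetical sentence (the real code reads tokens from a CoNLL-style file, and the exact padding-token name below is illustrative, not taken from the repo):

```python
# Build char bigrams from a character sequence, mirroring the `biword` line in
# utils/functions.py: each "word" is a single character, and the bigram is the
# current character plus the next one. A null sentinel closes the sequence
# (the name NULLKEY here is an illustrative assumption).
NULLKEY = "-null-"

chars = list("南京市长江大桥")  # hypothetical example sentence, 7 characters
bichars = [
    chars[i] + (chars[i + 1] if i + 1 < len(chars) else NULLKEY)
    for i in range(len(chars))
]
print(bichars)  # ['南京', '京市', '市长', '长江', '江大', '大桥', '桥-null-']
```

These two-character strings are exactly what gigaword_chn.all.a2b.bi.ite50.vec indexes, which is why the same pretrained bichar embeddings serve both the char and word baselines.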

gloria0108 commented 4 years ago

@gloria0108 Sorry, the input here is actually a char sequence, so that line actually builds char bigrams. This code was adapted from my earlier code, so many variable names were never updated, which can be confusing at times.

Got it, thank you!