jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

character and word embeddings #5

Closed — Robets2020 closed this issue 6 years ago

Robets2020 commented 6 years ago

In the readme, you mentioned that the pretrained character and word embeddings are the same as the embeddings in the baseline of RichWordSegmentor, i.e., the character and word embeddings are gigaword_chn.all.a2b.uni.ite50.vec and ctb.50d.vec respectively. These do not seem to be mentioned in the paper. Were the experimental results of LatticeLSTM in the paper obtained using these two embeddings?

In the paper, you mentioned that the word embeddings were pretrained using word2vec (Mikolov et al., 2013) over automatically segmented Chinese Giga-Word. Is this word embedding only used in the baseline methods?

jiesutd commented 6 years ago

Yes, the results in our paper come from the same embeddings as the baseline RichWordSegmentor.

I don't understand your second question. Which baseline methods do you mean? If you mean the baselines of our ACL18 paper, then yes: those embeddings are used in all the word-based baselines.

Robets2020 commented 6 years ago

Thank you. I mean the ACL18 paper.

Robets2020 commented 6 years ago

In the ACL18 paper, which char embeddings are used for the char baseline + bichar + softword on the MSRA data?

jiesutd commented 6 years ago

All of the char, bichar, and word embeddings used in the ACL 2018 paper are the same as in RichWordSegmentor.
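For readers wondering how to use these files: embedding files like the ones named in this thread typically follow the plain-text word2vec format, one token per line followed by its (here 50-dimensional) vector. Below is a minimal loader sketch under that assumption; the function name and the commented file paths are illustrative, not part of the LatticeLSTM codebase:

```python
def load_pretrained_embeddings(path, expected_dim=50):
    """Load a word2vec-style text file where each line is `token v1 v2 ... vN`."""
    embeddings = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            # Skip a possible header line ("<vocab_size> <dim>") or malformed rows.
            if len(parts) != expected_dim + 1:
                continue
            token, values = parts[0], parts[1:]
            embeddings[token] = [float(v) for v in values]
    return embeddings

# Hypothetical usage with the files discussed above:
# char_emb = load_pretrained_embeddings("gigaword_chn.all.a2b.uni.ite50.vec")
# word_emb = load_pretrained_embeddings("ctb.50d.vec")
```

Skipping rows whose length does not match `expected_dim + 1` also silently drops the optional word2vec header line, which keeps the loader usable for files saved with or without a header.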