Closed: Robets2020 closed this issue 6 years ago
Yes, the results in our paper come from the same embeddings as the baseline RichWordSegmentor.
I don't understand your second question. Which baseline methods do you mean? If you mean the baselines in our ACL18 paper, then yes: those embeddings are used in all the word-based baselines.
Thank you. I mean the ACL18 paper.
In the ACL18 paper, which character embeddings are used for the char baseline + bichar + softword on the MSRA data?
The char embeddings, bichar embeddings, and word embeddings used in ACL 2018 are all the same as those in RichWordSegmentor.
In the readme, you mention that the pretrained character and word embeddings are the same as the embeddings in the RichWordSegmentor baseline, i.e., the character and word embeddings are gigaword_chn.all.a2b.uni.ite50.vec and ctb.50d.vec respectively. This does not seem to be mentioned in the paper. Were the experimental results of Lattice LSTM in the paper obtained using these two embeddings?
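For anyone checking which embeddings they have on disk: files like gigaword_chn.all.a2b.uni.ite50.vec and ctb.50d.vec are distributed in the plain word2vec text format (one token per line followed by its vector components, with an optional "vocab_size dim" header). Below is a minimal, hedged sketch of a loader for that format; the tiny inline sample (3-dimensional, not the real 50-dimensional files) is only for illustration.

```python
import io
import numpy as np

def load_vec(fileobj):
    """Parse a word2vec text-format embedding file into a dict
    mapping token -> np.float32 vector. Skips an optional
    "<vocab_size> <dim>" header line if present."""
    emb = {}
    for line in fileobj:
        parts = line.rstrip().split()
        if len(parts) == 2 and parts[0].isdigit():
            continue  # optional header line
        if len(parts) < 2:
            continue  # blank or malformed line
        emb[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return emb

# Tiny inline stand-in for a real .vec file (3-dim for brevity;
# the actual files in this repo are 50-dim).
sample = io.StringIO("2 3\n中 0.1 0.2 0.3\n国 0.4 0.5 0.6\n")
emb = load_vec(sample)
```

In practice you would pass `open(path, encoding="utf-8")` instead of the `StringIO` sample, and can sanity-check that every vector has the expected dimension (50 here) before training.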
In the paper, you mention that the word embeddings are pretrained using word2vec (Mikolov et al., 2013) over automatically segmented Chinese Giga-Word. Is this word embedding only used in the baseline methods?