Closed willanxywc closed 6 years ago
Which version of the Wikipedia corpus did you use to pre-train N2V? It isn't clearly stated in the paper, and there are so many versions and copies of Wikipedia available.

We used a rather old dump from November 2015, which we already had lying around pre-processed. I don't think there should be much variation across dumps, but if you really want to replicate everything from scratch, I think I still have the tokenised sentences somewhere and could get them to you (contact me offline). The gensim model we trained is also available here.

Thanks~