Open: countback opened this issue 5 years ago
@countback Sorry, I'm afraid I cannot release the People's Daily dataset due to licensing issues. I suggest contacting the Institute of Computational Linguistics at Peking University for that kind of data.
If you just want to reproduce the results reported in the paper, you can use the data segmented by the baseline segmenter, which is provided in the repo.
Sorry to bother you. I read your NAACL 2019 paper and I am very interested in these two papers. The improvements on cross-domain Chinese word segmentation (CWS) are exciting. I am currently working on related problems, and I wonder whether you could release the People's Daily (January 2000) dataset used to pre-train the baseline segmenter. It would be a great help in reproducing the results reported in the paper and in understanding your algorithm. I would appreciate your reply. Thank you very much.