threefoldo closed this issue 6 years ago
Well, any data you can find should be easy enough to include, but I myself am not familiar with the landscape of Chinese datasets. We can keep this issue open for a while to see if anyone else watching might know of some good ones.
The following is the paper for DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications https://arxiv.org/abs/1711.05073
Thanks. I had already studied this data. Among the 90k questions, most answers are long sentences rather than short phrases extracted from the input text. Maybe it could be preprocessed somehow before being sent to decaNLP.
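One possible form of that preprocessing, as a rough sketch only: keep just the examples whose answer is a short span copied verbatim from the context. The field names `context`, `question`, and `answer`, the helper `keep_extractive`, and the 20-character threshold are all assumptions made for illustration; they are not DuReader's actual schema or anything decaNLP prescribes.

```python
def keep_extractive(examples, max_answer_chars=20):
    """Keep examples whose answer is a short span found verbatim in the context.

    The dict keys used here are hypothetical; adapt them to the real data files.
    Length is counted in characters because Chinese text is not space-delimited.
    """
    kept = []
    for ex in examples:
        answer = ex["answer"].strip()
        # Skip empty answers and answers longer than the (arbitrary) threshold.
        if not answer or len(answer) > max_answer_chars:
            continue
        # Keep the example only if the answer appears verbatim in the context.
        if answer in ex["context"]:
            kept.append(ex)
    return kept


# Toy examples written for this sketch, not taken from any dataset.
examples = [
    {"context": "北京是中国的首都。", "question": "中国的首都是哪里?", "answer": "北京"},
    {"context": "北京是中国的首都。", "question": "北京怎么样?", "answer": "一座很大的北方城市"},
    {"context": "北京是中国的首都。", "question": "首都是?", "answer": ""},
]

filtered = keep_extractive(examples)
```

Only the first toy example survives: its answer is short and occurs in the context, while the second is not a span of the context and the third is empty. How many real examples such a filter would retain is something I have not measured.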
It's a little difficult to find Chinese datasets suitable for training decaNLP. Right now, all I have is: 1. Douban movie reviews for sentiment analysis; 2. WebQA from Baidu. Are there any other datasets that could be used for training?