What is the source of the dataset?

Hi, most of the dataset actually comes from http://opus.nlpl.eu/ and some other open-source resources. You can check how the dataset is created from OPUS; we then preprocessed it as described in the paper.

Thanks and Regards

On Tue, Dec 29, 2020, 11:15 PM Vedant Raval notifications@github.com wrote:

Hello. My name is Vedant and I am working on a project related to Indic MT at IIT Delhi. The dataset statistics mentioned in the README are very impressive. We would like to make use of your training data, but before that it would be very helpful if you could provide more information about the dataset: for example, whether there is a published research paper associated with it, how you obtained such a large dataset, and whether any human monitoring was involved while curating it.

As this information is not in the README, it would be very helpful if you could fill this gap :))

— View it on GitHub: https://github.com/himanshudce/Indian-Language-Dataset/issues/3
