himanshudce / Indian-Language-Dataset

Clean parallel corpus for five low-resource Indian languages

What is the source of the dataset? #3

Open Vedant2311 opened 3 years ago

Vedant2311 commented 3 years ago

Hello. My name is Vedant, and I am working on a project related to Indic MT at IIT Delhi. The dataset stats mentioned in the readme are very impressive. We would like to use your training data, but before that it would be very helpful if you could provide more information about the dataset: for example, whether there is a published research paper associated with it, how you obtained such a large dataset, and whether any human monitoring was involved in curating it.

Since this information is not in the readme, it would be very helpful if you could fill this gap :))

himanshudce commented 3 years ago

Hi actually most of the dataset is from

