Dataset for training

Hello. While reading the paper "IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search", I saw that the datasets used for training were:

• Monolingual corpora of English, Hindi and Gujarati in their native scripts.
• Word lists with corpus frequencies for English, Hindi, Bengali and Gujarati.
• Word transliteration pairs for Hindi-English, Bengali-English and Gujarati-English.

plus additional crawled Romanized data.

Would it be possible to provide these datasets so that the system can be trained from scratch?

Thank you.

libindic / indic-trans

Dataset for training #31