anoopkunchukuttan closed this issue 2 years ago
Hi @anoopkunchukuttan, we used Google Translate as a sub-step (English-Hindi) in Sanskrit-English-Hindi supervised MT, because we had significant parallel data only for Sanskrit-English. We didn't generate any dataset using Google Translate.
Thanks for the quick response, @harpavatkeerti. So, just to clarify, is all the data in the parallel_corpus directory either manual or mined from different sources?
Yes, @anoopkunchukuttan, all the data is mined from the different sources mentioned in the report. Some of the scripts used for this are available at https://github.com/priyanshu2103/Sanskrit-Hindi-Machine-Translation/blob/main/parallel-corpus/sanskrit-hindi/New_Testament_Sanskrit/test.py
Thanks!
I went through the report, and it suggests that some of the data was translated with Google Translate. Can you clarify which datasets in the repository are human translations and which are machine translated?