Open bwasty opened 2 years ago
I found several issues with transliterating diacritics from Devanagari (Hindi):
- कॅ -> kaॅ (iso/iast, fine in itrans: ka.c)
What should this be in ISO?
- सड़क -> saḍa़ka (iso/iast)
What should this be in ISO?
- फ़ोन -> pha़ōna (iso/iast)
f is expected I suppose. Contribute a fix?
- ज़्यादा -> ja़yAdA (itrans; other way correct: zyaada)
Contribute a fix?
By the way, great project, wrote 2 small tools with it already:
- कॅ -> kaॅ (iso/iast, fine in itrans: ka.c)
What should this be in ISO?
m̐k. Same in IAST (according to this. Here ˜ is shown, though the discussion page suggests m̐ is correct).
- सड़क -> saḍa़ka (iso/iast)
What should this be in ISO?
saṛaka in ISO. For IAST it's not specified, so maybe remove the dangling dot, or use the same as ISO? For ITRANS it should be .Da or .Ra.
Related: ढ़ should become ṛha in ISO and .Dha/Rha in ITRANS.
- फ़ोन -> pha़ōna (iso/iast)
f is expected I suppose. Contribute a fix?
Yes, for ISO and ITRANS. For IAST it's not specified - maybe do the same anyway?
- ज़्यादा -> ja़yAdA (itrans; other way correct: zyaada)
Contribute a fix?
I'm not sure I understand Devanagari well enough yet (literally started learning a week ago), but I might try :)
Ah, right, damn. Wikipedia shows ê for ॲ and ऍ.
The Unicode block shows a few more characters with a 'candra', but I guess they have no transliteration?
Basically, the problem is that transliterateBrahmic assumes it's OK to transliterate character by character, so a consonant and a following nukta (़) are mapped separately instead of as one unit. It does not consider the maximum token length (unlike https://github.com/indic-transliteration/indic_transliteration_py/blob/99fe6b2fd5b220794d1709e3297c919d58c4cfcc/indic_transliteration/sanscript/brahmic_mapper.py ). Porting the Python code might work.
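To illustrate the longest-match idea, here is a minimal sketch in Python. It is not a port of brahmic_mapper.py and not the project's actual code: the `CONSONANTS` and `VOWEL_SIGNS` tables, `longest_match`, and `transliterate` are hypothetical names, and the mapping covers only the handful of letters from the examples in this issue (Devanagari to ISO). The point is only that trying the two-code-point token (consonant + nukta) before the single consonant avoids the stray ़ in the output.

```python
import unicodedata

# Hypothetical, deliberately tiny Devanagari -> ISO table (assumption:
# only the letters needed for the examples in this issue).
CONSONANTS = {
    "\u0915": "k",             # ka
    "\u0938": "s",             # sa
    "\u0928": "n",             # na
    "\u092f": "y",             # ya
    "\u0926": "d",             # da
    "\u0921": "\u1e0d",        # dda -> d with dot below
    "\u0921\u093c": "\u1e5b",  # dda + nukta -> r with dot below
    "\u092b": "ph",            # pha
    "\u092b\u093c": "f",       # pha + nukta -> f
    "\u091c": "j",             # ja
    "\u091c\u093c": "z",       # ja + nukta -> z
}
VOWEL_SIGNS = {
    "\u093e": "\u0101",  # vowel sign aa -> a with macron
    "\u094b": "\u014d",  # vowel sign o -> o with macron
}
VIRAMA = "\u094d"
MAX_TOKEN_LEN = max(len(key) for key in CONSONANTS)


def longest_match(text, i):
    """Return the longest consonant token starting at index i, or None."""
    for size in range(MAX_TOKEN_LEN, 0, -1):
        token = text[i:i + size]
        if token in CONSONANTS:
            return token
    return None


def transliterate(text):
    # NFC turns precomposed nukta letters (U+0958..U+095F) into
    # base consonant + U+093C, so one table entry covers both spellings.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        token = longest_match(text, i)
        if token is None:
            out.append(text[i])  # unknown character: pass through unchanged
            i += 1
            continue
        out.append(CONSONANTS[token])
        i += len(token)
        following = text[i:i + 1]
        if following == VIRAMA:
            i += 1                              # virama suppresses the inherent vowel
        elif following in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[following])  # explicit vowel sign
            i += 1
        else:
            out.append("a")                     # inherent vowel
    return "".join(out)
```

With this table, `transliterate("सड़क")` should give saṛaka and `transliterate("फ़ोन")` should give fōna. A real fix would need the full scheme tables and the independent vowels, which this sketch leaves out.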