virtualvinodh / aksharamukha

Add option to toggle variants for conversion into ITRANS #157

Closed by the-solipsist 2 years ago

the-solipsist commented 3 years ago

As you know, ITRANS supports lower-case long vowels (e.g., aa for A) and symbol-less variants (e.g., RRi for R^i). It would be useful to add at least these as a toggle. I'm sure I'm not alone in thinking that shuklaambaradharam is easier to read than shuklAmbaradharam, and that RRiShi is easier to read than R^iShi (though the ri'shi of "Roman (Readable)" and the ru'shi of your "Aksharaa" are far more readable!).

Here's a list of all the variants provided in ITRANS's default conversion table:

aa = A
ii = I = ee
uu = U
RRi = R^i
LLi = L^i
.n = M = .m
~N = ṅ
chh = Ch
ld = L
x = kSh
GY = j~n = dny
J = z
OM = AUM

Currently, Aksharamukha uses the following variants for outputs:

A , I , U , R^i , L^i , M , ~N , Ch , L , kSh , j~n , z
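The requested toggle could look roughly like the following post-processing step. This is only a sketch: `apply_output_variants` and its two flags are hypothetical names, not part of Aksharamukha, and the rewrite is a plain substring substitution rather than a real ITRANS tokenizer.

```python
import re

# Current output form -> more readable variant, taken from the list above.
LONG_VOWELS = {"A": "aa", "I": "ii", "U": "uu"}
SYMBOL_LESS = {"R^i": "RRi", "L^i": "LLi"}


def apply_output_variants(text, lowercase_long_vowels=False, symbol_less=False):
    """Rewrite ITRANS output using the variants selected by the two toggles.

    Hypothetical helper: other ITRANS tokens that happen to contain
    A, I or U are not special-cased here.
    """
    table = {}
    if lowercase_long_vowels:
        table.update(LONG_VOWELS)
    if symbol_less:
        table.update(SYMBOL_LESS)
    if not table:
        return text
    # Try longer keys first so that "R^i" is matched as one unit.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


print(apply_output_variants("shuklAmbaradharam", lowercase_long_vowels=True))
print(apply_output_variants("R^iShi", symbol_less=True))
```

With both toggles off the text is returned unchanged, so the current behaviour would remain the default.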

Aksharamukha doesn't recognize the following variants as inputs:

ee , ld , dny , OM , AUM
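One way to accept more input variants is to rewrite them to the forms Aksharamukha already emits before running the conversion. A minimal sketch, assuming a naive longest-match substring rewrite; `normalize_itrans` and `VARIANT_TO_CANONICAL` are hypothetical names, and OM/AUM are left out because the list above does not say which form is used for output.

```python
import re

# Input variant -> form currently used for output, per the lists above.
VARIANT_TO_CANONICAL = {
    "aa": "A",
    "ii": "I",
    "ee": "I",
    "uu": "U",
    "RRi": "R^i",
    "LLi": "L^i",
    ".n": "M",
    ".m": "M",
    "chh": "Ch",
    "ld": "L",
    "x": "kSh",
    "GY": "j~n",
    "dny": "j~n",
    "J": "z",
}

# Longer variants first, so "chh" or "dny" wins over any shorter overlap.
_PATTERN = re.compile(
    "|".join(
        re.escape(v)
        for v in sorted(VARIANT_TO_CANONICAL, key=len, reverse=True)
    )
)


def normalize_itrans(text):
    """Replace each known variant with its canonical form (naive rewrite)."""
    return _PATTERN.sub(lambda m: VARIANT_TO_CANONICAL[m.group(0)], text)


print(normalize_itrans("shuklaambaradharam"))
print(normalize_itrans("RRiShi"))
```

A plain substring rewrite like this cannot tell a genuine l + d cluster from the ld variant, so a real implementation would need to resolve such cases in the tokenizer.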

Sanscript, the popular conversion software, adds a few more variants, which are especially useful for conversion from ITRANS-like systems:

oo = U (in addition to uu)
N^ = ~N
c = ch
C = Ch (in addition to chh)
JN = ~n
w = v
kS = kSh (in addition to x)
~ = .a
. = |
.. = ||

Some of these, especially oo, w, and JN, seem very useful.

virtualvinodh commented 2 years ago

Except for /ld/ I have implemented the others. This should be visible when I push my changes in a couple of days.

Also, currently, I have /JN/ as ñ. I am not sure where I got it though. Are you sure it maps to jñ?

V

the-solipsist commented 2 years ago

Even looking at the current master, I find this: https://github.com/indic-transliteration/common_maps/blob/59dcb7746513aa806a75bb4ec5ce807611ecb028/roman/itrans.toml#L106-L131

"~n" = [ "JN",]

Maybe we should check with @vvasuki ?

vvasuki commented 2 years ago

When I want to type jñ, I find it convenient to type jJN. JN mapping to ñ is convenient for typing without straining the left hand to produce a ~.

nsesha92 commented 2 years ago

I am also for a symbol-less, non-IPA scheme (i.e., without a symbol on top, as in ñ). My choice, as I had indicated long back:

aa = A
ii = I
uu = U
ee = E (e = e, to be in sync)
oo = O

~N = ng
~n = nj = ny
gn = gy = j~n

But these changes first have to be implemented by ITRANS, after which Aksharamukha can follow.

However, the biggest discrepancy in ITRANS is the coding of e and o:

e = ए (big, deerga, for any Devanagari-derived script) AND e = எ (small, hrasva, for any South Indian script)

However ए IS NOT equal to எ by pronunciation, as well as by code point alignment.
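The code point claim can be checked with Python's standard `unicodedata` module. The sketch below compares each letter's offset from the start of its Unicode block; the helper `block_offset` and the constant names are illustrative only.

```python
import unicodedata

# First code point of each Unicode block.
DEVANAGARI_BASE = 0x0900
TAMIL_BASE = 0x0B80


def block_offset(ch, base):
    """Position of a character relative to the start of its block."""
    return ord(ch) - base


letters = (
    ("\u090F", DEVANAGARI_BASE),  # ए
    ("\u0B8E", TAMIL_BASE),       # எ
    ("\u0B8F", TAMIL_BASE),       # ஏ
)
for ch, base in letters:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), hex(block_offset(ch, base)))
```

ए sits at offset 0x0F in the Devanagari block, the same slot as Tamil ஏ, while எ sits one position earlier at 0x0E.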

Why this #roman AND #roman-south divide?

Again, this is to be logged for ITRANS to correct this discrepancy.

Because of this grave discrepancy, the Sanskrit version of the naalaayira divya prabandam on Vedics.org is totally wrong with respect to e/E and o/O.

the-solipsist commented 2 years ago

A quick note of clarification:

  1. ITRANS has oo as a "code name", but not as an input, which would be O. (So, the inputs ^o, o, and O, correspond to the "code names" short-o, o, and oo, which in turn correspond to "roman" o, ō, and undefined, and "roman-south" undefined, o, and ō.)
  2. Sanscript uses oo as a valid input by itself (rather than counting that as two os), and it is the equivalent to ITRANS uu/U.

Given the seeming popularity of Sanscript, should that be supported as well?
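To make the first point concrete, the three o inputs can be written out as a small lookup table. This only restates the mapping described above; `ITRANS_O` and `romanize_o` are hypothetical names, and `None` stands for "undefined".

```python
# input -> (code name, "roman" output, "roman-south" output)
ITRANS_O = {
    "^o": ("short-o", "o", None),
    "o": ("o", "\u014d", "o"),      # \u014d is o with macron
    "O": ("oo", None, "\u014d"),
}


def romanize_o(token, scheme):
    """Return the romanization of an ITRANS o input, or None if undefined.

    scheme must be "roman" or "roman-south".
    """
    _code_name, roman, roman_south = ITRANS_O[token]
    if scheme == "roman":
        return roman
    if scheme == "roman-south":
        return roman_south
    raise ValueError(f"unknown scheme: {scheme!r}")


for token in ITRANS_O:
    print(token, romanize_o(token, "roman"), romanize_o(token, "roman-south"))
```

The same input o therefore yields a long vowel under "roman" and a short one under "roman-south".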

As an aside: Since ITRANS has allowed for customizable mappings since around 2016, I don't think Avinash Chopde will be in favour of making changes now. At any rate, I think that's best left for discussion on the ITRANS issues page.

nsesha92 commented 2 years ago

ITRANS has been updated, my suggestion is implemented.

https://github.com/avinash311/itrans/issues/4

I also checked Sanscript; its ITRANS mapping is old and not updated. Also, there is no option for short e & o for Tamizh, Telugu, Kannada and Malayalam.

vvasuki commented 2 years ago

> ITRANS has been updated, my suggestion is implemented.
>
> avinash311/itrans#4
>
> I also checked Sanscript; its ITRANS mapping is old and not updated. Also, there is no option for short e & o for Tamizh, Telugu, Kannada and Malayalam.

That last sentence is false: Sanscript has a separate ITRANS Dravidian map.

nsesha92 commented 2 years ago

https://www.learnsanskrit.org/tools/sanscript/

On this main page, it does not show up:

nsesha92 commented 2 years ago

[Image: sanscript-samll_e_o]

virtualvinodh commented 2 years ago

This has been fixed and will be pushed in the next update. I totally forgot to push this in an earlier update!