sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Function to Type Russian Cyrillic Letters #404

Open gasyoun opened 1 year ago

gasyoun commented 1 year ago

As mentioned, I would like to be able to type Russian letters in https://sanskrit-lexicon.uni-koeln.de/simple/ so that they are converted to Latin. We've developed a VBEE converter that we use with EmEditor. The order of the replacements matters. Let's take an example:

document.selection.Replace "jña","джня",eeFindNext Or eeFindReplaceEscSeq Or eeReplaceAll

If we search for "джня" it should return "jña", which in turn, in simple search, might give us "jñā" as well.

If we started with:

document.selection.Replace "j","дж",eeFindNext Or eeFindReplaceEscSeq Or eeReplaceAll

We would miss this case:

document.selection.Replace "jj","ддж",eeFindNext Or eeFindReplaceEscSeq Or eeReplaceAll
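The ordering problem can be sketched in Python for the search direction (Cyrillic to IAST). This is only an illustration, not the EmEditor macro: the rule list, `naive`, and `convert` are hypothetical names, and the three pairs are the ones quoted in this comment. Trying the longest Cyrillic key first at each position is one way to make the result independent of how the rules happen to be listed.

```python
# Hypothetical rule list (Cyrillic -> IAST), using only the pairs from this comment.
RULES = [
    ("джня", "jña"),
    ("ддж", "jj"),
    ("дж", "j"),
]


def naive(text, rules):
    """Apply each rule in list order with a global replace (order-sensitive)."""
    for src, dst in rules:
        text = text.replace(src, dst)
    return text


def convert(text, rules):
    """Scan left to right, trying the longest Cyrillic key first at each position."""
    ordered = sorted(rules, key=lambda r: len(r[0]), reverse=True)
    out = []
    i = 0
    while i < len(text):
        for src, dst in ordered:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # No rule matches here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

With the short rule first, `naive("ддж", [("дж", "j"), ("ддж", "jj")])` leaves a stray Cyrillic "д" in front of the "j", whereas `convert("ддж", RULES)` yields "jj" whatever the order of `RULES`.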

Only at word endings:

document.selection.Replace "tṛ ","три ",eeReplaceAll Or eeFindReplaceRegExp,0

Only at word beginnings:

document.selection.Replace "/e","э",eeFindNext Or eeFindReplaceEscSeq Or eeReplaceAll
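Position-dependent rules like the two above could be expressed with regular-expression word boundaries. The sketch below is in Python and runs in the search direction (Cyrillic to IAST). It assumes the first rule means "три at the end of a word" and the second means "э at the start of a word"; that reading of the macro lines, and the function name `apply_boundary_rules`, are assumptions.

```python
import re


def apply_boundary_rules(text):
    """Apply the two position-dependent rules before any general letter rules."""
    # Word-final "три" -> "tṛ" (assumed reading of the "tṛ " / "три " rule).
    text = re.sub(r"три\b", "tṛ", text)
    # Word-initial "э" -> "e" (assumed reading of the "/e" / "э" rule).
    text = re.sub(r"\bэ", "e", text)
    return text
```

In Python 3, `\b` is Unicode-aware for `str` patterns, so it treats Cyrillic letters as word characters. Such rules would have to run before the plain letter-by-letter replacements, for the same ordering reason as the clusters.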

What format should I convert it to, @funderburkjim?

_IAST-Rus_Converter_1.2.txt

funderburkjim commented 1 year ago

Let's start initially with a file whose lines are like:

russian iast
Example
Make the Russian side as short as possible here, e.g.
и i
Since Russian has capital letters, a long IAST i could be represented as
И ī
Note it is OK for more than one Russian letter to correspond to one IAST letter.
For this first file, let's stick to how to represent just one IAST letter.

Then, we'll have to understand other subtle points that this simple 'russian iast' mapping does not represent.
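A file in this 'russian iast' format is easy to read back into a list of pairs. The following is a minimal Python sketch, not part of the Cologne code: the function name `parse_mapping` is hypothetical, and skipping blank lines and lines starting with `#` is an assumed convention that the format above does not specify.

```python
def parse_mapping(text):
    """Parse lines of the form '<russian> <iast>' into (russian, iast) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only, so the Russian side
        # may be several letters long.
        russian, iast = line.split(None, 1)
        pairs.append((russian, iast))
    return pairs


# The two example lines from the comment above.
SAMPLE = """\
и i
И ī
"""
```

A list of pairs rather than a dictionary keeps the file order, which matters once cluster rules are added.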

OrphicEgg commented 1 year ago

a а ?backwards?
а ā
и i
и ī
у u
у ū
ри ṛ
ри ṝ
л ḷ
л ḹ
е е
ай ai
o o
ау au
х ḥ
н ṃ
к k
кх kh
г g
гх gh
н ṅ
ч c
чх ch
дж j
джх jh
нь ñ
т ṭ
тх ṭh
д ḍ
дх ḍh
н ṇ
т t
тх th
д d
дх dh
н n
п p
пх ph
б b
бх bh
м m
й y
р r
л l
в v
ш ś
ш ṣ
с s
х h

funderburkjim commented 1 year ago

trial 1

image

Notes:

  1. The Cyrillic input functionality applies only to simple-search (version 1.1) with
    • input = simple, input_simple = default
  2. This trial uses the transcoding file cyrillic_slp1.xml.
    • Closely based on the @OrphicEgg listing above.
      • Please check the first line of the listing above, and correct it
    • Note some differences in the comments in cyrillic_slp1.xml

funderburkjim commented 1 year ago

Note cyrillic reference

gasyoun commented 1 year ago

Thanks so much, it works!

a а ?backwards?

What do you mean by backwards? Cyrillic looks similar to Latin, but they have different Unicode numbers.

Let's move on.

джньяна will not be found (wanted jñāna), because now we have to treat clusters in addition to converting just simple letters.

So we need the rule джня jña to come before the other rules.

funderburkjim commented 1 year ago

change 1

This change is slightly different from what the previous comment suggested.

<e> <s>INIT</s> <in>джня</in> <out>jY</out> <next>INIT</next></e>
<!-- ignore these cyrillic characters -->
<e> <s>INIT</s> <in>ь</in> <out></out> <next>INIT</next></e>  

Thus джньяна does not work, but джнаяна does work, as does джнана.
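To see why the soft sign blocks the cluster rule, here is a rough Python sketch of how entries of this shape might be applied. It is not the actual Cologne transcoder. It assumes that, in a given state, the longest matching `<in>` wins and that characters with no matching rule pass through unchanged. The `<fsm>` wrapper element and the names `load_rules` and `transcode` are hypothetical, and only the two entries shown above are loaded.

```python
import xml.etree.ElementTree as ET

# The two entries from this comment, wrapped in a hypothetical <fsm> root.
RULES_XML = """<fsm>
<e> <s>INIT</s> <in>джня</in> <out>jY</out> <next>INIT</next></e>
<!-- ignore these cyrillic characters -->
<e> <s>INIT</s> <in>ь</in> <out></out> <next>INIT</next></e>
</fsm>"""


def load_rules(xml_text):
    """Group (in, out, next) triples by starting state, longest <in> first."""
    rules = {}
    for e in ET.fromstring(xml_text).iter("e"):
        entry = (e.findtext("in"), e.findtext("out") or "", e.findtext("next"))
        rules.setdefault(e.findtext("s"), []).append(entry)
    for entries in rules.values():
        entries.sort(key=lambda r: len(r[0]), reverse=True)
    return rules


def transcode(text, rules, state="INIT"):
    """Apply the first (longest) matching rule at each position."""
    out = []
    i = 0
    while i < len(text):
        for src, dst, nxt in rules.get(state, []):
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                state = nxt
                break
        else:
            # Assumption: characters without a rule are copied unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Under these assumptions, `transcode("джня", load_rules(RULES_XML))` gives "jY". In "джнья" the ь sits between н and я, so the four-letter rule never matches. The ь is dropped only when the scan reaches it, after the chance to match the cluster has passed.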

Another source I used is https://www.lexilogos.com/keyboard/russian_conversion.htm.

If there are many other rules required, it might be efficient for you to experiment with a local installation which includes csl-apidev. You could adjust rules in cyrillic_slp1.xml until satisfied.