sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Alternative Input Methods #8

Closed gasyoun closed 7 years ago

gasyoun commented 10 years ago

The output is rather advanced and I have no issues with it. But the input methods are rather limited. HK and SLP1 are good to have. Indians might even still use ITRANS. But we can't afford not to have IAST. And as you already know, Indians really need Devanagari input (as is done at http://www.sanskrit-lexicon.uni-koeln.de/monier1/webtc1/index.php which, as you are aware, does not support copy-pasting, so you have to retype every word and cannot simply look it up after copying it from another source). What I would consider: 1) add IAST https://github.com/gasyoun/nagari/blob/master/SLP1_IAST.vbee 2) Devanagari input (List display, http://www.sanskrit-lexicon.uni-koeln.de/monier1/webtc1/index.php) 3) detect the encoding with https://github.com/sanskrit/detect.py, which is available as JS as well.

funderburkjim commented 10 years ago

After reviewing your SLP1_IAST document and the Wikipedia article http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration , I finally understand that IAST is just another name for what is called 'Roman Unicode' in the sanskrit-lexicon transliterations. This is good, in that the mapping is already understood. The webtc1 display you mention does, therefore, support direct IAST input, as well as Devanagari. Here is my rephrasing of the problems at sanskrit-lexicon regarding the use of IAST and Devanagari as inputs:

  1. webtc1 supports both, BUT you cannot paste, even if you have correctly chosen the Preferences. This is something that the underlying javascript purposely inhibits. The reason is that allowing ARBITRARY editing would be difficult to support; this is still the case. However, the JS could be modified to allow a Paste operation to replace the entire contents of the input box. This enhancement, though limited, might be worth doing. What do you think?
  2. The other displays (Basic, Advanced Search) do not permit Devanagari or IAST input at all. The user input forms are restricted to what the List display (webtc1) preferences calls 'phonetic' inputs. To use the webtc1 preferences system is more complicated than the simpler system used in Basic,Adv. Search. How to resolve?
  3. detect.py looks interesting. Perhaps the author (Arun Prasad) could carry the idea a bit further by integrating it into a Python version of MW. The 'web2py' download of MW might be a place to start, or I could make a Python offline version that might be easier to experiment with. Another thought is to make a Google App Engine version of the MW displays; this would be harder for me, since I haven't worked with App Engine. However, I'm pretty sure App Engine supports Python, and likely even the robust Django Python web framework.

gasyoun commented 10 years ago

1) Yes, should be worth doing. 2) IAST input is not in the dropdown menu, so can I enter it? 3) Google App Engine version of the MW displays sound interesting. Where is 'web2py'?

funderburkjim commented 10 years ago

Re '2) IAST input is not in the dropdown menu, so can I enter it?' Answer: Not at present in the Basic and Adv. Search displays. I'll look into extending the dropdown menu to include IAST. Re 3: 'Where is web2py?' Answer: item #8 of the MW downloads at http://www.sanskrit-lexicon.uni-koeln.de/download.html Do any of your programming contacts have experience with App Engine?

gasyoun commented 10 years ago

2) Right, IAST to come, great. 3) ftp://ftp.uni-koeln.de/institute/indologie/download/web2py_mw.zip I see it. Several Indian colleagues have experience with App Engine, we'll have to ask for advice and help from them. It might take a year or two.

gasyoun commented 10 years ago

Seven months later, IAST is still the one input method I badly lack. See https://github.com/wikimedia/jquery.ime/commit/c9449dd2c8341a88399c6b57a3e942fa7bf44c77 or, as I have already proposed, https://github.com/sanskrit/detect.py/commit/2ad80b4278833ec3ca3fcf2d92e7b339119b8cd4 - the solution is ready, all we need to do is use it. Even the algorithm is open. Can we possibly do it without App Engine? What question do we still have about it: how to implement it on a third-party server?

funderburkjim commented 10 years ago

Let's see if we can get IAST working for you under the PW List display.

Then, we can consider making it more widely available in other displays.

I think the auto-detection problem is technically distinct, and more difficult. If there is a robust open-source solution, I'm willing to try it. Maybe the Python program you reference is a step in the right direction; I didn't see enough documentation to know what to make of it.
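
To make the detection question concrete, here is a deliberately naive sketch in Python. It is not the algorithm of detect.py; the function name `guess_scheme` and the three labels are assumptions made for illustration. It only separates Devanagari and IAST from plain ASCII schemes (HK, SLP1, ITRANS), which is the easy part; telling the ASCII schemes apart is the hard part referred to above.

```python
import unicodedata

# IAST letters with diacritics (lowercase only; input is lowercased before the test).
IAST_MARKS = set(unicodedata.normalize("NFC", "āīūṛṝḷḹṃḥṭḍṅñṇśṣ"))

def guess_scheme(text: str) -> str:
    """Rough guess: 'devanagari', 'iast', or 'ascii' (HK, SLP1, ITRANS, ...)."""
    # NFC turns "letter + combining mark" sequences into single code points.
    text = unicodedata.normalize("NFC", text)
    # The Devanagari Unicode block is U+0900..U+097F.
    if any("\u0900" <= ch <= "\u097f" for ch in text):
        return "devanagari"
    if any(ch in IAST_MARKS for ch in text.lower()):
        return "iast"
    return "ascii"
```

A word without any diacritics, such as "deva", is valid in several schemes at once, so a guess of 'ascii' says nothing about which ASCII scheme was meant.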

drdhaval2785 commented 9 years ago

@funderburkjim If you want an autodetection for IAST (along with autoconvert to SLP1) open source, the following PHP snippet from my code may be useful.

/* Convert IAST input to SLP1. */
// IAST letters. Digraphs (kh, ch, ...) are listed before single letters (k, c, ...).
$iast = array("a","ā","i","ī","u","ū","ṛ","ṝ","ḷ","ḹ","e","ai","o","au","ṃ","ḥ","kh","ch","ṭh","th","ph","gh","jh","ḍh","dh","bh","ṅ","ñ","ṇ","k","c","ṭ","t","p","g","j","ḍ","d","b","n","m","y","r","l","v","s","h","ś","ṣ");
// Corresponding SLP1 letters, in the same order as $iast.
$slp = array("a","A","i","I","u","U","f","F","x","X","e","E","o","O","M","H","K","C","W","T","P","G","J","Q","D","B","N","Y","R","k","c","w","t","p","g","j","q","d","b","n","m","y","r","l","v","s","h","S","z");
// If the input ($first) contains IAST letters, convert it to SLP1.
// The 'u' modifier makes PCRE treat the pattern and the subject as UTF-8.
// Only lowercase letters are listed in $iast, so capitals are detected but not converted.
if (preg_match('/[āĀīĪūŪṛṚṝṜḷḶḹḸṃṂḥḤṭṬḍḌṅṄñÑṇṆśŚṣṢV]/u', $first)) {
    $first = str_replace($iast, $slp, $first);
}

It works well for my program.

drdhaval2785 commented 9 years ago

In $iast, 'kh', 'ch', etc. had to be placed before 'k', 'c', etc. because PHP's str_replace works through the search array in order: if, say, 'ṭ' were replaced first, 'ṭh' would become 'wh' instead of 'W'. With this ordering it should work well, in my opinion.
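
For comparison, the same table can be applied in a single pass, which removes the dependence on ordering altogether. The sketch below is in Python and uses a regular-expression alternation sorted longest-first; this is a swapped-in technique, not the str_replace approach of the snippet above. The name `iast_to_slp1` and the decision to lowercase the input are assumptions.

```python
import re
import unicodedata

# The two tables from the PHP snippet, in the same order.
IAST = unicodedata.normalize("NFC", (
    "a ā i ī u ū ṛ ṝ ḷ ḹ e ai o au ṃ ḥ kh ch ṭh th ph gh jh ḍh dh bh ṅ ñ ṇ "
    "k c ṭ t p g j ḍ d b n m y r l v s h ś ṣ")).split()
SLP1 = ("a A i I u U f F x X e E o O M H K C W T P G J Q D B N Y R "
        "k c w t p g j q d b n m y r l v s h S z").split()
TABLE = dict(zip(IAST, SLP1))

# Longest keys first, so that "kh" is tried before "k" and "ai" before "a".
_PATTERN = re.compile("|".join(
    re.escape(key) for key in sorted(TABLE, key=len, reverse=True)))

def iast_to_slp1(text: str) -> str:
    """Convert IAST to SLP1 in one left-to-right pass; unknown characters pass through."""
    # Assumption: capitals carry no meaning in IAST input, so lowercase first.
    text = unicodedata.normalize("NFC", text).lower()
    return _PATTERN.sub(lambda match: TABLE[match.group(0)], text)
```

Because each position of the input is matched only once, a replacement such as 'w' (for 'ṭ') is never scanned again, so the order of the table no longer matters.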

gasyoun commented 9 years ago

Still the #1 lacking feature for me.

drdhaval2785 commented 8 years ago

@funderburkjim Is IAST that hard to code? Now is the time to integrate this enhancement.

gasyoun commented 8 years ago

I agree. Small, but valuable.

funderburkjim commented 8 years ago

In the 'sample' versions of displays such as list-0.2, the alternate inputs of either IAST or DEVANAGARI are implemented.

At some point, these 'Sample' displays should be mentioned on the main Cologne Sanskrit-Lexicon home page. In particular, that list-0.2 display is the one I use almost exclusively now, because it also has the auto-suggest feature, which is very useful when looking up individual words.

As the code is currently organized, the other displays (B,L,A) accessible via the home page would be difficult to adapt to these alternate inputs, simply because each dictionary has a separate code base, with minor differences among the code bases. There are advantages to these separate code bases, but the disadvantage is that implementing a 'global' change is awkward. Such a global change was done when making the titles uniform, and the templating technique used there probably could be adapted to making other global changes to the B,L,A displays. But, as already mentioned, this is currently awkward, so I have been loath to engage with it.

gasyoun commented 8 years ago

"templating technique used there probably could be adapted to making other global changes to B,L,A displays" - we are nearing 1998 :) So let there be IAST in every house. A http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/sample/list-0.2.html link would be enough for now.

gasyoun commented 7 years ago

@funderburkjim let's try to reproduce what http://spokensanskrit.de/index.php?beginning=0+&tinput=tata&trans=Translate&direction=SE has. Many of my pupils use the website just because of its search mode, which is insensitive to vowel and consonant distinctions. You do not need to know in advance exactly what the word should look like.

kar

Did you mean one of the following words : (WORK IN PROGRESS REGARDING THIS FEATURE - SORRY FOR THE INCONVENIENCE!) kSar क्षर् karI करी kara कर kare करे kari करि kir किर् kr क्र् kur कुर् kArA कारा kAra कार

siva

siva ziva zivA

ganga

gAGga gaGgA gaGga

asana

asana asanA AsanA Asana azana Azana azanA

a A i I u U R RR lR lRR e ai o au M H k kh g gh G c ch j jh J T Th D Dh N t th d dh n p ph b bh m y r l v z S s h

@drdhaval2785 would you agree?

Proposal draft:
a=A 
i=I 
u=U 
r=R=RR 
l=lR=lRR
e ai o au
h=H
M=n=N=J=G
z=S=s
b=v
k=kh
g=gh
c=ch
j=jh
T=Th=t=th 
D=Dh=d=dh
p=ph 
b=bh
y r v

gasyoun commented 7 years ago

@juhnowski if I can ask, this would be the first task to update in the UI.

juhnowski commented 7 years ago

Hello! I pushed a first implementation of @gasyoun's "Proposal draft": https://github.com/juhnowski/word_variations Could you please review it? The next step is integration into the existing UI. Could you please tell me in which place (page) I need to do it?

What do you think about splitting one word into several by using the sandhi rules? For instance, implementing Bühler's table of sandhi rules?

juhnowski commented 7 years ago

@funderburkjim Could you please help me with code integration?

gasyoun commented 7 years ago

@funderburkjim was https://github.com/sanskrit-lexicon/Cologne/issues/8#issuecomment-94280465 ever integrated? Please let me know. And @juhnowski's code is ready; it's a replica of the http://spokensanskrit.de/ search function. After that, we will need to find a way to display articles from several dictionaries at once, in one window. Is that possible, Jim?

funderburkjim commented 7 years ago

help me with code integration ?

I'm unsure what you are asking of me. Can you be more specific?

juhnowski commented 7 years ago

I created a working prototype on github.io: https://juhnowski.github.io/word_variations/

funderburkjim commented 7 years ago

@juhnowski Just had a conversation with @gasyoun , so have a better idea of what this project is about.

Here's a suggestion: Make a 'live' version of your word variation project. You could serve it using GitHub Pages.

This version would be very simple. It would have input text field where user would type a spelling (like shiva, vishnu, etc.). When user clicks a submit button (or presses enter key, whichever), the program would display the list of word-variations that are generated for the input (shiva, siva, ...).

Seeing this simple display will help me understand what I need to do on the backend.

I'll work on a first approximation of this backend call; then, you can make a second version of your program which will make a call to the backend. (of course, I'll have to tell you the calling details before you can complete this second step.).

Does this sound like a good plan for making progress?

funderburkjim commented 7 years ago

@juhnowski Great - just noticed that you've already got step 1 done.

@gasyoun You should check this, to be sure it's doing what you expected.

gasyoun commented 7 years ago

@funderburkjim as expected.

funderburkjim commented 7 years ago

Let's assume that for the next step, you will send to my backend program (not yet written) a JSON object with two fields:

As a first step, my backend program will examine in the 'dict' dictionary at Cologne; for each word in the 'words' field, it will

@juhnowski How does this API specification sound?
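
As a rough illustration only, a request of the kind described above could be assembled like this in Python. The field names 'dict' and 'words' come from the description; the helper name `build_request` and the dictionary code "mw" are assumptions, and nothing is implied here about the shape of the backend's response.

```python
import json

def build_request(dict_code, words):
    """Serialise a dictionary code and candidate spellings as one JSON object."""
    # ensure_ascii=False keeps any non-ASCII spellings readable in the payload.
    return json.dumps({"dict": dict_code, "words": list(words)}, ensure_ascii=False)

# Hypothetical usage: "mw" is an assumed code for the dictionary to search.
payload = build_request("mw", ["shiva", "ziva", "zivA"])
```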

gasyoun commented 7 years ago

@funderburkjim so for shiva we get:

shiva
shivA
shiba
shibA
shIva
shIvA
shIba
shIbA
sHiva
sHivA
sHiba
sHibA
sHIva
sHIvA
sHIba
sHIbA
ziva
zivA
ziba
zibA
zIva
zIvA
zIba
zIbA

That's quite a good start. The sH makes no sense here, but it will help in a word like duhkha, which is quite common. What if śiva is Śiva or even çiva or Çiva in the text? Will we find it? In the XML of PWK it's even C2iva, so as long as AS is not dead, our algorithm should be able to find even C2, I guess.
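
One possible way to cover the spellings just listed is to fold them into a single form before any lookup. The Python sketch below handles only the four cases named in this comment (Ś, ç, Ç, C2); the function name is hypothetical and the list of equivalences is not meant to be complete.

```python
import unicodedata

def normalize_palatal_s(word: str) -> str:
    """Fold Ś, ç, Ç and the AS code C2 into lowercase ś."""
    word = unicodedata.normalize("NFC", word)
    # Handle "C2" before lowercasing, while the capital C is still recognisable.
    word = word.replace("C2", "ś")
    word = word.lower()
    return word.replace("ç", "ś")
```

All of śiva, Śiva, çiva, Çiva and C2iva then reduce to śiva, which can be passed on to whatever conversion and variant generation follows.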

funderburkjim commented 7 years ago

Here's an implementation of api at cologne, along the lines described above.

It is put as a GitHub pages web-app, at http://funderburkjim.github.io/cologneapiwork/easyspell/01/

The html file illustrating the api , as well as a copy of the php script used at Cologne, are here

That html file uses a hard-coded list (the one Marcis shows above for 'shiva') as the input.

@juhnowski Is this what you need for now?

juhnowski commented 7 years ago

Good morning! @funderburkjim Is this what you need for now? - Yes, thank you.

gasyoun commented 7 years ago

@juhnowski

I would add a few elements from Velthuis as well (only for input, not for variant generation):

aa=A
ii=I
uu=oo=U
.r=R
.rr=RR
.l=lR
.ll=lRR
.m=M
.h=H

From ITRANS (only for input, not for variant generation): v=w
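
These input-only aliases could be applied as a normalisation step before variant generation. A minimal Python sketch follows, with the mapping copied from the list above; the name `to_hk` is hypothetical. The alternation is sorted longest-first so that ".rr" is tried before ".r".

```python
import re

# Input-only aliases from the comment above, mapped to HK.
INPUT_ALIASES = {
    "aa": "A", "ii": "I", "uu": "U", "oo": "U",
    ".r": "R", ".rr": "RR", ".l": "lR", ".ll": "lRR",
    ".m": "M", ".h": "H",
    "w": "v",  # the ITRANS alias
}

_ALIAS_PATTERN = re.compile("|".join(
    re.escape(key) for key in sorted(INPUT_ALIASES, key=len, reverse=True)))

def to_hk(text: str) -> str:
    """Rewrite Velthuis/ITRANS aliases as HK; everything else is left unchanged."""
    return _ALIAS_PATTERN.sub(lambda match: INPUT_ALIASES[match.group(0)], text)
```

For example, `to_hk("sa.msk.rta")` gives "saMskRta". A doubled vowel that is really two separate vowels would also be merged, so this only suits input where "aa" is meant as a long vowel.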

gasyoun commented 7 years ago

https://github.com/shreevatsa/padyachandas and https://github.com/shreevatsa/sanskrit/blob/master/transliteration/detect.py contain interesting ideas. @juhnowski are you still around?

funderburkjim commented 7 years ago

@gasyoun Are we ready to start with a dev version of list-0.2.html that has a 'simple' input method based on the method developed thus far? Or do you want to do more preliminary work first?

gasyoun commented 7 years ago

Are we ready to start with a dev version of list-0.2.html that has a 'simple' input method based on the method developed thus far?

@juhnowski got lost and it was his turn.

Or do you want to do more preliminary work first?

I'm not sure what you mean. You showed what the server returns, but we never heard back from Ilya. I reminded him a few times, but he was busy.

funderburkjim commented 7 years ago

By 'preliminary work': I meant, are you satisfied for now with the list Ilya generates for 'Vishnu', etc.

If so, I can try to integrate that technique into the display logic.

gasyoun commented 7 years ago

I meant, are you satisfied for now with the list Ilya generates for 'Vishnu', etc.

Yes, but I guess I would want some order of priority. Not all are equal. Like all those variants where v can be b - I would add a lower index and even if there is such a word, show it lower. Or no need for such sophistication?
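
A very small Python sketch of such a priority, assuming the only rule is "variants that introduce a b the query did not have go last". The function name and the penalty rule are assumptions; a real ranking would need a weight for each kind of substitution.

```python
def rank(query, candidates):
    """Order candidates so that those with extra b's (from v -> b) come last."""
    def penalty(word):
        # Count the b's that were not already present in the query.
        return max(0, word.count("b") - query.count("b"))
    # sorted() is stable, so candidates with equal penalty keep their input order.
    return sorted(candidates, key=penalty)
```

With the query "shiva", the candidates shiba, shiva, zibA, ziva would come out as shiva, ziva, shiba, zibA.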

funderburkjim commented 7 years ago

It sounds like we're ready to try the experiment with a dev version of list-0.2 display.

Adding to todo list.

Once we get a live experiment, we will know better how to add refinements such as you mentioned.

gasyoun commented 7 years ago

Once we get a live experiment, we will know better how to add refinements such as you mentioned.

Agree.