gasyoun closed this issue 7 years ago
After reviewing your SLP1_IAST document and the Wikipedia article http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration , I finally understand that IAST is just another name for what is called 'Roman Unicode' in the sanskrit-lexicon transliterations. This is good, in that the mapping is already understood. That webtc1 display you mention does, therefore, support direct IAST input, as well as Devanagari. Here is my rephrasing of the problems at sanskrit-lexicon regarding using IAST and Devanagari as inputs:
1) Yes, should be worth doing. 2) IAST input is not in the dropdown menu, so can I enter it? 3) The Google App Engine version of the MW displays sounds interesting. Where is 'web2py'?
Re '2) IAST input is not in the dropdown menu, so can I enter it?' Answer: Not at present in the Basic and Adv. Search displays. I'll look into extending the dropdown menu to include IAST. Re 3: 'Where is web2py?' Answer: item #8 of the MW downloads, http://www.sanskrit-lexicon.uni-koeln.de/download.html Do any of your programming contacts have experience with App Engine?
2) Right, IAST to come, great. 3) ftp://ftp.uni-koeln.de/institute/indologie/download/web2py_mw.zip I see it. Several Indian colleagues have experience with App Engine, we'll have to ask for advice and help from them. It might take a year or two.
After seven months, IAST is the one input method I badly lack. See https://github.com/wikimedia/jquery.ime/commit/c9449dd2c8341a88399c6b57a3e942fa7bf44c77 or, as I have already proposed, https://github.com/sanskrit/detect.py/commit/2ad80b4278833ec3ca3fcf2d92e7b339119b8cd4 - the solution is ready, all we need to do is use it. Even the algorithm is open. Can we possibly do it without App Engine? What kind of questions do we have about it: how to implement it on a third-party server?
Let's see if we can get IAST working for you under the PW List display.
Then, we can consider making it more widely available in other displays.
I think the auto-detection problem is technically distinct, and more difficult. If there is a robust open-source solution, I'm willing to try it. Maybe the Python program you reference is a step in the right direction; I didn't see enough documentation to know what to make of it.
@funderburkjim If you want open-source autodetection of IAST (along with automatic conversion to SLP1), the following PHP snippet from my code may be useful.
/* Code for converting from IAST to SLP1 */
// defining IAST letters.
$iast = array("a","ā","i","ī","u","ū","ṛ","ṝ","ḷ","ḹ","e","ai","o","au","ṃ","ḥ","kh","ch","ṭh","th","ph","gh","jh","ḍh","dh","bh","ṅ","ñ","ṇ","k","c","ṭ","t","p","g","j","ḍ","d","b","n","m","y","r","l","v","s","h","ś","ṣ",);
// defining SLP1 letters.
$slp = array("a","A","i","I","u","U","f","F","x","X","e","E", "o","O", "M","H","K", "C", "W", "T", "P","G", "J", "Q", "D","B", "N","Y","R","k","c","w","t","p","g","j","q","d","b","n","m","y","r","l","v","s","h","S","z",);
// If the input ($first) contains IAST letters, convert them to SLP1.
// The 'u' modifier makes PCRE treat the pattern and the input as UTF-8.
if (preg_match('/[āĀīĪūŪṛṚṝṜḷḶḹḸṃṂḥḤṭṬḍḌṅṄñÑṇṆśŚṣṢV]/u', $first)) {
    $first = str_replace($iast, $slp, $first);
}
It works well for my program.
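The digraphs 'kh', 'ch', etc. had to be placed before 'k', 'c', etc. in $iast because str_replace in PHP works through the array entries in order; a single letter replaced first (for example 'ṭ' to 'w') would break up the digraph before it could match. It should work well in my opinion.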
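For comparison, here is a minimal Python sketch of the same IAST to SLP1 conversion. It is not the code used at Cologne. The table is copied from the PHP arrays above; the function names `looks_like_iast` and `iast_to_slp1` are made up for this example. Two things differ from the PHP snippet and are assumptions of mine: the input is lowercased first (so Śiva is handled like śiva), and the replacement is done in a single regex pass with the longest keys tried first, so the result does not depend on the order of the table.

```python
import re
import unicodedata

# IAST -> SLP1 pairs, copied from the PHP arrays in the comment above.
_RAW = {
    "a": "a", "ā": "A", "i": "i", "ī": "I", "u": "u", "ū": "U",
    "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "e": "e", "ai": "E", "o": "o", "au": "O", "ṃ": "M", "ḥ": "H",
    "kh": "K", "ch": "C", "ṭh": "W", "th": "T", "ph": "P",
    "gh": "G", "jh": "J", "ḍh": "Q", "dh": "D", "bh": "B",
    "ṅ": "N", "ñ": "Y", "ṇ": "R",
    "k": "k", "c": "c", "ṭ": "w", "t": "t", "p": "p",
    "g": "g", "j": "j", "ḍ": "q", "d": "d", "b": "b",
    "n": "n", "m": "m", "y": "y", "r": "r", "l": "l", "v": "v",
    "s": "s", "h": "h", "ś": "S", "ṣ": "z",
}
# Normalize the keys so they compare equal to NFC-normalized input.
IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _RAW.items()}

# Longest keys first, so 'kh' is tried before 'k' and 'ai' before 'a'.
_KEYS = sorted(IAST_TO_SLP1, key=len, reverse=True)
_PATTERN = re.compile("|".join(re.escape(k) for k in _KEYS))

# Any non-ASCII letter from the table counts as an IAST diacritic.
_DIACRITICS = re.compile(
    "[" + "".join(sorted({ch for k in IAST_TO_SLP1 for ch in k if ord(ch) > 127})) + "]"
)


def _prepare(text):
    # Assumption of this sketch: case is not significant in the input.
    return unicodedata.normalize("NFC", text).lower()


def looks_like_iast(text):
    """True if the text contains at least one IAST diacritic letter."""
    return bool(_DIACRITICS.search(_prepare(text)))


def iast_to_slp1(text):
    """Convert IAST text to SLP1 in one left-to-right pass."""
    return _PATTERN.sub(lambda m: IAST_TO_SLP1[m.group(0)], _prepare(text))
```

Like the PHP version, detection relies on diacritics, so an IAST word without any (for example sukha) would not be recognized as IAST by `looks_like_iast`.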
Still the #1 lacking feature for me.
@funderburkjim Is IAST that hard to code? The time to integrate this enhancement is now.
I agree. Small, but valuable.
In the 'sample' versions of displays such as list-0.2, the alternate inputs of either IAST or DEVANAGARI are implemented.
At some point, these 'Sample' displays should be mentioned on the main Cologne Sanskrit-Lexicon home page. In particular, that list-0.2 display is the one I use almost exclusively now, because it also has the auto-suggest feature, which is very useful when looking up individual words.
As the code is currently organized, the other displays (B,L,A) accessible via the home page would be difficult to adapt to these alternate inputs, simply because each dictionary has a separate code base, with minor differences among the code bases. There are advantages to these separate code bases, but the disadvantage is that implementing a 'global' change is awkward. Such a global change was done when making the titles uniform, and the templating technique used there probably could be adapted to making other global changes to B,L,A displays. But, as already mentioned, this is currently awkward, so I have been loath to engage with it.
"templating technique used there probably could be adapted to making other global changes to B,L,A displays" - we are nearing 1998 :) So let there be IAST in every house. A http://www.sanskrit-lexicon.uni-koeln.de/scans/awork/apidev/sample/list-0.2.html link would be enough for now.
@funderburkjim let's try to reproduce what http://spokensanskrit.de/index.php?beginning=0+&tinput=tata&trans=Translate&direction=SE has. Many of my pupils use the website just because of its search mode that is insensitive to vowel and consonant distinctions. You do not need to know in advance exactly what the word should look like.
kar
Did you mean one of the following words : (WORK IN PROGRESS REGARDING THIS FEATURE - SORRY FOR THE INCONVENIENCE!) kSar क्षर् karI करी kara कर kare करे kari करि kir किर् kr क्र् kur कुर् kArA कारा kAra कार
siva
siva ziva zivA
ganga
gAGga gaGgA gaGga
asana
asana asanA AsanA Asana azana Azana azanA
a A i I u U R RR lR lRR e ai o au M H k kh g gh G c ch j jh J T Th D Dh N t th d dh n p ph b bh m y r l v z S s h
@drdhaval2785 would you agree?
Proposal draft:
a=A
i=I
u=U
r=R=RR
l=lR=lRR
e ai o au
h=H
M=n=N=J=G
z=S=s
b=v
k=kh
g=gh
c=ch
j=jh
T=Th=t=th
D=Dh=d=dh
p=ph
b=bh
y r v
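To make the draft concrete, here is a small Python sketch of how such equivalence groups could be expanded into search variants. It is only an illustration, not @juhnowski's implementation. The groups are copied from the draft above (in HK spelling); the names `GROUPS`, `ALTERNATIVES` and `variants` are invented for this example. One assumption: a letter that occurs in two groups (b, in b=v and b=bh) gets the union of both groups, without making v and bh equivalent to each other.

```python
import re
from itertools import product

# Equivalence groups from the proposal draft (Harvard-Kyoto spelling).
GROUPS = [
    ["a", "A"], ["i", "I"], ["u", "U"],
    ["r", "R", "RR"], ["l", "lR", "lRR"],
    ["h", "H"], ["M", "n", "N", "J", "G"], ["z", "S", "s"],
    ["b", "v"], ["k", "kh"], ["g", "gh"], ["c", "ch"], ["j", "jh"],
    ["T", "Th", "t", "th"], ["D", "Dh", "d", "dh"],
    ["p", "ph"], ["b", "bh"],
]

# token -> every spelling that shares a group with it
ALTERNATIVES = {}
for group in GROUPS:
    for token in group:
        ALTERNATIVES.setdefault(token, set()).update(group)

# Longest tokens first ('lRR' before 'lR' before 'l'); any other
# character (e, o, y, m, ...) is matched by '.' and passes through.
_TOKEN = re.compile(
    "|".join(sorted(ALTERNATIVES, key=len, reverse=True)) + "|.", re.DOTALL
)


def variants(word):
    """Return all spellings reachable by swapping tokens within their groups."""
    tokens = _TOKEN.findall(word)
    choices = [sorted(ALTERNATIVES.get(t, {t})) for t in tokens]
    return sorted({"".join(combo) for combo in product(*choices)})
```

For example, `variants("siva")` contains both `ziva` and `zivA`. The number of variants grows multiplicatively with word length, so a real search would probably filter the list against the dictionary headwords rather than display it.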
@juhnowski if I can ask, this would be the first task to update in the UI.
Hello! I pushed a first implementation of @gasyoun's "Proposal draft": https://github.com/juhnowski/word_variations Could you please review it? The next step is integration into the existing UI. Could you please tell me on which page I need to do it?
What do you think about splitting one word into several according to the sandhi rules? For instance, by implementing Bühler's table of sandhi rules?
@funderburkjim Could you please help me with code integration?
@funderburkjim was https://github.com/sanskrit-lexicon/Cologne/issues/8#issuecomment-94280465 ever integrated? Please tell me. And @juhnowski's code is ready; it's a replica of the http://spokensanskrit.de/ search function. After that we will need to find a way to display articles from several dictionaries at once, in one window. Is it possible, Jim?
help me with code integration ?
I'm unsure what you are asking of me. Can you be more specific?
I created a working prototype on github.io: https://juhnowski.github.io/word_variations/
@juhnowski Just had a conversation with @gasyoun , so have a better idea of what this project is about.
Here's a suggestion: Make a 'live' version of your word variation project. You could serve it using GitHub Pages.
This version would be very simple. It would have input text field where user would type a spelling (like shiva, vishnu, etc.). When user clicks a submit button (or presses enter key, whichever), the program would display the list of word-variations that are generated for the input (shiva, siva, ...).
Seeing this simple display will help me understand what I need to do on the backend.
I'll work on a first approximation of this backend call; then, you can make a second version of your program which will make a call to the backend. (of course, I'll have to tell you the calling details before you can complete this second step.).
Does this sound like a good plan for making progress?
@juhnowski Great - just noticed that you've already got step 1 done.
@gasyoun You should check this, to be sure it's doing what you expected.
@funderburkjim as expected.
Let's assume that for the next step, you will send to my backend program (not yet written) a JSON object with two fields:
As a first step, my backend program will examine in the 'dict' dictionary at Cologne; for each word in the 'words' field, it will
@juhnowski How does this API specification sound?
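A rough Python sketch of what the client side of this call might build. Only the two field names, 'dict' and 'words', come from the description above; the helper name `build_request` and the example dictionary code `mw` are assumptions, and nothing is assumed here about the endpoint URL or the shape of the response.

```python
import json


def build_request(dict_code, words):
    """Serialize the two-field JSON object described above.

    dict_code: which Cologne dictionary to search (example value only).
    words:     the spelling variants generated for the user's input.
    """
    return json.dumps({"dict": dict_code, "words": list(words)})


# Example payload for a few of the variants of one input word.
payload = build_request("mw", ["siva", "ziva", "zivA"])
```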
@funderburkjim so for shiva we get:
shiva
shivA
shiba
shibA
shIva
shIvA
shIba
shIbA
sHiva
sHivA
sHiba
sHibA
sHIva
sHIvA
sHIba
sHIbA
ziva
zivA
ziba
zibA
zIva
zIvA
zIba
zIbA
That's quite a good start. The sH variants make no sense here, but will help in a word like duhkha, which is quite common. What if śiva is Śiva or even çiva or Çiva in the text? Will we find it? In the XML of PWK it's even C2iva, so as long as AS is not dead, our algorithm should be able to find even C2, I guess.
Here's an implementation of api at cologne, along the lines described above.
It is put as a GitHub pages web-app, at http://funderburkjim.github.io/cologneapiwork/easyspell/01/
The html file illustrating the api , as well as a copy of the php script used at Cologne, are here
That html file uses a hard-coded list (the one Marcis shows above for 'shiva') as the input.
@juhnowski Is this what you need for now?
Good morning! @funderburkjim Is this what you need for now? - Yes, thank you.
@juhnowski
I would add a few elements from Velthuis as well (only for input, not for variant generation):
aa=A
ii=I
uu=oo=U
.r=R
.rr=RR
.l=lR
.ll=lRR
.m=M
.h=H
From ITRANS (only for input, not for variant generation):
v=w
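These input-only aliases could be applied as a normalization step before variant generation. Below is a hedged Python sketch: the alias table is taken from the two lists above, the names `INPUT_ALIASES` and `normalize_input` are invented, and reading `v=w` as "a typed w is treated as v" is my interpretation.

```python
import re

# Input-only aliases; the right-hand side is the HK spelling used elsewhere.
INPUT_ALIASES = {
    # Velthuis
    "aa": "A", "ii": "I", "uu": "U", "oo": "U",
    ".rr": "RR", ".r": "R", ".ll": "lRR", ".l": "lR",
    ".m": "M", ".h": "H",
    # ITRANS (assumed reading: typed 'w' means 'v')
    "w": "v",
}

# Longest aliases first, so '.rr' wins over '.r' and '.ll' over '.l'.
_ALIAS = re.compile(
    "|".join(re.escape(k) for k in sorted(INPUT_ALIASES, key=len, reverse=True))
)


def normalize_input(text):
    """Rewrite Velthuis/ITRANS-style input into HK before generating variants."""
    return _ALIAS.sub(lambda m: INPUT_ALIASES[m.group(0)], text)
```

With this table, `normalize_input("sa.msk.rta")` gives `saMskRta` and `normalize_input("swaamii")` gives `svAmI`.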
https://github.com/shreevatsa/padyachandas and https://github.com/shreevatsa/sanskrit/blob/master/transliteration/detect.py contain interesting ideas. @juhnowski are you still around?
@gasyoun Are we ready to start with a dev version of list-0.2.html that has a 'simple' input method based on the method developed thus far? Or do you want to do more preliminary work first?
Are we ready to start with a dev version of list-0.2.html that has a 'simple' input method based on the method developed thus far?
@juhnowski got lost and it was his turn.
Or do you want to do more preliminary work first?
I'm not sure what you mean. You showed what the server sends back, but we never heard back from Ilya. I reminded him a few times, but he was busy.
By 'preliminary work': I meant, are you satisfied for now with the list Ilya generates for 'Vishnu', etc.
If so, I can try to integrate that technique into the display logic.
I meant, are you satisfied for now with the list Ilya generates for 'Vishnu', etc.
Yes, but I guess I would want some order of priority. Not all are equal.
Like all those variants where v can be b: I would give them a lower rank, and even if such a word exists, show it lower in the list. Or is there no need for such sophistication?
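One possible way to express such a priority, sketched in Python: give every substitution a cost and sort the variants by total cost. The cost values and the tiny `CHOICES` table are entirely hypothetical and cover only the letters of one example word; the point is just that a v/b swap can be made expensive enough to always sort below the ordinary variants.

```python
from itertools import product

# Hypothetical costs: 0 = as typed, 1 = ordinary variant,
# 10 = unlikely swap (v/b) that should always sort last.
CHOICES = {
    "s": {"s": 0, "z": 1, "S": 1},
    "i": {"i": 0, "I": 1},
    "v": {"v": 0, "b": 10},
    "a": {"a": 0, "A": 1},
}


def ranked_variants(word):
    """Return (variant, cost) pairs, cheapest first."""
    per_letter = [list(CHOICES.get(ch, {ch: 0}).items()) for ch in word]
    scored = [
        ("".join(letter for letter, _ in combo), sum(cost for _, cost in combo))
        for combo in product(*per_letter)
    ]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))
```

For the input `siva`, the typed form comes first with cost 0, and every variant containing b is listed after all the variants that keep v.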
It sounds like we're ready to try the experiment with a dev version of list-0.2 display.
Adding to todo list.
Once we get a live experiment, we will know better how to add refinements such as you mentioned.
Once we get a live experiment, we will know better how to add refinements such as you mentioned.
Agree.
Output is rather advanced and I have no issues with it. But the input methods are rather limited. HK, SLP1 are good to have. Indians might even still use ITRANS. But we can't afford not to have IAST. And as you already know, Indians really need Devanagari input (as is done at http://www.sanskrit-lexicon.uni-koeln.de/monier1/webtc1/index.php , which, as you are aware, does not support copy-pasting, so you have to re-type every word and cannot simply look it up after copying it from another source). What I would think about: 1) add IAST https://github.com/gasyoun/nagari/blob/master/SLP1_IAST.vbee 2) Devanagari input (List display, http://www.sanskrit-lexicon.uni-koeln.de/monier1/webtc1/index.php) 3) detect encoding https://github.com/sanskrit/detect.py (available as JS as well).