sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Much obliged #69

Closed LNS1 closed 8 years ago

LNS1 commented 9 years ago

I am much obliged to the gentlemen and ladies who run this site. Evidently this library is the Urquelle for most of the Stardict compatible Monier Williams dictionary files that are available on the Net. May Fortune smile on you fellows :)

Now, are there similarly Stardict files (or the underlying source files) for the Cologne Tamil Lexicon too?

Srinivasa

gasyoun commented 9 years ago

Yes, Srinivasa, it's the source indeed. Tamil Lexicon was never made open for download, that is why there is no Stardict compilation based on it. Only @funderburkjim might know why.

funderburkjim commented 9 years ago

There is a Cologne website with some materials related to Tamil. These are based on digitizations that Thomas Malten made, and I helped Thomas develop this fairly primitive site in 2010-11. The site is http://tamildictionaries.uni-koeln.de/ .

Thomas also digitized a number of Tamil texts, and in 2011 there was a web program to access these, but it seems not to be functioning at the moment.

Thomas is retired and difficult to reach these days, but I'm sure he would be glad to share material with someone interested in, for instance, making a stardict version of the Tamil Lexicon.

If you, Srinivasa, have such an interest, let me know and I can try to prepare some of these materials for you.

Meanwhile, I'll send an email to Thomas, and see if he provides a response.

LNS1 commented 9 years ago

Jim,

Yes, I am interested in creating a Stardict version of the Cologne Tamil Lexicon. Today I have Stardict running on my PC (with Apte, MW, Vacaspatyam, Sabdakalpadruma, Amarakosa etc.), thanks in great part to the good people of Cologne who have generously made these files available in digital form. In other words, thanks to you, Jim, to Marcis and several of your colleagues and, last but not least, to Prof. Malten. It's hard indeed to express gratitude to people who share the fruit of their efforts.

The Cologne Tamil Lexicon would be a great addition to the resources in my Stardict instance. If it's not too much trouble, could you share the necessary digital files? However, I don't have OCR capability, so scan images are not very useful to me.

Thanks and Warm Regards,

Srinivasa

funderburkjim commented 9 years ago

I've sent a note to Thomas, and hope to first get his response (I referenced this issue).

If I don't hear from him in a week or so, I'll just plan to go ahead and get the materials. Remind me in a week or so if need be.

funderburkjim commented 9 years ago

@gasyoun Maybe the best way to make these Tamil resources available would be via a repository under this sanskrit-lexicon project. What do you think?

funderburkjim commented 9 years ago

srinivasa -- Have you documented (in a Github repository or elsewhere) how you create a stardict version of the Cologne dictionaries? Also, how to use the stardict versions? It would be useful to have these technical details documented.

gasyoun commented 9 years ago

@funderburkjim I guess a repository under sanskrit-lexicon makes more sense than under your or my personal account. Vishvas might know the full process of converting to Stardict https://groups.google.com/forum/#!topic/sanskrit-programmers/i_pcVJn7_rM - it's good for Android as well for offline use. The only question I have is regarding the updates and error submission - there is no way to collect errata.

LNS1 commented 9 years ago

I've used the files on the aupasana site (http://www.aupasana.com/stardict). I'm assuming their source is the Cologne site. I can document how to use the Stardict versions for PC and Mac. I've not converted any files so far, but I'm sure I can document that too once I actually do a conversion.

Thanks and Regards,

Srinivasa

funderburkjim commented 9 years ago

Have received confirmation from Thomas: "by all means, let the Tamil/ Sanskrit stuff be available to whoever wants to use it."

funderburkjim commented 9 years ago

Thomas also mentioned "I had a request for the same Tamil/Sanskrit data from a Sinhala group and will forward their mail to you." Here is that email:

A group of us have been using the dictionaries given in the following location: http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/tamil/index.html (webapps.uni-koeln.de/tamil/)

Before it appeared on the Internet, we had the downloadable version. We are aware of the new interface for the dictionaries at: http://www.sanskrit-lexicon.uni-koeln.de/

It concerns us that the webapps.uni-koeln.de/tamil/ is deprecated and perhaps billed to be taken down. In that case, please offer the files and the background programs to us so that we can publish it elsewhere. Thank you.

In our humble opinion, it is much superior to others in the manner it takes input. First, it has a compact, uncluttered interface without extraneous distractions. Then it has the Sanskrit soDI chart just below the search form. The feature that is most useful for us is that it takes input without regard to whether a letter indicates a mUrdhaja or not. This is important for the Singhalese because Singhala people are mUrdhaja challenged as opposed to Indians whose signature accent is owing to it.

As you know, Singhalese intermix Sanskrit with pure Singhala and try to be faithful to the original spelling with one deviation from Devanagari spelling in words like kAryAlaya, (or karNa) spelling as kAryyAlaya, (or karNNa), adding a yaMzaya, (or Na) in order to mount the rephaya.

Our dream is in your continuation of the page and adding Singhala to the result display as well. It can be easily done with Romanized Singhala formatted with an orthographic smart font as seen here: http://lovatasinhala.com/

In that web site, we show how Singhala / IAST / HK can inter-convert with simple JavaScripts. http://www.lovatasinhala.com/restrict/liyanna.php#sanskrit Please click the third button under each edit box to get the text in the other two schemes. If you like our idea of adding Singhala to your app we will give our best support to do it.

Again we passionately plead that you preserve the (webapps.uni-koeln.de/tamil/) web application. Thank you for your continued interest in Sanskrit and for maintaining your position as the world's premier authority on the subject.

funderburkjim commented 9 years ago

Although Srinivasa's interest and that of the other email both pertain to Tamil resources that Thomas provided at Cologne, the focus of the interest is different. I mention it here just so interested parties can be informed.

gasyoun commented 9 years ago

Never knew about the mUrdhaja issue. See, Jim, how important it is to have several ways to enter keywords? Just this small feature can change everything.

LNS1 commented 9 years ago

Does the Sinhalese gentleman mean 'murdhanya' perhaps? I concur with him on one thing, however. The Cologne Lexicon user interface is a thing of beauty. Form and function are so beautifully intertwined that I can't find a comparison. Suffice it to say that if your webpage were a goalie it would be none other than Sepp Maier himself.

Srinivasa

naena commented 9 years ago

Yes, it is I (the Singhalese) who wrote that message that Mr. Thomas Malten so kindly published here. Hello Srinivasa and Funderburkjim.

I just want the page that is now marked as deprecated to be preserved.

Yes, Murdhanya is sort of the common word used by Buddhist monks for mUrdhaja. Maybe because of some relation to Pali? Singhala is similar to Hindi on the surface, but has two significant differences from Indo-Aryan and Dravidian. It has cognates that are exclusively either with Indo-Aryan (many) or with European (few, but strangely the most fundamental to language, like water and sound). This suggests that IE may have arrived on the island independently of India, counter to the commonly held belief that it arrived from India. The fact that Singhalese speakers cannot differentiate mUrdhaja (retroflex) from dantaja (dental), and that they have no aspirates (mahaprANa), makes them stand apart from India. However, in all other respects, Singhala is as Indian as the Indian languages. They mix in Sanskrit in speech without altering it, at least pedantically.

Please read my next message for my request.

naena commented 9 years ago

What I want done is to add Singhala just like Devanagari to the form. It is romanized Singhala (RS) supported by an orthographic smartfont.

I have the Javascripts to convert between HK and RS and the font to be placed on the server or served as a web font like this:

@font-face { font-family: aruna; src: url(http://smartfonts.net/woff/aruna.woff) format("woff"); }
.sing { font-family: aruna, sans-serif; text-rendering: geometricPrecision; font-feature-settings: "liga" 1; }

So, if Thomas wants to try it, I can supply the Javascripts.

Thanks.

JC

naena commented 9 years ago

I think I am mixed up here, sorry. I just noticed that the old page does not have anything other than the Latin script. In that case, my request is to show Singhala in the newer pages. I still think that relaxing input to case insensitive would be much easier for those who do not know the difference between [T, D, N] and [t, d, n].

Thanks

JC
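
The relaxed input described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `foldRetroflex` and `looseMatch` and the sample headword list are hypothetical and not part of the Cologne code. The sketch folds only the three HK letters named above (T, D, N) onto their dental counterparts rather than lowercasing the whole key, because HK also uses case to mark vowel length (a vs. A).

```javascript
// Hypothetical helper: map the HK retroflex letters T, D, N onto the
// dental letters t, d, n. Aspirates (Th, Dh) follow automatically,
// because the trailing "h" is left untouched.
function foldRetroflex(hk) {
  return hk.replace(/[TDN]/g, (ch) => ch.toLowerCase());
}

// Hypothetical lookup: return every headword that equals the query once
// both sides have been folded. Vowel length (a vs. A) still matters.
function looseMatch(query, headwords) {
  const key = foldRetroflex(query);
  return headwords.filter((word) => foldRetroflex(word) === key);
}

// Example: a user types "karana" without marking the retroflex N.
console.log(looseMatch("karana", ["karaNa", "kAraNa", "kara"]));
```

Offering this as an option next to exact matching would keep the stricter search available to users who do distinguish the two series.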

gasyoun commented 9 years ago

@naena can you supply the .JS? As per case insensitive - understood. Would case sensitiveness help as an option? Are you ready to code all that is needed and submit it to github, so Jim has to only upload it? That would be the best. I can add you as a collaborator here and you upload the HTML and JS files needed.

LNS1 commented 9 years ago

I understand about 'murdhaja', Naena. The common Sanskrit (and Hindi) term for retroflex is 'murdhanya' although I can see that the Sinhala term for it is 'murdhaja'. In Sanskrit, the term 'murdhaja' stands for the hair on the head :) Likewise, in modern Indian languages too, many Sanskrit words mean something totally different.

Srinivasa

funderburkjim commented 9 years ago

Srinivasa: Here (https://dl.dropboxusercontent.com/u/29859999/otl.zip) is a Dropbox link to otl.txt, a Tamil lexicon that forms the basis of the Tamil word lookup in the legacy display program http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/tamil/index.html

I do not currently know if this is the same or different from the data used at http://tamildictionaries.uni-koeln.de/.

This may be enough to get you started - I'm not sure.
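
For anyone attempting the conversion, one common route is to first produce a tab-separated text file (one headword, a tab, then the definition on each line), which tools such as PyGlossary or the StarDict editor can compile into a dictionary. The sketch below shows only that formatting step. The layout of otl.txt is not described in this thread, so the entry objects and the function names `escapeTabfileField` and `toTabfile` are hypothetical placeholders, and headwords are assumed to contain no tabs or line breaks.

```javascript
// Hypothetical helper: make a definition safe for a one-line tabfile
// record. Backslashes are doubled, line breaks become the two
// characters "\n", and tabs are replaced by a space.
function escapeTabfileField(text) {
  return text
    .replace(/\\/g, "\\\\")
    .replace(/\r?\n/g, "\\n")
    .replace(/\t/g, " ");
}

// Hypothetical formatter: entries is an array of
// { headword, definition } objects that some parser of the source
// file (not shown here) would have to supply.
function toTabfile(entries) {
  return entries
    .map((e) => e.headword + "\t" + escapeTabfileField(e.definition))
    .join("\n") + "\n";
}

// Placeholder data, not taken from the lexicon.
const sample = [
  { headword: "word1", definition: "first line\nsecond line" },
  { headword: "word2", definition: "single line" },
];
console.log(toTabfile(sample));
```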

funderburkjim commented 9 years ago

Naena -

  1. Currently, the programs which generate the displays (including the Tamil dictionary part) at http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/tamil/index.html are Perl programs, written about 15 years ago by someone else, and lightly modified by me about 7 years ago. If we add font support to this display, these programs are where we must start. I've spent some time today towards getting these programs to work on my local Windows installation of XAMPP. I am at the point of realizing that the details of the searches depend on the data files being loaded into MySQL. I'm thinking of switching to SQLite, which will be more portable and may provide the same search functionality.

Have you looked at http://tamildictionaries.uni-koeln.de/ ? In these displays, Tamil script can be shown. Is this what you are after?

Do you have programming experience? If so, which language(s)?

LNS1 commented 9 years ago

Thank you, Jim. Let me take a look.

Srini

naena commented 9 years ago

Actually, I have gotten everybody confused, because I talk too much, sorry.

What I am talking about is Sanskrit. I would like to have the keyword displayed also in Singhala. I am not a regular programmer, but can handle HTML, CSS, PHP and JavaScript fairly well by reading the docs for the problem at hand.

Please look at the attached screenshots.

In the search page, I have selected Devanagari Unicode. The result page gives the resulting key word in Devanagari.

I want to add Singhala into that list on the search page. That is not Tamil, but the language spoken in Sri Lanka. Unicode Sinhala is not compatible with Sanskrit but romanized Singhala is. (I purposely spell the name with a 'g' to differentiate between Unicode and Romanized versions). Romanized Singhala is ordinarily displayed using an orthographic smartfont. What you see here is romanized Singhala: http://www.lovatasinhala.com (About 20% of modern Singhala is freely borrowed Sanskrit)

I do not know how you display Devanagari on your pages. How I show Singhala is by shipping the font as a web font defined in CSS like this:

@font-face { font-family: singfont; src: url(http://smartfonts.net/woff/aruna.woff) format("woff"); }
.sing { font-family: singfont; text-rendering: geometricPrecision; font-feature-settings: "liga" 1; }

Inside HTML, you invoke the class 'sing' to show Singhala, doing something like this:

karaNa
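
One way to apply the class from script is sketched below. The thread names only the class `sing`, so the choice of a `span` element and the helper names `escapeHtml` and `wrapSinghala` are assumptions made for illustration.

```javascript
// Hypothetical helper: escape characters that are special in HTML so a
// keyword can be placed safely inside markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap romanized Singhala text in an element that
// carries the 'sing' class, so the web font defined in the CSS applies.
// The span element is an assumption; any inline element would do.
function wrapSinghala(romanized) {
  return '<span class="sing">' + escapeHtml(romanized) + "</span>";
}

console.log(wrapSinghala("karaNa"));
```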

HK to Romanized Singhala

function hk2rs() {
  var a = inputHK;
  a = a.replace(/AM/g, "aá");
  a = a.replace(/IM/g, "ií");
  a = a.replace(/UM/g, "uú");
  a = a.replace(/aM/g, "á");
  a = a.replace(/iM/g, "í");
  a = a.replace(/uM/g, "ú");
  a = a.replace(/eM/g, "é");
  a = a.replace(/oM/g, "ó");
  a = a.replace(/AH/g, "aä");
  a = a.replace(/IH/g, "iï");
  a = a.replace(/UH/g, "uü");
  a = a.replace(/aH/g, "ä");
  a = a.replace(/iH/g, "ï");
  a = a.replace(/uH/g, "ü");
  a = a.replace(/eH/g, "ë");
  a = a.replace(/oH/g, "ö");
  a = a.replace(/lRR/g, "ôô");
  a = a.replace(/lR/g, "ô");
  a = a.replace(/R/g, "û");
  a = a.replace(/A/g, "aa");
  a = a.replace(/I/g, "ii");
  a = a.replace(/U/g, "uu");
  a = a.replace(/G/g, "ñ");
  a = a.replace(/J/g, "ç");
  a = a.replace(/t/g, "þ");
  a = a.replace(/d/g, "ð");
  a = a.replace(/T/g, "t");
  a = a.replace(/D/g, "d");
  a = a.replace(/N/g, "N");
  a = a.replace(/S/g, "x");
  outputRS = a;
}

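
The hk2rs() function above reads and writes the globals inputHK and outputRS, which makes it awkward to call from another page. A possible self-contained variant is sketched here: it takes the HK string as a parameter and returns the result. The rule table is copied from the function in this thread in the same order (the N to N rule is left out because it changes nothing); the names `HK_TO_RS_RULES` and `hkToRs` are hypothetical.

```javascript
// Ordered replacement rules copied from the hk2rs() function in this
// thread. Order matters: "AM" must run before "A", and "t" before "T".
const HK_TO_RS_RULES = [
  ["AM", "aá"], ["IM", "ií"], ["UM", "uú"],
  ["aM", "á"], ["iM", "í"], ["uM", "ú"], ["eM", "é"], ["oM", "ó"],
  ["AH", "aä"], ["IH", "iï"], ["UH", "uü"],
  ["aH", "ä"], ["iH", "ï"], ["uH", "ü"], ["eH", "ë"], ["oH", "ö"],
  ["lRR", "ôô"], ["lR", "ô"], ["R", "û"],
  ["A", "aa"], ["I", "ii"], ["U", "uu"],
  ["G", "ñ"], ["J", "ç"],
  ["t", "þ"], ["d", "ð"], ["T", "t"], ["D", "d"],
  ["S", "x"],
];

// Hypothetical pure-function form of hk2rs(): apply every rule in order
// to the whole string and return the romanized Singhala text.
function hkToRs(inputHK) {
  let out = inputHK;
  for (const [from, to] of HK_TO_RS_RULES) {
    out = out.split(from).join(to); // replace all occurrences
  }
  return out;
}

console.log(hkToRs("kAryAlaya")); // long A is written as a doubled vowel
console.log(hkToRs("dantaja"));   // dental t and d become þ and ð
```

Keeping the rules in a table makes the ordering explicit and would let the reverse direction be added as a second table.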
naena commented 9 years ago

Just noticed Dropbox links shared. (I am learning.) Here are the links to the screenshots I tried to attach earlier: https://www.dropbox.com/s/a0wioolo6iggjt8/search.png?dl=0 https://www.dropbox.com/s/nfriimd5y1n8u1t/result.png?dl=0

funderburkjim commented 9 years ago

@naena - OK, I was definitely confused as to what you are after, but think I understand it now.

This deserves to be in its own issue, as it is not related to Srinivasa's issue.

gasyoun commented 8 years ago

I guess work has stopped. It's dead.