sanskrit-lexicon / GreekInSanskrit

Provide missing Greek text for the Cologne digitizations of Sanskrit dictionaries.

Greek in MW: review #2

Closed funderburkjim closed 2 years ago

funderburkjim commented 9 years ago

Of the 34 dictionaries on the Cologne Sanskrit-Lexicon home page, 16 have Greek text that needs to be provided.
In one of these (the 1899 Monier-Williams Sanskrit-English dictionary), Greek text has been provided. I thought it might be useful to review this before preparing materials for Jonathan to work with.

The real work of coding the Greek in MW was done in 2007, and is described in a note prepared by me in 2010. One salient point of this work was that the Greek was coded in the BETA (Beta Code) transliteration.
Coding Greek in BETA appears to be quite analogous to coding Devanagari in one of the Sanskrit transliterations (such as Sanskrit Library Phonetic (SLP1), Harvard-Kyoto (HK), and others). Both are ways to code a language with non-English letters by using only the basic ASCII character set.

When Greek is coded in BETA, it needs to be transcoded to Unicode for proper representation.
This web page was prepared by me to illustrate this transcoding, and to provide a work space where the accuracy of the transcoding from BETA to Unicode could be evaluated.
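To make the BETA-to-Unicode step concrete, here is a minimal Python sketch. It is not the transcoder used for MW; the function name `beta_to_unicode` and the tables are my own illustration, and it assumes only the basic Beta Code conventions (letters, breathings, accents, iota subscript, diaeresis, `*` for capitals, final sigma by position). Anything outside that subset is passed through unchanged.

```python
import unicodedata

# Basic Beta Code letters -> Greek lowercase (illustrative subset only).
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic marks -> Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(beta):
    """Convert a basic Beta Code string to precomposed (NFC) Unicode Greek."""
    text = beta.lower()
    out = []
    pending = []      # diacritics typed between '*' and the capital letter
    capital = False
    for i, ch in enumerate(text):
        if ch == "*":
            capital = True
        elif ch in DIACRITICS:
            # Before a capital, marks precede the letter; otherwise they follow it.
            (pending if capital else out).append(DIACRITICS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if ch == "s" and not capital and nxt not in LETTERS:
                letter = "ς"  # sigma at the end of a word
            if capital:
                out.append(letter.upper())
                out.extend(pending)
                pending, capital = [], False
            else:
                out.append(letter)
        else:
            out.append(ch)  # spaces, punctuation, anything not handled here
    # NFC folds letter + combining marks into single precomposed characters.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_unicode("a)n"))
print(beta_to_unicode("o)/ktallos"))
```

Building the string from combining marks and letting NFC normalization compose it avoids a large lookup table of every accented letter.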

A third web document itemizes all of the 711 headwords of the MW dictionary where Greek appears.

@jmigliori Let me guide you through this web page, as it illustrates several aspects of what you'll be working with when you tackle the missing Greek text in other dictionaries.

Now click the MW key link for the first row: अ (the first letter of the Devanagari alphabet, like 'a'). Also keep in mind the L value, 4, for this row. You see several lines, the first of which is

(H1) अ 1 [p= 1,1] [L=1]   the first letter of the alphabet

and the fifth of which is

(H1) अ 3 [p= 1,1] [L=4]   (before a vowel अन्, exc. अ-ऋणिन्) , a prefix corresponding to Gk. ἀ, ἀν, 
   Lat. in, Goth. and Germ. un, Eng. in or un, and having a negative or privative or contrary sense 
   (अन्-एक not one ; अन्-अन्त endless ; अ-सत् not good ; अ-पश्यत् not seeing)

Note in particular the presence of Greek ἀ, ἀν and that L=4.
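For anyone scripting against lines like these, here is a rough Python sketch of pulling out the L value and the Greek words. It only assumes the line shape shown above, `(H1) key [p= page,col] [L=n]`; the names `parse_display_line` and `GREEK_RUN` are hypothetical, reading the two numbers after `p=` as page and column is my guess, and this is not how the display itself is generated.

```python
import re
import unicodedata

# Matches the visible prefix of a display line, e.g. "(H1) अ 3 [p= 1,1] [L=4]".
HEADER = re.compile(
    r"\((?P<level>H\w+)\)\s+(?P<head>.+?)\s+"
    r"\[p=\s*(?P<page>\d+),(?P<col>\d+)\]\s+\[L=(?P<L>[\d.]+)\]"
)

# Runs of characters from the Greek and Greek Extended Unicode blocks.
GREEK_RUN = re.compile(r"[\u0370-\u03ff\u1f00-\u1fff]+")


def parse_display_line(line):
    """Return the header fields plus the list of Greek words, or None if no header."""
    line = unicodedata.normalize("NFC", line)  # fold combining marks into letters
    m = HEADER.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["greek"] = GREEK_RUN.findall(line[m.end():])
    return fields


sample = "(H1) अ 3 [p= 1,1] [L=4]   a prefix corresponding to Gk. ἀ, ἀν, Lat. in"
print(parse_display_line(sample))
```

Scanning for the Greek Unicode blocks is a cheap way to list which lines contain Greek at all, which is the same information the 711-headword listing gives for MW.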

There are two links in this L=4 line.

The digitization of the Greek was originally done (by Wendy Teo) essentially working backwards: look at the Greek text in the scanned image, and type in its representation at the appropriate spot in the digitization of MW. Your workflow for Greek in the other dictionaries will likely be similar.

Ok - that's it for examining the MW page for L=4.

Back to the first row of the Greek in MW display. The links from the Greek in the last column are the same links to Perseus.

OK - that's the end of the walk-through of the Greek in MW display.

TODO

Jonathan, I think you should spend some time examining the Greek in MW display. Here are some questions that are unclear to me that maybe you can help resolve before we start with the other dictionaries.

I think we can get started with another dictionary once the BETA/Unicode choice is made. I'm also going to solicit Peter's opinion on this, since he was involved in the coding of Greek in MW.

gasyoun commented 9 years ago

After MW I guess PW is a good candidate. I've seen a lot of non-Unicode Old Greek, but never in BETA, so hope it can be left in the past. The Perseus question remains the most intriguing one for me as well.

jmigliori commented 9 years ago

• I do know BETA from using the word lookup tool on Perseus. I’m working on a Mac, so entering the Greek in Unicode is very easy to do and preferable.

• I had a cursory look at the MW Greek and I didn’t see any errors in the Unicode representation. I can take a closer look later.

• This Perseus issue is tricky. The word linked to, ὄκταλλος, doesn’t have its own dictionary entry because it’s a dialect form only listed under another rare word (ὄκκον). Ὄκκον appears when you manually go to its entry in the Greek dictionary, but not when you click the link for it under ὄκταλλος in the Word Study Tool. I don’t know why this is, aside from Perseus being a sprawling database that’s prone to bugginess.

With that addressed, I think I’m ready to start.

gasyoun commented 9 years ago

That means that Perseus is badly interlinked, understood. After the manual checking - should we add a link to ὄκκον instead of the non-working link to ὄκταλλος?

jmigliori commented 9 years ago

Sure


funderburkjim commented 9 years ago

Interesting idea to add link to ὄκκον.

The accomplishment of this might be broken into three steps for MW:

This enhancement is a secondary goal at present for Jonathan - the first goal is the coding of Greek in the other dictionaries.

@jmigliori I'm glad you found a good explanation for the Perseus link problem, and that the Unicode representation looks correct.

gasyoun commented 9 years ago

@funderburkjim my Greek is worse than my Spanish, so I would prefer to say no, all the more since it's secondary indeed. 711 cases would take a week or so.

funderburkjim commented 9 years ago

Actually, knowledge of Greek is not needed for step 1. I don't know Greek, but by clicking I was able to see that the link to ὄκταλλος leads nowhere. That's all that is needed for step 1. It's up to Jonathan to find meaningful links for the nowhere links.

funderburkjim commented 9 years ago

Maybe I could make a form, like the Greek in MW display with those 711 cases, but with an extra field next to the Perseus links that would have three radio buttons: * OK * PROBLEM * TODO. Then any of us could classify a few links when we have a few minutes, and the results would be saved in a database when we mark 'OK' or 'PROBLEM'.
That way, the initial screening for problematic links to Perseus would not be too onerous a task for anyone (since several people could contribute) and in a month or two we would have a list of PROBLEM cases for @jmigliori to examine.

Does this sound like a useful approach? Would anyone participate in this if I make such a form?
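As a rough idea of the storage side of such a form, here is a small Python sketch using the standard `sqlite3` module. No such form exists yet; the table name `link_review`, its columns, and the helper functions are all hypothetical. It only illustrates the OK / PROBLEM / TODO classification, with every link starting as TODO.

```python
import sqlite3

STATUSES = ("OK", "PROBLEM", "TODO")


def open_review_db(path=":memory:"):
    """Open (or create) the review database; ':memory:' keeps it throwaway."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS link_review (
               lnum   TEXT NOT NULL,                 -- L value of the MW record
               greek  TEXT NOT NULL,                 -- Greek word that carries the link
               status TEXT NOT NULL DEFAULT 'TODO',  -- OK / PROBLEM / TODO
               PRIMARY KEY (lnum, greek)
           )"""
    )
    return conn


def add_link(conn, lnum, greek):
    """Register a link to review; it stays TODO until someone classifies it."""
    conn.execute(
        "INSERT OR IGNORE INTO link_review (lnum, greek) VALUES (?, ?)",
        (lnum, greek),
    )
    conn.commit()


def mark(conn, lnum, greek, status):
    """Record a reviewer's choice; returns the number of rows updated."""
    if status not in STATUSES:
        raise ValueError(f"status must be one of {STATUSES}")
    cur = conn.execute(
        "UPDATE link_review SET status = ? WHERE lnum = ? AND greek = ?",
        (status, lnum, greek),
    )
    conn.commit()
    return cur.rowcount


def problems(conn):
    """List the (lnum, greek) pairs marked PROBLEM."""
    rows = conn.execute(
        "SELECT lnum, greek FROM link_review WHERE status = 'PROBLEM' ORDER BY lnum, greek"
    )
    return rows.fetchall()
```

With something like this, `problems(conn)` would produce the list of PROBLEM cases to hand over once the initial screening is done, and several people could contribute marks independently.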

jmigliori commented 9 years ago

That sounds like a good plan. I can certainly chip away at that list between working on the other dictionaries.


gasyoun commented 9 years ago

@funderburkjim if that does not take longer than 20 minutes for you, then it's a good idea.

gasyoun commented 9 years ago

@funderburkjim does Jonathan have a plan for what he could do next, if he had spare time?

funderburkjim commented 9 years ago

From Jonathan's comment elsewhere, I think he's had to attend to job or school (I'm not sure whether he's a student or teacher), and will resume the MW72 Greek when his time permits.

gasyoun commented 9 years ago

Sure, but do we have the other 14 dictionaries ready for him, or would it take another week for you to prepare them? How hard is your part? When you uploaded the batch correction pages it looked easy - easy when you do it, but it would take months if I did it without automation.

funderburkjim commented 9 years ago

As usual, there's an app for that! Uploading batch correction pages is done by a program, courtesy of those friendly Pythons. So, when the time is ready, the rest of the batches for MW72 can be uploaded.

As to the other dictionaries that have Greek, a similar process will probably work, with slight adjustments for the idiosyncrasies of each dictionary.

gasyoun commented 9 years ago

Where is the app?

Andhrabharati commented 2 years ago

All the Greek text in all the CDSL dictionaries is filled in now (though BEN is yet to be "done" by Jim), and the issue can be closed now.