sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Gasuns Dhatu Concordance (based partly on Cologne Sources) #52

Closed gasyoun closed 3 years ago

gasyoun commented 10 years ago

https://github.com/sanskrit-lexicon/PWG/issues/7 continued. I would love to see the "work comparing verbs of MW with two other sources of verbs: Whitney Roots and Madhaviya Dhatu Vrtii." that has already been done. To prove that I am able to understand it and give it some new life, let me tell you my story in a nutshell. In 2004 I started entering Palsule's roots, so they are the basis of my concordance; then came EWA, VIA, Bucknell, Zlaiznyak & Lihusina. Huet and Panini I did not enter myself, nor did I take them from Cologne. I used Cologne for PWK.

find_dhatu

Palsule 1955 https://yadi.sk/i/17Mio43qbeRNP
PWG 1855-1875 http://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/web/index.php
PWK 1879-1889 http://www.sanskrit-lexicon.uni-koeln.de/scans/PWScan/2014/web/webtc/indexcaller.php ideally integrated with SCH http://www.sanskrit-lexicon.uni-koeln.de/scans/SCHScan/2014/web/webtc/indexcaller.php
Whitney 1885 https://yadi.sk/i/gxGQLIGwbgScE
EWA 1986-2001 https://yadi.sk/i/Q2VZVgpnbgSXW
VIA I 1997 https://yadi.sk/d/WUf_n0XibgSTi

werba

I do not know if SCH contains roots additional to PWK; let me check it for 12427 º entries (aDarIkf 1662 - "ºnach unten bringen , demütigen"; karburIkf 9995 "sprenkeln , buntmachen" does not count; kalmAzay 10166 "buntmachen"). So most of the lists (there are around 10 additional smaller lists, none of them at Cologne, and I guess they never will be because they are not dictionaries at all) come from non-Cologne sources. But there is the PWK case: it is the single most interesting one, and not only does my dhatu concordance rely heavily on it, but my Reverse Dictionary of Sanskrit does so to an even greater degree. So I took Palsule's list of dhatus and searched for them in PWK, Whitney, EWA, VIA. It took me about half a year to identify the issues and another half a year to find the culprits, including data entry, corrigenda, Excel macros. In 2012-2014 I made a lot of drafts based on the data, such as a reverse sorting of Palsule: https://yadi.sk/i/YZjGa4qbbgdzn (Reverse-Palsule-Dhatu-Index.pdf). I also listed several parameters: ATI ATE OTE DHATUP onomatop. Sautra Wurzel caus. v.  l. v.l. perf. med. mit von Vgl s. s. u. The higher the number (max. 11), the more likely it is that we have a dhatu before us, but actually even with 0 I have found 3052 candidates (not even counting the -ay ones) in PWG.
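The parameter count described above can be sketched roughly as follows. This is only an illustration under assumptions: the marker subset, the plain substring test, and the function names `dhatu_score` and `rank_candidates` are hypothetical, and the sample entry text for `gam` is invented toy data, not a quotation from PWG or PWK.

```python
# Hypothetical subset of the markers named in the comment above. How the real
# concordance tokenizes and counts them is not documented here (assumption).
MARKERS = ["DHATUP", "onomatop.", "Sautra", "Wurzel", "caus.", "v.l.", "perf.", "med.", "Vgl"]


def dhatu_score(entry_text: str) -> int:
    """Count how many distinct markers occur in the text of one entry."""
    return sum(1 for marker in MARKERS if marker in entry_text)


def rank_candidates(entries: dict[str, str]) -> list[tuple[str, int]]:
    """Return (headword, score) pairs, highest score first, ties by headword."""
    scored = [(head, dhatu_score(text)) for head, text in entries.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy input: the text for "gam" is invented for illustration only.
toy_entries = {
    "kalmAzay": "buntmachen",
    "gam": "DHATUP. caus. perf.",
}
ranking = rank_candidates(toy_entries)
```

A score of 0 does not exclude an entry, which matches the remark that even score-0 entries yielded candidates; the score only orders entries for manual review.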

dhatu-gasuns-v13-25-12-13

As per https://yadi.sk/d/_YOAUZyObeRbj Gasuns-Dhātupāṭha-Concordance.pdf

palsule-legend

Out of 18457 cells, 12898 are empty, 2173 have no translation (German or English), and 3386 have a translation. I try to compare every list with all the other lists, but in this version Palsule is the default list. Palsule contains all Panini roots and all possible Indian editions of dhatupathas.
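The three cell counts above add up to the stated total. A minimal sketch of how such a tally could be produced is given below. It assumes a hypothetical in-memory layout in which each cell is either `None` or a `(root, translation)` pair; the real concordance lives in a spreadsheet whose layout is not described here, and the toy table is invented.

```python
from collections import Counter


def classify_cell(cell):
    """Classify one cell as 'empty', 'no_translation' or 'translated'.

    Assumption: a cell is None, or a (root, translation) pair of strings.
    """
    if cell is None:
        return "empty"
    root, translation = cell
    if not root.strip():
        return "empty"
    return "translated" if translation.strip() else "no_translation"


def summarize(table):
    """Tally the cell classes over a table given as a list of rows."""
    return Counter(classify_cell(cell) for row in table for cell in row)


# Toy table: one row per root, one column per source list (invented data).
toy_table = [
    [("kf", "machen"), ("kf", ""), None],
    [("gam", "go"), None, None],
]
toy_counts = summarize(toy_table)

# Sanity check on the figures quoted in the comment above.
reported_total = 12898 + 2173 + 3386
```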

funderburkjim commented 10 years ago

The link https://dl.dropboxusercontent.com/u/29859999/mwdpsummary-marcis.zip has some information related to the MW-Madhaviya correlation work, which seems to have been done mostly in 2008-2010.

The mdp.xml therein is primarily Peter's work, with my programming assistance. You should check with Peter before sharing this with others.

The matchman2.html file was constructed from the matchman2.txt file, and contains the MW-MDHP correspondence results.

The readme file seems to describe the fields of the matchman2 files, though it will likely be obscure in some places. I was primarily responsible for these files.

The creation of these files followed a rather complex path. Possibly, Whitney's roots, Westergaard's Dhatu Patha, and Katre's Dhatu Patha came into play. But the primary inputs were mdp.xml and an extraction of verb records from MW.

The link https://dl.dropboxusercontent.com/u/29859999/mwwhitmap.zip provides the map between MW roots and the entries in Whitney's roots; it is the basis of the Whitney links in the MW dictionary displays at Cologne Sanskrit-lexicon. The creation of this used Scharf-Hyman's digitization of Whitney's roots. The correspondence work was done by me, basically using root spelling and sometimes comparing Whitney's gloss to MW definitions.
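As a rough illustration of a spelling-based correspondence of this kind, the sketch below pairs records whose root spelling is identical and sets aside homonym groups for a manual gloss comparison. It is not the program actually used for mwwhitmap: the function name `match_by_spelling`, the record ids, and the toy input are all hypothetical.

```python
from collections import defaultdict


def match_by_spelling(mw_roots, whitney_roots):
    """Pair MW and Whitney records that share the same root spelling.

    Each input is a list of (record_id, spelling) tuples. Returns:
    - matched: unambiguous (mw_id, whitney_id) pairs
    - ambiguous: (spelling, mw_ids, whitney_ids) groups with homonyms,
      left for a manual comparison of glosses
    - unmatched: Whitney ids with no MW root of the same spelling
    """
    mw_by_spelling = defaultdict(list)
    for mw_id, spelling in mw_roots:
        mw_by_spelling[spelling].append(mw_id)
    wh_by_spelling = defaultdict(list)
    for wh_id, spelling in whitney_roots:
        wh_by_spelling[spelling].append(wh_id)

    matched, ambiguous, unmatched = [], [], []
    for spelling, wh_ids in wh_by_spelling.items():
        mw_ids = mw_by_spelling.get(spelling, [])
        if not mw_ids:
            unmatched.extend(wh_ids)
        elif len(mw_ids) == 1 and len(wh_ids) == 1:
            matched.append((mw_ids[0], wh_ids[0]))
        else:
            ambiguous.append((spelling, mw_ids, wh_ids))
    return matched, ambiguous, unmatched


# Toy input with invented ids; spellings are in SLP1.
toy_mw = [("mw1", "gam"), ("mw2", "as"), ("mw3", "as")]
toy_whitney = [("w1", "gam"), ("w2", "as"), ("w3", "vfD")]
matched, ambiguous, unmatched = match_by_spelling(toy_mw, toy_whitney)
```

Keeping the homonym groups separate rather than guessing by homonym number leaves the decision to a gloss comparison, which is the step described above as being done by hand.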

I hope these provide some interim value.

Currently, I am working on reviewing and simplifying the 'mapnorm' project.

gasyoun commented 10 years ago

Questions after reading "readme-20100604.txt". Huge work and unpublished - how come? Interesting to know that we built similar comparisons at the same time, only I compared mostly Palsule with PWK. Palsule has Panini as one of his sources and many other Indian dhatupathas in the list as well. PWK is the source of MW in many cases. As per mdp.xml - understood, Peter ueber alles. My .zip edition seems to contain 1/8 of the files in actual use.

1) "listing of the Westergaard sutras was developed" - where is the WestergaardDhP1.xml file? Where is mwtab1/Dha1tupd.txt? redo.bat? Otherwise wsid (Westergaard identifier) is only for Knights Templar?
2) mdp/norm-mw.txt, there are 1823 different values of key2 - so 1823 is your supposed number of MW roots? And that includes some verbs that were not marked as roots by MW himself, but based on some correspondence with MDP, right?
3) DP in column header = mdp in other places, right?
4) "some cases, such as 'Ap', MW has a causal , and 'C' is shown as one of the mw classes." - are these cases that Jim found marked somehow differently from those that were just extracted with a regex based on the root tag?
5) "set of English definitions was developed for each mdp sense" - Peter's part? sensemw1.tx we'll never see, right? As per "convert the Sanskrit senses into English, by using the MW dictionary", it seems it's Jim.
6) "set of simplified definitions of the MW roots" - the same was done for PWK and PWG by me as well, but it needs further perfection. In this regard sensemap1.txt is of greatest interest - a comparison of senses I've never even tried; what was the algo? Just similar words?
7) since Westerguaard -> since Westergaard
8) "background color of records is also adjusted to assist the eye. Groups Y1 and YG have white background, groups NOMW1, NOMW2 and NOMDP have a grey background. NNOM has a green background. The rest have a tan or yellow background." - this is lost, as only the red font for mismatches is used.
9) "If the mw headword and the mdp normalized root agree in spelling, then only the common value is shown in the first column; when there is a difference, both are shown, with the mdp root appearing in parentheses and in red text." - a great way, but it becomes puzzling when more sources are compared with each other at once.
10) So 'artificial' = Jim (font size not larger than other entries) and 'genuine' = Monier (as printed in a larger font in the book), right? Are both coded identically in mw.xml nowadays? I guess not. Is it related to "191 mw records were reclassified as roots"?
11) Regarding the "few miscellaneous comments" - should I read "wdp variant dp" as Westergaard's root (WPD) being an allomorph similar to MDP? And how should I understand "mw variant"?
12) Where is key9/matchman2-keydiff.txt? 372 records of ')KD'? This is where it starts to get interesting because of "record for which the mdp normalized root is spelled differently than the mw headword". Did you use some gunation script, or was it all done by hand?
13) "corresponding mdp entries have the marker 'i'" - too bad it's unsortable.
14) Where is mwverb/verb/add.txt? Something I've never done myself is inside "drop.txt contains 51 records which were previously classified as roots" - but is Pandora's box closed?
15) ceratin English -> certain English
16) "Since the correspondence, while explicit in the data structures, was subjectively developed, it is likely that other readers would develop the correspondences differently." - this is the main idea of my PhD as well. Too bad your work was never published as an article and I can't quote it.
17) "someone more familiar with Sanskrit might disavow certain sense equalities, and avow certain sense inequalities" - let there come Shilu, in 2017 or 2018, who knows. We are not the ones to decide.
18) Why the heck has so much data, as in "NPG 32.23 10.023-01", been put in a single cell? Sortability above everything, and here it's broken.
19) "mwwhitmap.txt" - great format "heW263975heW208" for 827 entries. What about the left-out Whitney roots, tried and failed? What about homonyms and homonym numbers - what happens when the 1st homonym in Whitney matches the 2nd in MW?
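The first-column display rule quoted in question 9 can be sketched as below. This only illustrates the rule as the readme words it; the function name `first_column` and the inline `span` markup are assumptions, not the code behind matchman2.html, and the differing spelling pair in the usage lines is invented.

```python
from html import escape


def first_column(mw_headword: str, mdp_root: str) -> str:
    """Render the first column for one MW/MDP pair.

    Equal spellings: show the common value once.
    Different spellings: show the MW headword, then the MDP root in
    parentheses and in red (the markup used here is an assumption).
    """
    if mw_headword == mdp_root:
        return escape(mw_headword)
    return f'{escape(mw_headword)} <span style="color:red">({escape(mdp_root)})</span>'


same = first_column("gam", "gam")
different = first_column("kf", "kF")  # invented pair, for illustration only
```

With more than two sources the same rule would need one parenthesized variant per source, which is where the display becomes hard to read, as the question notes.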

It's great to know what has been going on. For now I need to understand whether the missing files can be shared as well. If so, then I can propose my additions, changes, and modifications one day. Thanks for your explanation, detailed as usual. I must be taking an hour out of each of your days; I hope I can return the favor one day.

funderburkjim commented 10 years ago

https://dl.dropboxusercontent.com/u/29859999/whit-mw-html.zip Link to displays showing the correspondence between Whitney and MW roots. Although I haven't read all of my notes, as I recall, this correspondence was based on root spelling and root sense. The relations displayed are summarized in the mwwhitmap.xml file mentioned in the previous note.

https://dl.dropboxusercontent.com/u/29859999/mwverb.zip has files used as source information for roots from MW in the various root comparisons. The 'genuine' list contains the verbs which appear in large devanagari in MW; the 'artificial' ones are the other mw verbs. The verb-prep4-root.out file is extracted from MW records, and essentially contains all the elements for those which are roots (prefixed roots are not present).

https://dl.dropboxusercontent.com/u/29859999/westwork.zip has WestergaardDhP1.xml and other Westergaard files. The 'sutras.txt' file was likely done by hand from the Westergaard pdf.

I never thought to publish this work, indeed I don't know if anyone has examined it other than you. If it were to be published, my approach would be, essentially, to understand what I did, and to describe the work in a more retrospectively linear form. I would use the various readme files as a guide; all the programs and code are present (I think) on my local computer - so a 'redo' is likely possible.

If you are really interested in re-examining the whole process by which the comparisons were derived, perhaps we can revisit (with Shalu ?) the whole process together sometime. Likely, this would have the beneficial result of making the correspondences more accurate. It could likely be extended to provide a comparison of PW roots, in a manner at least functionally distinct from what you may have done. Another comparison I think would be interesting is to Wilson, which shows roots in DhatuPatha form (with anubandhas); similarly with Shabda Sagara.

The mdp file I sent is, as I recall, a sort of summary extracted from the rather more complicated MadhaviyaDhP3 (or 4).xml that Peter mentions. The extracted 'summary' essentially 'flattens' the structure, to make it easier to work with.

drdhaval2785 commented 3 years ago

Verbs being tackled elsewhere. There is a concordance for 16 dicts. Am I right @funderburkjim ? Can you provide the link? I can not locate it.

gasyoun commented 3 years ago

> Verbs being tackled elsewhere

Did you mean https://sanskrit-lexicon.github.io/verbs/verbs01/verbs1_merge1_1vmw.html ?

drdhaval2785 commented 3 years ago

Yes. The same link.