`o` vs `O` Corrections in MW

gasyoun commented 9 years ago

https://github.com/sanskrit-lexicon/CORRECTIONS/issues/45 continued with a one year break. http://drdhaval2785.github.io/o_vs_O/output1/MW.html Highest probability (One dictionary in first word and more dictionaries in second word) first.

daRqAjinika -> dARqAjinika: the words are grammatically related (base form daRq), and vṛddhi formation supports that they are related; judging by the meaning, however, the vṛddhi is lacking in the original (printed) MW form, so a factual print error.

darqajinika

dIpaKori -> dIpaKorI PWG quotes SKD, SKD links to dIpakUpI, where 2 dīpakhorī again is given, so with KorI and not Kori as in MW. MW quotes Lexicographers, that means Indian authors and has quoted wrongly. The printed MW has it right, it's an OCR error.

dipakori

zaaf2 commented 9 years ago

@gasyoun a factual print error.

Do you mean an OCR error? दाण्डाजिनिक in the printed edition seems correct to me. The word comes from दण्डाजिन -- with a short first a -- (“n. sg. staff and dress of skin as mere outward signs of devotion, hypocrisy, deceit Pāṇ. 5-2, 76”), which is a Dvandva compound (no vṛddhi here), from दण्ड m. staff + अजिन “n. the hairy skin of an antelope, especially a black antelope (which serves the religious student for a couch seat, covering &c”). दाण्डाजिनिक is formed by secondary derivation with the suffix –ika, which requires the vṛddhi-strengthening of the initial syllable. Cf. Whitney’s Sanskrit Grammar (1204 and 1222 j):

(...)

gasyoun commented 9 years ago

@zaaf2 thanks for the detailed answer with quotations, love the style. Do you know of https://en.wikisource.org/wiki/Page%3ASanskrit_Grammar_by_Whitney_p1.djvu/483 at https://en.wikisource.org/wiki/Sanskrit_Grammar/Chapter_XVII#418? Yes, it is an OCR error, now I see it - I was looking and did not see it before. At http://drdhaval2785.github.io/o_vs_O/output1/MW.html you can see line 50: daRqAjinika dARqAjinika दण्डाजिनिक दाण्डाजिनिक MW AP,PW,PWG,SCH,SHS,VCP,WIL,YAT.
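A minimal sketch of how a row like line 50 could be produced, assuming the o_vs_O list pairs SLP1 headwords that differ only in letter case (a vs A, s vs S) and ranks a pair higher when the first spelling occurs in one dictionary and the second in many. The `index` mapping and the function name are hypothetical; the thread does not show the actual o_vs_O code.

```python
from collections import defaultdict

def candidate_pairs(index):
    """index maps an SLP1 headword to the set of dictionaries containing it."""
    groups = defaultdict(list)
    for headword in index:
        # Assumption: spellings are grouped by their case-folded SLP1 form.
        groups[headword.lower()].append(headword)
    pairs = []
    for variants in groups.values():
        for rare in variants:
            for common in variants:
                if len(index[rare]) == 1 and len(index[common]) > 1:
                    pairs.append((rare, common,
                                  sorted(index[rare]), sorted(index[common])))
    # Most widely attested second spelling first.
    return sorted(pairs, key=lambda pair: -len(pair[3]))

# Dictionary lists taken from the thread; the index structure is illustrative.
index = {
    "daRqAjinika": {"MW"},
    "dARqAjinika": {"AP", "PW", "PWG", "SCH", "SHS", "VCP", "WIL", "YAT"},
    "dyOSaMsita": {"MW"},
    "dyOsaMSita": {"PW", "PWG"},
}
for rare, common, rare_dicts, common_dicts in candidate_pairs(index):
    print(rare, common, ",".join(rare_dicts), ",".join(common_dicts))
```

A pair found this way is only a candidate: as the cases in this thread show, it still has to be checked against the printed page to tell an OCR error from a print error.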

gasyoun commented 9 years ago

3. dfptabAlaki -> dfptabAlAki 1st argument, MW was published after PWG and many words were "taken", but in many cases with the same mistakes as in the original or with new ones. 2nd argument, Dṛptabālāki is the more popular form https://www.google.ru/search?q=d%E1%B9%9Bptab%C4%81laki&ie=utf-8&oe=utf-8&gws_rd=cr&ei=-oUPVrrpCIHSyAO_vLHQBw#newwindow=1&q=d%E1%B9%9Bptab%C4%81l%C4%81ki and MW's form is not met outside MW.

dfptabalaki

gasyoun commented 9 years ago

MW.html 4. deuliya -> deüliya (non o_vs_O) 1st, @funderburkjim, can we track all cases of 2 vowels following each other? I thought we did it before, but now I remember we did it only with 3 consecutive consonants. 2nd, the ü should be at least in key2, because it's there in the book and is lost. If all the umlauts are lost in the OCR, that's a sad story. 3rd, at the end of the article, there is the word Kshitïṡ that contains ï - an umlaut that is not there in the book, but that seems to be used there to denote MW sandhi markup. Is this system widely used, or only sparsely, any clue?

deuliya

P.S. Prakrit -> Prākr. grāma -> Grāma (?)
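The request to track headwords with two vowels in a row could be sketched roughly as follows, assuming the keys are in SLP1, where every vowel (including the diphthongs E and O) is a single character. The sample list is illustrative, and the function name is hypothetical.

```python
import re

# SLP1 vowels: a A i I u U f F x X e E o O (one character each).
SLP1_VOWELS = "aAiIuUfFxXeEoO"
HIATUS = re.compile(f"[{SLP1_VOWELS}]{{2}}")

def find_hiatus(headwords):
    """Return the headwords that contain two adjacent vowel characters."""
    return [hw for hw in headwords if HIATUS.search(hw)]

sample = ["deuliya", "devadUtI", "dyOsaMSita", "praUga", "dIpaKorI"]
print(find_hiatus(sample))
```

Each hit would then be checked against the print to see whether an umlaut (as in deüliya) was lost in digitization.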

gasyoun commented 9 years ago

5. devadutI -> devadUtI OCR error

devaduti

gasyoun commented 9 years ago

6. dyOSaMsita -> dyOsaMSita

dyosamsita

Should we ignore the original MW's Ṉ and use just ṁ instead?

gasyoun commented 9 years ago

7. dvadaSAra -> dvAdaSAra OCR error; not only is the original dīrgha, it also carries an accent, which is totally lost, @funderburkjim?

dvadasara

zaaf2 commented 9 years ago

6. dyOSaMsita -> dyOsaMSita

By the sense of the word (from √शो) there is clearly an error in the printed edition. But the correct form should be dyOzaMSita, since स् becomes ष् after a vowel (except a/ā) if followed by a vowel, त् थ् न् म् य् or व्.

Does the digital edition output make no distinction between ṉ and ṃ in the original? I find under SaMsita:

SaMsita [p= 1044] : mfn. (often confounded with saM-Sita » saM- √So) said, told, praised, celebrated Pañcat. praiseworthy ib. [L=210855]

The distinction is clear only in the printed edition:

Is there no way to make the distinction in the digital edition, even when the output is in Devanagari?

gasyoun commented 9 years ago

@zaaf2 "Is there no way to make the distinction in the digital edition" - some coding might help, but I see no single-click solution. I'm afraid MW was not guided by bulletproof logic here; it's based on etymology, I guess. "Does the digital edition output make no distinction between ṉ and ṃ in the original?" - it seems not, and that's sad indeed.

7. In vyoma 2 [p= 1041,3] [L=210423]: What does the * mean in daśā*rha? Why not just Daśārha with some additional markup? There is no note that the ā is one of the 4 sandhi types MW marks.

(H1) vyoman 2 [p= 1041,2] [L=210350] m. (for 1. » [p= 1029,1] ; accord. to Uṇ. iv, 150 fr. √ vye accord. to others fr. vi- √av or √ ve) heaven, sky, atmosphere, air

m. -> n.
Wilson: n. sky
Bopp: n. coelum
PWG: n. Himmel
PWK: n. Himmel
MW 1872: n. sky
Apte: n. sky
Macdonell: n. sky

vyoma

funderburkjim commented 9 years ago

Regarding 'daRqAjinika -> dARqAjinika ' Agree this is a typo in digitization. Appreciated @zaaf2 's explanation of the reason 'dA...' occurs. Using hwnorm1 display confirms that the 'dA...' spelling occurs in several dictionaries.
dIpaKori -> dIpaKorI Agree this is typo.
dfptabAlaki -> dfptabAlAki . Agree with change, and that this is MW print error. Another confirm of Gasyoun's copying theory is
- MW has m. N. of a man with the patr. gArgya ṠBr.
- PWG m. N. pr. eines Mannes mit dem patron. Gârgja Çat. Br. 14, 5, 1, 1. So MW text is essentially a translation to English of PWG.
deuliya -> deüliya This does not have a good solution with the current coding of MW. Here is the underlying record of MW in the mysql database:

<H1><h><hc3>110</hc3><key1>deuliya</key1><hc1>1</hc1><key2>deuliya</key2></h>
<body> <lex>n.</lex> <p><as0 type="ns">Pra1krit</as0><as1>Prakrit</as1>
~for~<s>devakulya</s>?</p> <c>N._of_a_<as0>Gra1ma</as0><as1><s>grAma</s></as1></c> <ls>Kshiti7s3.</ls> 
</body>
<tail><mul/> <MW>061671</MW> <pc>492,2</pc> <L>95498</L></tail>
</H1>

key2 is coded with SLP1. There is no representation of umlaut in SLP1
I added 'deüliya' to the text of the record. This is a partial solution.
Then there is the question of the literary source reference, which as you see in the record is spelled Kshiti7s3. Now as I recall, that '7' in 'i7' was generally used in MW by Thomas to indicate, in Sanskrit words, that the print showed a circumflex over the vowel: î . This situation occurs notably in literary source abbreviations, as here, in the <ls> tag.
MW used this notation in his IAST to indicate two things:
- the vowel is a long vowel, so in terms of Sanskrit IAST spelling, this would normally be shown with the macron ī.
- The circumflex also indicates that this long vowel is not just any long vowel, but it is a long vowel resulting from vowel sandhi combining (long-or-short-vowel x)+(long-or-short-vowel same x)
However, elsewhere Thomas uses x7 to indicate that x has the umlaut diacritic, as in German words.
So, in general, there is a question as to how, in a display, an 'x7' (coded in Thomas Anglicized Sanskrit) should be displayed.
The manner of display, in the current Cologne displays, of 'x7' is governed by details located in the as_roman.xml transcoder file which the particular display uses. Before this discussion, the as_roman.xml file used by MW for displaying the <ls> tag transcodes 'x7' to the unicode for 'x-umlaut'.
- As an experiment, I changed the as_roman.xml to make 'i7' = i-circumflex. This now changes the display of Kshiti7s3 to have a circumflex over the 'i'. This change just affects the MW displays.
- I hope this change doesn't break anything elsewhere in MW displays. Probably not.
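The ambiguity of 'x7' can be illustrated with a small sketch. It is not the as_roman.xml transcoder (whose format is not shown in this thread) but a hypothetical Python table. It assumes '1' stands for a macron (Gra1ma), '3' for a dot above (s3 in Kshiti7s3), and lets the caller choose whether '7' means circumflex or umlaut.

```python
import re
import unicodedata

# Assumed AS digit codes, inferred from the examples in this thread.
FIXED_MARKS = {"1": "\u0304", "3": "\u0307"}   # macron, dot above
SEVEN_MARKS = {"circumflex": "\u0302", "umlaut": "\u0308"}

def render_as(text, seven="circumflex"):
    """Render Anglicized Sanskrit (AS) codes as Unicode letters with diacritics."""
    marks = {**FIXED_MARKS, "7": SEVEN_MARKS[seven]}
    # Attach the combining mark to the letter that precedes the digit.
    combined = re.sub(r"([A-Za-z])([137])",
                      lambda m: m.group(1) + marks[m.group(2)], text)
    return unicodedata.normalize("NFC", combined)

print(render_as("Kshiti7s3"))                  # '7' read as circumflex
print(render_as("Kshiti7s3", seven="umlaut"))  # '7' read as umlaut
print(render_as("Gra1ma"))                     # capitalization preserved
```

Because the same code yields two different letters, a display would have to know from context (Sanskrit word or German word) which reading of '7' applies. The sketch also rewrites any letter followed by 1, 3 or 7, so it should only be applied to text known to be AS-coded.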
regarding grāma -> Grāma (?) in the same headword. Note the record coding above:

<as0>Gra1ma</as0><as1><s>grAma</s></as1>

For such Sanskrit words, displayed in IAST in print, Thomas coded them in AS (Anglicized Sanskrit), such as Gra1ma, with the capitalization preserved. At some point in the process of 'improving' the markup, I added to these original codings an 'slp1' translation, such as grAma.
The displays for MW are written to render the 'slp1' translation into the user's choice of output. When that choice is IAST, the display becomes grāma, with loss of capitalization.
It would be possible to revise the display program to use as_roman.xml based upon the <as0> contents when displaying in Roman Unicode. I am not eager to undertake this program revision, but if someone wants to revise the code, I would likely be glad to install it at Cologne.

funderburkjim commented 9 years ago

devadutI -> devadUtI OCR error Agree
dvadaSAra -> dvAdaSAra Agree. Key2 also changed to show accent: dvA/daSAra

funderburkjim commented 9 years ago

re vyoma 2 [p= 1041,3] [L=210423] What does the * mean in daśā*rha ?

This is closely related to the discussion of 'i7', and also the discussion of 'Gra1ma'. Here is the database coding in question.

<as0>Das3a7rha</as0><as1><s>daSA<srs/>rha</s></as1>

In <as0>, we see the AS coding 'a7' of the textual a-circumflex
In <as1>, this 'a7' has been coded as <srs/> (srs = simple replacement sandhi, or some such acronym)
In the display, the content of the <as1> element is used. And, the current display renders <srs/> as an asterisk, whatever the output choice of the user.
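A minimal sketch of the two display options for the `<srs/>` marker, assuming the `<as1>` content is SLP1 with `<srs/>` placed after the sandhi vowel. The asterisk style mirrors the current display described above; the circumflex style is a hypothetical alternative closer to the print. The function name and the `srs` parameter are illustrative, not part of the Cologne display code.

```python
# SLP1 letters whose IAST form differs; all other letters map to themselves.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ", "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh", "S": "ś", "z": "ṣ",
}
# Long vowels shown with a circumflex when they result from sandhi.
SANDHI_LONG = {"A": "â", "I": "î", "U": "û"}

def render_as1(slp1, srs="asterisk"):
    """Convert SLP1 with <srs/> markers to IAST in the chosen marker style."""
    parts = slp1.split("<srs/>")
    rendered = []
    for position, part in enumerate(parts):
        chars = [SLP1_TO_IAST.get(ch, ch) for ch in part]
        if position < len(parts) - 1:          # a marker follows this part
            if srs == "circumflex" and part and part[-1] in SANDHI_LONG:
                chars[-1] = SANDHI_LONG[part[-1]]
            else:
                chars.append("*")
        rendered.append("".join(chars))
    return "".join(rendered)

print(render_as1("daSA<srs/>rha"))
print(render_as1("daSA<srs/>rha", srs="circumflex"))
```

Since the rendering starts from the lowercase SLP1 in `<as1>`, the capital D of the printed Daśârha is not recovered in either style.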

funderburkjim commented 9 years ago

re dyOSaMsita -> dyOsaMSita Agree with the change.

The print agrees with dyOSaMsita, so it's not an OCR error.
MW : (dyO-) mfn. impelled or incited by heaven AV. x, 3, 25
dyOSaMsita only occurs in MW
dyOsaMSita occurs in PW, PWG
PWG (of dyOsaMSita): (द्यौ = द्यो + सं°) adj. vom Himmel getrieben Av. 10, 5, 25.
Google translate of pwg: driven from the sky
@zaaf2 reasoning summary:
- saMSita from sam+So, to urge, excite, speed, make ready, prepare RV. AV.
- SaMsita from SaMs : to praise
- saMSita is closer to 'driven' or 'incited or impelled'
- Thus, saMSita must be correct
Comparison to PWG suggests MW copied from PWG , with this error.
minor question re citation detail: mw AV 10,3,25 and PW 10,5,25 - Did MW copy this wrong also?
Does anyone know how to consult a version of Atharva Veda, to see which was really used in the cited verse? This could provide basis for answering is it 3 or 5.
Regarding, by sandhi, it should be 'dyO-zaMSita' - Given the PWG spelling, my suspicion is there is some special sandhi reasoning that supports 'dyO-saMSita'. Finding AV reference would also provide evidence.
This probably deserves an entry in corrections_factual.

funderburkjim commented 9 years ago

Regarding ṉ and ṃ : In the Cologne digitization, I think this distinction is lost - the two are treated the same, as anusvAra.

I'm not aware of this distinction in Devanagari. Was this distinction introduced by European scholars?

funderburkjim commented 9 years ago

I'll install the above corrections tomorrow.

gasyoun commented 9 years ago

@funderburkjim what about deüliya and similar cases, distinction lost as well? This is restorable, I guess and I would do it, if you agree to implement.

zaaf2 commented 9 years ago

@funderburkjim Was this distinction introduced by European scholars?

The signs ṉ and ṃ are used by MW to mark the phonetic distinction between what he calls a True Anusvára and a Substitute Anusvára (a distinction not made in Devanagari). The first is a nasalized vowel, with no accompanying consonantal closure; the second represents, by mere substitution, the five Sanskrit nasal consonants. V. MW Grammar (6 a, b):

Whitney also makes this distinction (Sanskrit Grammar 73.c):

zaaf2 commented 9 years ago

@funderburkjim Does anyone know how to consult a version of Atharva Veda, to see which was really used in the cited verse?

The word is not found in 10.3.25.

In 10.5.25 there is पृथिवीसंशित, which MW has as: "mfn. impelled by the earth AV. [L=128517]", but which in the consulted edition is translated as "praised on the earth".

I find द्यौसंशित at AV 10.5.27, which is there translated as “praised in the heavenly region”:

(https://archive.org/stream/ATHARVAVEDAVOL1OF2/ATHARVA-VEDA-VOL-1-OF-2#page/n773/mode/2up)

In fact, the word संशित is repeated in all the verses from 10.5.25 to 10.5.35 and in the consulted edition has been consistently translated as “praised”, the sense best adapted to all instances (so it seems to me): “praised on earth” (10.5.25), “praised on the atmospheric region” (10.5.26), “praised in the heavenly region” (10.5.27), “praised in the regions” (10.5.28), “praised in your desirable enterprise” (10.5.29), “praised in the attainment of Rigvedic Knowledge” (10.5.30), “praised in the performance of Yajna” (10.5.31), “praised in the advancement of medical affairs” (10.5.32), “praised in the waters” (10.5.33), “praised in agriculture” (10.5.34), “praised in vitality” (10.5.35). In all these instances it would hardly be possible to translate संशितः as “impelled by”.

In this case, according to MW (L=210855), the word should have been written as शंसित. I conclude that MW’s द्यौशंसित (from √शंस्) is the correct form of the word as it is used in the Atharva Veda, but that the meaning “impelled or incited by heaven” (translated from PWG) is incorrect.

zaaf2 commented 9 years ago

@funderburkjim Regarding, by sandhi, it should be 'dyO-zaMSita' - Given the PWG spelling, my suspicion is there is some special sandhi reasoning that supports 'dyO-saMSita'.

I was wrong. The rule I mentioned about the change of स् to ष् applies to internal Sandhi. The formation of compound words follows the general rules for external combination (v. Whitney 1249). But in the Vedic language the change of स् to ष् occurs frequently even in compounds (MacDonell, Vedic Grammar 67 a, b):

gasyoun commented 9 years ago

"confirm of Gasyoun's copying theory" - I did not invent it, it's Zgusta's theory (http://yadi.sk/d/h8ALxcCb8sY9w; @zaaf2 has not seen the file yet, so it might be interesting to him). As for "essentially a translation to English of PWG" - in most cases that is exactly right. As for "There is no representation of umlaut in SLP1" - let's add one. Can we ask Peter if he can approve a solution? I need these umlauts back for my dictionary, so I guess it's a regex question. Partial solutions for a few records will not do. This is an easy fix and all I ask is your approval. Oh, so the Gra1ma is there. Your improvement seems suspicious to me :) As per "When that choice is IAST, the display becomes grāma, with loss of capitalization." - got it. But IAST is the default mode in printed MW and it's capitalized there, so it does not make much sense to me. "simple replacement sandhi" - never heard of it before, good to know. "current display renders <srs/> as an asterisk" - oh, so maybe add a popup to the asterisks to note how to understand them?

funderburkjim commented 9 years ago

Regarding dyOSaMsita -> dyOsaMSita. I am leaving the correction to dyOsaMSita - @zaaf2 's finding of the word in AV clinches the deal for me as to spelling.

For the same reason, I'll also change 10-3-25 to 10-5-25 in MW.

The choice of interpretation ('praised in heaven' or 'impelled by heaven' ) seems like a separate question, and one which we don't need to answer to justify the MW correction. I wonder how Indian scholars treat what seems to be the confusion between the two interpretations. This question might be the tip of a very big iceberg regarding translation and interpretation of Indian sacred literature.

funderburkjim commented 9 years ago

Regarding reference to AV: One thing both Thomas and Peter have mentioned is the desire to have links from the literary source references of MW, PWG, and other lexicons to digital editions of the references. This example shows some of the values that such links could provide.

But this facility is still beyond current abilities, despite the greater availability of digitized texts now than 10-15 years ago.

However, it might be possible to resolve the references for, say, AV. This would be a good research project for someone to undertake.

funderburkjim commented 9 years ago

Regarding the 10-3-25 error in MW. Since literary sources are identifiable in both MW and PWG, it would be possible to write a program to do at least a partial comparison. Likely other errors in MW would result. This would also be a good, probably relatively small, research project.
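A rough sketch of such a comparison, under the assumption that the citations have already been extracted as plain strings per headword. The `mw` and `pwg` mappings and all function names are hypothetical; only the dyOsaMSita citations come from this thread. Roman numerals (as in MW's "x, 3, 25") are converted so that they compare equal to PWG's arabic numbers.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

def roman_to_int(numeral):
    """Convert a roman numeral such as 'x' or 'xiv' to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for pos, value in enumerate(values):
        # A smaller value before a larger one is subtracted (iv = 4).
        if pos + 1 < len(values) and value < values[pos + 1]:
            total -= value
        else:
            total += value
    return total

CITATION = re.compile(r"^\s*([A-Za-z]+)\.?\s+(.*)$")

def parse_citation(text):
    """Return (source abbreviation in lowercase, tuple of numbers) or None."""
    match = CITATION.match(text)
    if match is None:
        return None
    numbers = []
    for token in re.split(r"[,\s]+", match.group(2).strip(" .")):
        if token.isdigit():
            numbers.append(int(token))
        elif token and all(ch in ROMAN for ch in token.lower()):
            numbers.append(roman_to_int(token))
    return match.group(1).lower(), tuple(numbers)

def citation_mismatches(mw, pwg):
    """Yield headwords whose citation of the same source differs in number."""
    for headword in sorted(mw.keys() & pwg.keys()):
        left, right = parse_citation(mw[headword]), parse_citation(pwg[headword])
        if left and right and left[0] == right[0] and left[1] != right[1]:
            yield headword, left[1], right[1]

mw = {"dyOsaMSita": "AV. x, 3, 25"}
pwg = {"dyOsaMSita": "Av. 10, 5, 25."}
print(list(citation_mismatches(mw, pwg)))
```

A real comparison would also need a table equating the two dictionaries' source abbreviations (for example ṠBr. and Çat. Br.), which this sketch does not attempt.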

zaaf2 commented 9 years ago

dyOSaMsita -> dyOsaMSita

Perhaps the most prudent solution would be to leave it as it is. Reasons:

The form द्यौशंसित is not incorrect;
it is adapted to the context;
it could hardly have been the result of a typographical error, since it involves the change and transposition of three signs, including ṉ, which is not used before स्;
the common confusion between संशित and शंसित has been pointed out by MW s.v.;
lastly, one cannot exclude the possibility that MW or one of his collaborators found this form in another (perhaps more correct) edition of the AV.

funderburkjim commented 9 years ago

@zaaf2 I'm leaving the correction in place. It has been mentioned in corrections_factual, which in turn mentions this issue thread.

With present knowledge, the saMSita spelling seems most useful, since it leads to both PWG and at least one version of AV. At least that's the way it looks to me now.

funderburkjim commented 9 years ago

Corrections now installed.

@gasyoun Glad you revisited Issue #45. I'll leave it to you to close that issue, or not.

The only item left among the many mentioned in this issue is the umlaut under deuliya headword.

As you can see from current display of MW for deuliya, the umlaut version shows within the entry.

I can make similar changes to other sanskrit-umlaut cases in MW, if you find them.

funderburkjim commented 9 years ago

I think this issue can be closed, but will leave that to @gasyoun , since he opened.

zaaf2 commented 9 years ago

@funderburkjim dyOSaMsita -> dyOsaMSita

I think this proves you are right. There seems to be no other original source with the reading द्यौशंसित

(...)

(from: Atharva-veda Saṁhitā by William Dwight Whitney, Charles Rockwell Lanman, https://archive.org/stream/atharvavedasahi05lanmgoog#page/n126/mode/2up)

zaaf2 commented 9 years ago

A suggestion: the digital display should point to a factual error detected in the printed edition, with a link to the reasons for the correction. In this way, a comparison with the scanned page would not force the user to go through the same process to discover which is right and which is wrong, and he would be alerted to interesting corrections such as this.

gasyoun commented 9 years ago

@zaaf2 This is why we make screenshots here - we add them here, not to open the same page again. What you want is like http://www.kolchose.org/simon/ajaximagemapcreator/ or http://stackoverflow.com/questions/18560097/how-to-make-a-section-of-an-image-a-clickable-link and would be a good idea in 2018-2022 - after the basic headword proofreading is over. If you'll help with that, I'll see how to code it.

zaaf2 commented 9 years ago

@gasyoun What I mean may be best explained by an example.

After the correction dyOSaMsita -> dyOsaMSita, we now have:

(H3) dyO-saMSita [p= 500] : (dyO-) mfn. impelled or incited by heaven AV. x, 5, 25. [L=97387]

I propose something like this:

(H3) dyO-saMSita {dyO-SaMsita in the printed edition} [p= 500] : (dyO-) mfn. impelled or incited by heaven AV. x, {5}, 25. [L=97387]

The remarks {...} being at the same time clickable links which would show to the user a text with a summary of the reasons for the correction adopted. I don’t think this would be difficult, considering that this information is already available at corrections_factual

I would also suggest that a search for the old reading dyOSaMsita would automatically lead to the corrected article under dyOsaMSita, instead of showing no result, as now.
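The search behaviour suggested here could be sketched as a simple lookup table from printed readings to corrected headwords. The table, the function name, and the note text are hypothetical; in practice the pairs might be drawn from corrections_factual.

```python
# Hypothetical table: reading in the printed edition -> corrected headword.
PRINT_READINGS = {"dyOSaMsita": "dyOsaMSita"}

def resolve(key, headwords):
    """Return (headword to display, note) or (None, None) if nothing matches."""
    if key in headwords:
        return key, None
    corrected = PRINT_READINGS.get(key)
    if corrected in headwords:
        # The note could double as the clickable {...} remark proposed above.
        return corrected, f"printed edition reads {key}"
    return None, None

headwords = {"dyOsaMSita", "dARqAjinika"}
print(resolve("dyOSaMsita", headwords))
print(resolve("dyOsaMSita", headwords))
```

Keeping the printed reading as a search key means a user who starts from the scanned page still lands on the corrected entry.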

funderburkjim commented 9 years ago

@zaaf2 Your suggestions regarding display enhancements are good ones.

Suggest you make a separate issue, tagged as 'enhancement', in which you essentially copy the comments you made above. Then, this current issue #127 can be closed, as it deals with many other things. And the new issue can remain open as a reminder
In terms of implementing it, there would be many steps. If we establish a sanskrit-lexicon development server, maybe someone (not necessarily me!) can work to make this a reality. If you have the inclination, you could learn programming and do it yourself!
Of similar interest is @gasyoun 's suggestion regarding an alternate to the output of MW, wherein Gra1ma would be displayed as capitalized IAST.

Also really appreciate the cross-referencing to other sources that you are coming up with, such as Whitney's Atharva veda.

gasyoun commented 9 years ago

The current issue can't be closed, as there are at least 332 issues to be covered. So not yet, Jim. There are two new Russian coders, @masted and @juhnowski whom I wanted to introduce to you. 2nd task will be finishing https://github.com/sanskrit-lexicon/Cologne/issues/45, after - who knows.

funderburkjim commented 9 years ago

@gasyoun Think other chunks of the 332 should be posted in additional issues ("`o` vs `O` Corrections in MW, Part 2" etc), just to make the size of issues manageable.

Russian coders have the reputation of being highly skilled, so it will be good if there is a way for them to help with the sanskrit-lexicon project.

gasyoun commented 9 years ago

"ovsO` Corrections in MW , Part 2" - so be it, in that case it's closed. These coders are not only skilled, but are willing to help. My task is to guide them where most help is wanted. For now - whatever may help the Reverse Dictionary comes first.

funderburkjim commented 9 years ago

Corrections installed.
