sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

MW72 - hiatus issue in acCa-i #370

Open drdhaval2785 opened 7 years ago

drdhaval2785 commented 7 years ago

Four corrections. Ready to install.

MW72:acCE:acCai:t
MW72:aBicihnya:aBicihnaya:t
MW72:aBisaNkruD:aBisaMkruD:t
MW72:aBisaNkruS:aBisaMkruS:t
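For readers unfamiliar with the correction lines above, here is a minimal Python sketch of how they could be split into fields. It assumes the layout is `dictionary:old:new:flag`, which is inferred from the four lines in this comment. The names `Correction` and `parse_correction` are hypothetical, and the meaning of the trailing `t` is not stated in this thread.

```python
from typing import NamedTuple


class Correction(NamedTuple):
    dictionary: str  # dictionary code, e.g. "MW72"
    old: str         # headword as currently digitized (SLP1)
    new: str         # proposed headword (SLP1)
    flag: str        # trailing field, e.g. "t" (meaning assumed, not documented here)


def parse_correction(line: str) -> Correction:
    """Split one colon-separated correction line into its four fields."""
    parts = line.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}: {line!r}")
    return Correction(*parts)


lines = [
    "MW72:acCE:acCai:t",
    "MW72:aBicihnya:aBicihnaya:t",
    "MW72:aBisaNkruD:aBisaMkruD:t",
    "MW72:aBisaNkruS:aBisaMkruS:t",
]
corrections = [parse_correction(line) for line in lines]
for c in corrections:
    print(f"{c.dictionary}: {c.old} -> {c.new}")
```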

acCai is a hiatus issue. In alternate headwords, this may have larger implications. Whether we want to treat it as hiatus or allow it to merge is totally up to us. I vote for hiatus.

funderburkjim commented 7 years ago

Whether we want to treat it as hiatus or allow it to merge is totally up to us. I vote for hiatus.

With our current alternate headword system for dictionaries in meta-line form, we don't have to choose. Let's leave the current headword form as 'acCE', and put 'acCai' as an alternate headword. (Thus, do not include the above correction, or comment it out).

MW72 is not yet converted to meta-line form. I've put a note in the appropriate spot in mw72/pywork on cologne so acCai will be added to mw72_hwextra.txt when MW72's meta-line conversion is done.

SergeA commented 7 years ago

Let's leave the current headword form as 'acCE', and put 'acCai' as an alternate headword.

The spelling अच्छै is wrong. The alternate अच्छइ also is not very good. When Monier writes it as accha-i separately, he means two separate morphemes. But in the case of their joining the sandhi must be applied: अच्छ+इ = अच्छे. Readers are expected to make this sandhi by themselves.
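The sandhi step described above (accha + i = acche) can be sketched in Python on SLP1 strings. This is only a minimal illustration covering a final a/A before an initial simple vowel a, A, i, I, u or U. The function name `join_a_sandhi` is hypothetical, and every other sandhi rule is deliberately left out.

```python
# Guna result for a/A followed by i/I or u/U, in SLP1 transliteration.
GUNA = {"i": "e", "I": "e", "u": "o", "U": "o"}


def join_a_sandhi(first: str, second: str) -> str:
    """Join two morphemes, merging a final a/A with an initial simple vowel.

    Only three cases are handled: a/A + i/I -> e, a/A + u/U -> o,
    a/A + a/A -> A. Anything else is concatenated unchanged.
    """
    if first and second and first[-1] in ("a", "A"):
        initial = second[0]
        if initial in GUNA:
            return first[:-1] + GUNA[initial] + second[1:]
        if initial in ("a", "A"):
            return first[:-1] + "A" + second[1:]
    return first + second


print(join_a_sandhi("acCa", "i"))  # acCe
print(join_a_sandhi("acCA", "i"))  # acCe
```

Both the short and the long form of the prefix give acCe under this rule, which is why a single sandhied search key would serve either spelling.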

funderburkjim commented 7 years ago

spelling अच्छै is wrong

I should have realized this; thanks for the correction!

I notice that in MW(1899), the prefix ends in long A: acCAi. Should that be our correction also in MW72, rather than acCai ?

Incidentally, PD and STC have acCe, which seems the best choice.

Note in MW there is also 'e' instead of 'Ai'. Similarly in MW72. Should we enforce consistency on MW and MW72 by changing acCAi to acCe for our search key?

gasyoun commented 6 years ago

Should that be our correction also in MW72, rather than acCai ?

Guess not. It's orthography and it changes every 50-100 years.

Should we enforce consistency on MW and MW72 by changing acCAi to acCe for our search key?

I would not.

SergeA commented 6 years ago

I notice that in MW(1899), the prefix ends in long A: acCAi. Should that be our correction also in MW72, rather than acCai ?

No. Though accha-i (72) and acchā-i (99) refer to the same word, the two spellings reflect the author's opinion, which changed. In 72 Monier writes "aććha or usually aććhā": both are ok. (NB! In the digitization there is a typo in accha-3, "aććha or usually aććha": the macron is lost.) But in 99 he decides that accha is used only at the end of the word and that as the prefix it should be acchā: "accha (so at the end of a pāda) , or usually acchā".

Note in MW there is also 'e' instead of 'Ai'. Similarly in MW72. Should we enforce consistency on MW and MW72 by changing acCAi to acCe for our search key?

From the grammatical point of view accha/ā-i and ā-i are similar. So there is an inconsistency when Monier in one case writes it as e (ā-i) and in another just accha/ā-i without giving the sandhied form. The user probably will search for acche, so it will be good to add it as a search option. I think both accha/ā-i and acche must be searchable.

Usually Monier gives the actual spellings of the sandhied words. But in some cases he leaves parts of the words unsandhied. Perhaps we should add sandhied alternates to such headwords. I wonder how many they are.

BTW, is Monier's wise system of 4 types of circumflexes lost in the digitization or somehow preserved? I see red asterisks near those letters - what do they mean?

gasyoun commented 6 years ago

I wonder how many they are.

Jim, is there a way we could calculate the rough number?

BTW, is Monier's wise system of 4 types of circumflexes lost in the digitization or somehow preserved?

Preserved, but not shown in web display, because it's not in that font. I have it presented in my version of Charter, but it's not used online.

funderburkjim commented 6 years ago

Perhaps we should add sandhied alternates to such headwords. I wonder how many they are.

We should be able to add, for example, 'acCe' as an alternate headword for 'acCai'. And similarly for other cases where the 'unsandhied' forms are present.

Currently don't know if there are any others comparable to acCai (where there is ONLY the unsandhied form).

One quick and dirty check of mw72hw0.txt (for pattern a-i) yields:

   7827:0110-c:A-i:25522,25522:1   current key1 = Ai. record = 'see e'  (i.e., MW already handles alternate)
   7828:0110-c:A-inD:25523,25523:1         all of these are similar.
   7829:0110-c:A-inv:25524,25524:1
   7830:0110-c:A-iz:25525,25525:1
   7831:0110-c:A-Ikz:25526,25526:1
   7832:0110-c:A-Ir:25527,25527:1
   7833:0110-c:A-Iz:25528,25528:1

So, these don't require further attention.
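A check of this kind could be sketched in Python as follows. It assumes each line of the headword file is colon-separated with the headword in the third field, as in the listing above. The regex `[aA]-[iI]` is one reading of "pattern a-i" and not necessarily the exact search used here. The function name `hiatus_headwords` is hypothetical.

```python
import re

# a or A, a hyphen, then i or I: an unsandhied a + i junction in SLP1.
HIATUS = re.compile(r"[aA]-[iI]")


def hiatus_headwords(lines):
    """Return headwords (assumed to be the third colon-separated field)
    that contain an a-i hiatus."""
    hits = []
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        headword = fields[2]
        if HIATUS.search(headword):
            hits.append(headword)
    return hits


sample = [
    "7827:0110-c:A-i:25522,25522:1",
    "7828:0110-c:A-inD:25523,25523:1",
    "7831:0110-c:A-Ikz:25526,25526:1",
]
print(hiatus_headwords(sample))
```

To run it against the real file, the sample list would be replaced by the lines read from mw72hw0.txt.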

Another search of mw72 headwords (mw72hw2.txt) shows 96 headwords ending in E. A quick glance suggests that most (perhaps all) of these are like aBigE (where gE is a legitimate root ending in the diphthong E).

I'll let you do further searching for suspicious characters. You could do filters on sanhw1.txt.

funderburkjim commented 6 years ago

Monier's wise system of 4 types of circumflexes

AFAIK, this system is present only in mw(1899), not mw72.

In Cologne's digitization of mw (= mw1899), this circumflex system is coded as ONE form (not 4).

For instance indrA<srs/>ditya , indrA<srs/>nuja are coded the same, even though there is a distinction in the 'weight' of the two arms of the circumflex in indrAditya (indicating indra + Aditya), whereas the two arms of the circumflex have equal weight in indrAnuja (indicating indra + anuja).

One time I worked on undoing the sandhi in such cases (i.e., to derive indra + Aditya). This work is part of this file. Look up indrAditya for instance.

There are some cases where my analysis yielded two possibilities. For instance indrAyuDa -> indra+ayuDa,indra+AyuDa. This is because both 'ayuDa' and 'AyuDa' are headwords in MW. However, by looking at the scan, I see that MW actually intended indra+AyuDa. There are 5400 cases like this in the analysis2 file.
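To illustrate why two possibilities arise, here is a minimal Python sketch of undoing an A sandhi against a headword list. It is not the code behind the analysis2 file. The function name `split_long_a` is hypothetical, and the headword set is a tiny stand-in built from the words mentioned in this comment.

```python
def split_long_a(compound, headwords):
    """Return every (left, right) split of `compound` at a long A (SLP1)
    where both parts are known headwords.

    A long A may come from a+a, a+A, A+a or A+A, so each combination is
    tried and kept only if both resulting words are in `headwords`.
    """
    results = []
    for pos, ch in enumerate(compound):
        if ch != "A":
            continue
        prefix, suffix = compound[:pos], compound[pos + 1:]
        for left_vowel in ("a", "A"):
            for right_vowel in ("a", "A"):
                left = prefix + left_vowel
                right = right_vowel + suffix
                if left in headwords and right in headwords:
                    results.append((left, right))
    return results


# Stand-in headword set; the real list would come from the dictionary.
headwords = {"indra", "ayuDa", "AyuDa", "Aditya", "anuja"}

print(split_long_a("indrAyuDa", headwords))   # two candidates
print(split_long_a("indrAditya", headwords))  # one candidate
```

The ambiguity for indrAyuDa cannot be resolved from the headword list alone. Deciding between the candidates needs extra evidence such as the circumflex weight in the printed text.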

Refining the interpretation of analysis2 by taking into account the weight of the circumflex legs in the printed text is a doable task, but one that would take quite a bit of time in examining scans.

Also, just to complete the picture, there are currently 890 cases where the analysis could not be completed, because one or more of the component words was not identified as a headword. (Use regex search on analysis2.txt with @.*TODO .)

Based on this work we might be able to reconstruct the original 4 types of circumflex.