sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899

Miscellaneous corrections #137

Closed Andhrabharati closed 2 years ago

Andhrabharati commented 2 years ago

quote marks:

apostrophe:

multiplication mark:

prosody marking:

miscellaneous:

Andhrabharati commented 2 years ago

number followed by letter:

need for uniformity in punctuation marking:

Andhrabharati commented 2 years ago

missing space before '=':

(62609): some=
(95618): <s>-liliśire</s>)=
(100550): <ls>AV.</ls>= ‘diarrhoea’.
(195423): Kollam 336=
(511483): (also)= <s>bhū</s>
(686740): <ab>mfn.</ab>= <s>°vat</s>
(769991): <s>sana</s>=
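Cases like these can be located mechanically. The following is a minimal sketch, not the procedure actually used for this issue: it assumes the dictionary text is available as a sequence of lines, and the function name `find_missing_space_before_eq` is hypothetical. XML tags are masked first so that `=` inside attributes (for example `n="MBh."`) is not reported.

```python
import re

# Any XML tag, including its attributes.
TAG = re.compile(r"<[^>]*>")
# '=' directly preceded by a character that is neither whitespace nor '='.
NO_SPACE_BEFORE_EQ = re.compile(r"(?<=[^\s=])=")
# Non-space filler used to mask tags while keeping column offsets intact.
PLACEHOLDER = "\x01"


def find_missing_space_before_eq(lines):
    """Yield (line_number, column) for each '=' that lacks a preceding space."""
    for lineno, line in enumerate(lines, start=1):
        masked = TAG.sub(lambda m: PLACEHOLDER * len(m.group()), line)
        for m in NO_SPACE_BEFORE_EQ.finditer(masked):
            yield lineno, m.start()
```

Because the tags are masked with a non-space character, a closing tag immediately followed by `=` (as in `<ls>AV.</ls>=`) is still reported.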

Andhrabharati commented 2 years ago

Mis-matched pairing of [...] :

(38354): , <ls>Pāṇ.</ls>] [<ls>Pāṇ.</ls>]
(85683): [<ls>L.</ls> [<ls>L.</ls>]
(85686): [<ls>L.</ls> [<ls>L.</ls>]
(85689): [<ls>L.</ls> [<ls>L.</ls>]
(169602): <s>dhuni</s>, j <s>dhuni</s>]

Andhrabharati commented 2 years ago

Mis-matched pairing of (...) :

(15609): <s>-devatā</s>) <s>-devatā</s>
(15612): <s>-devatā</s>) <s>-devatā</s>
(29427): ([<ls>MaitrS.</ls>; <ls>VS.</ls>] [<ls>MaitrS.</ls>; <ls>VS.</ls>]
(29430): ([<ls>MaitrS.</ls>; <ls>VS.</ls>] [<ls>MaitrS.</ls>; <ls>VS.</ls>]
(40536): <s>-pūrvaka</s>) <s>-pūrvaka</s>
(40545): <s>-pūrvaka</s>) <s>-pūrvaka</s>
(41022): [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.]) ([<ls>ŚBr.</ls>] [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.] [<ls>ŚBr.</ls>]
(41025): [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.]) ([<ls>ŚBr.</ls>] [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.] [<ls>ŚBr.</ls>]
(95025): <ab>B.</ab>) (<ab>B.</ab>)
(100964): (<s>ās</s> ¦ ¦ ; this is a word ending, mostly being 'removed' throughout
(114152): <ls>Ragh.</ls>) <ls>Ragh.</ls> ; print correction
(131776): (<ab>B.</ab> (<ab>B.</ab>)
(275105): (? for <s>dhaṅka-m</s> (? for <s>dhaṅka-m°</s>)
(290174): a) kind of drama a kind of drama
(312310): (<ls n="MBh.">ii, 983</ls>-<ls n="MBh.">1203</ls> (<ls n="MBh.">ii, 983</ls>-<ls n="MBh.">1203</ls>)
(378395): or <s>°lā<srs/>bda</s> (or <s>°lā<srs/>bda</s>
(378395): (which begins on the 20th October, <ab>A.D.</ab> 879. (which begins on the 20th October, <ab>A.D.</ab> 879.)
(378398): or <s>°lā<srs/>bda</s> (or <s>°lā<srs/>bda</s>
(378398): (which begins on the 20th October, <ab>A.D.</ab> 879. (which begins on the 20th October, <ab>A.D.</ab> 879.)
(378401): or <s>°lā<srs/>bda</s> (or <s>°lā<srs/>bda</s>
(378401): (which begins on the 20th October, <ab>A.D.</ab> 879. (which begins on the 20th October, <ab>A.D.</ab> 879.)
(436419): <ab>wk.</ab>) <ab>wk.</ab>
(467381): <ab n="praise">pr°</ab>’ <ab n="praise">pr°</ab>’)
(544670): (<ls>HPariś.</ls> (<ls>HPariś.</ls>)
(544673): <lex>n.</lex>) <lex>n.</lex>
(690877): (<ab>Sch.</ab> (<ab>Sch.</ab>)
(714228): <ab n="Germany">G°</ab>) <ab n="Germany">G°</ab>
(821440): <ls>RV. x, 133</ls> <ls>RV. x, 133</ls>)
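Unpaired `[...]` and `(...)` of the kind listed above can be flagged with a small stack-based check. This is only a sketch under stated assumptions: it works one line at a time (so a pair that legitimately spans two lines would be reported), tags are masked before scanning, and the name `unbalanced_brackets` is hypothetical.

```python
import re

# Any XML tag; masked so that characters inside attributes are ignored.
TAG = re.compile(r"<[^>]*>")
# Closing bracket -> the opening bracket it must pair with.
PAIRS = {")": "(", "]": "["}


def unbalanced_brackets(line):
    """Return a sorted list of (column, char) for brackets without a partner."""
    text = TAG.sub(lambda m: " " * len(m.group()), line)
    stack, problems = [], []
    for col, ch in enumerate(text):
        if ch in "([":
            stack.append((col, ch))
        elif ch in PAIRS:
            if stack and stack[-1][1] == PAIRS[ch]:
                stack.pop()  # properly paired, discard the opener
            else:
                problems.append((col, ch))  # closer with no matching opener
    # Whatever is left on the stack was opened but never closed.
    return sorted(problems + stack)
```

Each hit would still need a look at the print, since some reported lines turn out to be print corrections or deliberate removals, as the remarks in the list show.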

Andhrabharati commented 2 years ago

Issues crept into temp_mw_3_iast, not in the prev. temp_mw_6_iast

  1. space before ';'

(204651): ¦ inserted, interpolated, <ls>R. ii, <ab>ch.</ab> 96</ls> <ab>Sch.</ab>; <ls>Naiṣ. xxii, 48</ls> <ab>Sch.</ab><info lex="inh"/> has become ¦ inserted, interpolated, ; -------------------------------------------------------

all the missed matter after the comma is to be filled up!

(516867): <lex>f.</lex> (A.) has become <lex>f.</lex> (<ls>A.</ls> ;[Apte dictionary])

;[Apte dictionary] to be removed which was my comment

  2. space before ')'

(144769): <ab>Gr.</ab> 969) has become <ls>Gr. 969</ls> )

space to be deleted before the closing parenthesis.

Andhrabharati commented 2 years ago

Root symbol (√) and <s> tag:

There are 4625 √ <s> instances and 360 <s>√ instances.

Shouldn't all be with same sequence-- √ either preceding (outside) or following (inside) the <s> tag?

As there are 2257 cases of type √ <hom>1.</hom> <s>, we can conclude that it should always precede.

But then, there are 2 cases of </hom> √ to consider.
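Counts like the ones quoted above can be reproduced with a few regular expressions. The sketch below is an illustration only: the pattern names and the function `count_root_patterns` are hypothetical, and the exact spacing in each pattern is taken from the way the sequences are written in this comment.

```python
import re

# One pattern per sequence discussed above (spacing as written in the comment).
PATTERNS = {
    "root_before_s": re.compile(r"√ <s>"),
    "root_inside_s": re.compile(r"<s>√"),
    "root_hom_s": re.compile(r"√ <hom>[^<]*</hom> <s>"),
    "hom_then_root": re.compile(r"</hom> √"),
}


def count_root_patterns(lines):
    """Return a dict mapping each pattern name to its number of matches."""
    counts = dict.fromkeys(PATTERNS, 0)
    for line in lines:
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(line))
    return counts
```

Run over the full text, the four totals can be compared directly with the figures given in this comment.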

Andhrabharati commented 2 years ago

Capital letter following a small letter in a tagged entry, where it shouldn't be so:

(168816): <lex>f (A)n.</lex> <lex>f (<s>ā</s>)n.</lex>
(265210): <i>jallālu 'ddIn</i> <i>jallālu 'ddīn</i>
(269458): <etym>gIvēnu</etym> <etym>gīvēnu</etym>
(300555): <s1>YamaYamī</s1> <s1>Yama-Yamī</s1>
(303738): <s1>ŚrI</s1> <s1>Śrī</s1>
(328281): <s1>LakṣmI</s1> <s1>Lakṣmī</s1>
(328284): <s1>LakṣmI</s1> <s1>Lakṣmī</s1>
(476884): <etym>fSu</etym> <etym>fshu</etym>
(585671): <etym>rathaestA</etym> <etym>rathaestā</etym>
(585677): <etym>rathaestA</etym> <etym>rathaestā</etym>
(628613): <etym>virSús</etym> <etym>virshùs</etym>

[Note. I had deleted all the slp1 strings in my file, for convenience sake.]
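A check of this kind might look as follows. It is a sketch under assumptions: the set of tags inspected is a guess based on the examples above, the function names are hypothetical, and it only makes sense on an IAST file from which the slp1 strings have been removed (slp1 itself uses capitals freely). It flags a lowercase letter immediately followed by an uppercase one, so a case with intervening punctuation such as `f (A)n.` would need a looser pattern.

```python
import re

# Tags whose text content is inspected; this selection is an assumption.
TAGGED = re.compile(r"<(s1|etym|i|lex)>(.*?)</\1>")


def has_lower_then_upper(text):
    """True if some lowercase letter is directly followed by an uppercase one."""
    return any(a.islower() and b.isupper() for a, b in zip(text, text[1:]))


def suspicious_case(line):
    """Return the tagged strings in which a capital follows a small letter."""
    return [
        m.group(0)
        for m in TAGGED.finditer(line)
        if has_lower_then_upper(m.group(2))
    ]
```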

Andhrabharati commented 2 years ago

Number before a <s> tag, either indicating a missing <hom> tag or a typo:

[0-9] <s>
(60681): 1 <s>ali</s> <hom>1.</hom> <s>ali</s>
(227757): 2 <s>gir</s> <hom>2.</hom> <s>gir</s>
(227757): 2 <s>gīrṇá</s> <hom>2.</hom> <s>gīrṇá</s>
(249605): 1. 2. 3 <s>cit</s> <hom>1.</hom> <hom>2.</hom> <hom>3.</hom> <s>cit</s>
(352745): 1. and 2 <s>navya</s> <hom>1.</hom> and <hom>2.</hom> <s>navya</s>
(353175): 2 <s>-áka</s> <hom>2.</hom> <s>-áka</s>
(441714): 1 <s>mi</s> <hom>1.</hom> <s>mi</s>
(469506): 1 <s>prā<srs/>ṅ-nyāya</s> <hom>1.</hom> <s>prā<srs/>ṅ-nyāya</s>
-------------------

[0-9], <s>
(65237): 4, <s>liyat</s> <s>-līyat</s>
(205493): 2, <s>kṣúdh</s> <hom>2.</hom> <s>kṣúdh</s>
(357525): 1, <s>náva</s> <hom>1.</hom> <s>náva</s>
(415453): 2, <s>pat</s> 2, <ls>Pat.</ls>
(419111): 2, <s>as</s> <hom>2.</hom> <s>as</s>
(425563): 1. and 2, <s>puri</s> <hom>1.</hom> and <hom>2.</hom> <s>puri</s>
(464659): 1, <s>si</s> <hom>1.</hom> <s>si</s>
(475781): 5, <s>i</s> <hom>5.</hom> <s>i</s>
(507691): 1, <s>bhī</s> <hom>1.</hom> <s>bhī</s>
(555959): 1, <s>mura</s> <hom>1.</hom> <s>mura</s>
(665267): 1, <s>ru</s> <hom>1.</hom> <s>ru</s>
(708977): 2, <s>śad</s> <hom>2.</hom> <s>śad</s>
(752772): 1. 2, <s>vṛ</s> <hom>1.</hom> <hom>2.</hom> <s>vṛ</s>
(753242): 2, <s>saṃ-vedya</s> <hom>2.</hom> <s>saṃ-vedya</s>
(791067): 7, <s>sa</s> <hom>7.</hom> <s>sa</s>
(801680): 7, <s>sa</s> <hom>7.</hom> <s>sa</s>
(803606): 1, <s>sam-udra</s> <hom>1.</hom> <s>sam-udra</s>
-------------------

[0-9]\. <s>
(7485): 1. <s>ajá</s> <hom>1.</hom> <s>ajá</s>
(7485): 1. <s>ajana</s> <hom>1.</hom> <s>ajana</s>
(12602): 1. <s>kṛ</s> <hom>1.</hom> <s>kṛ</s>
(13338): 3. <s>á-diti</s> <hom>3.</hom> <s>á-diti</s>

... LIST CONTINUES (~600 lines to check manually) -------------------

(35207): 1: <s>kṛ</s> <hom>1.</hom> <s>kṛ</s>
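The search patterns above (`[0-9] <s>`, `[0-9], <s>`, `[0-9]\. <s>`, plus the `1:` case) can be folded into one expression. This is a sketch with a hypothetical function name; a properly tagged `<hom>1.</hom> <s>` does not match, because `</hom>` sits between the number and the `<s>`.

```python
import re

# A digit, optionally followed by '.', ',' or ':', then a space and <s>.
BARE_NUMBER_BEFORE_S = re.compile(r"\d[.,:]? <s>")


def bare_numbers_before_s(lines):
    """Return (line_number, matched_text) for every untagged number before <s>."""
    return [
        (lineno, m.group(0))
        for lineno, line in enumerate(lines, start=1)
        for m in BARE_NUMBER_BEFORE_S.finditer(line)
    ]
```

As the lists show, not every hit is a missing `<hom>` (see 65237 and 415453), so the output is a worklist for manual checking and not a set of automatic replacements.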

funderburkjim commented 2 years ago

correction/change work done.

About 13000+ lines were altered.

The work was done in the issue137 directory.

I aimed to include all the items mentioned by Andhrabharati. In addition, several additional changes were made with the objective of bringing certain details of the digitization into better conformity with the printed text. No doubt there are other similar changes that will be made as such differences are noticed; let these be discussed in future issues.

punctuation at end of quoted text.

This is one instance where I think there is a good reason for the digitization to vary from the printed text. The printed text invariably puts punctuation (comma, period, semicolon) BEFORE (inside) the closing quote of quoted text. But I have changed this to uniformly put the punctuation AFTER (outside) the closing quote. For instance:

OLD  (agrees with print):
<s>aMSu—Dara</s> ¦ <lex>m.</lex> ‘bearer of rays,’ the sun, <ls>L.</ls>
NEW
<s>aMSu—Dara</s> ¦ <lex>m.</lex> ‘bearer of rays’, the sun, <ls>L.</ls>

The reason for the change is that the ending comma, etc. is not part of the quote, but rather separates the quote from other semantic chunks. Note: in the case of the period, there are a very small number of cases where an ending period IS part of the quoted text and has thus been left inside, for example:

[The title <s>AcArya</s> affixed to names of learned men is rather like our ‘Dr.’; <ab>e.g.</ab> <s>rAGavA<srs/>cArya</s>, &c.]
[<ab>fr.</ab> √ <hom>1.</hom> <s>kf</s>, ‘= <s>kurvARa</s>, <s>kartf</s>, &c.’, <ls>Sāy.</ls>]
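The rule just described (move a comma, semicolon or period from inside to outside the closing quote, except where the period belongs to the quoted text) can be sketched as below. This is not the script used in the issue137 directory: the function name and the `keep` parameter are hypothetical, and the default exceptions are simply the two endings seen in the examples above.

```python
import re

# A comma, semicolon or period immediately before the closing quote (U+2019).
CLOSING = re.compile(r"([,;.])’")


def move_punctuation_outside(line, keep=("Dr.’", "&c.’")):
    """Move punctuation after the closing quote, except for listed endings."""

    def repl(m):
        # Leave the match untouched when the text up to and including the
        # closing quote ends with one of the exceptions.
        if any(line.endswith(k, 0, m.end()) for k in keep):
            return m.group(0)
        return "’" + m.group(1)

    return CLOSING.sub(repl, line)
```

In practice the exception list would have to be built by reviewing the period cases by hand, since whether a period belongs to the quote is a judgment about the print.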

funderburkjim commented 2 years ago

iast version

This is consistent with the slp1 commit of csl-orig/v02/mw.txt.

temp_mw_issue137_iast.zip

funderburkjim commented 2 years ago

next

Before closing this issue, I'll wait a couple of days to deal with errors or omissions in the way I handled the mw cleanup items of this issue.

My intention then is to take a break from mw changes, and return attention to the ongoing ls-cleanup of PW and PWG.

gasyoun commented 2 years ago

(- - u u - -) to be changed as (- - ˘ ˘ - -)

@Andhrabharati sure?

gasyoun commented 2 years ago

The printed text invariably puts punctuation (comma, period, semicolon) BEFORE (inside) the closing quote of quoted text. But I have changed to uniformly put the punctuation AFTER (outside) the closing quote

@funderburkjim can we document it in a .txt readme, so that we do not forget where such a CHANGE was made intentionally?

funderburkjim commented 2 years ago

error correction

Discovered two errors, and corrected. See 'correct two errors. modify change_5.txt' section of issue137 readme.txt for details.

Also revised iast version: temp_mw_issue137_iast_rev.zip

funderburkjim commented 2 years ago

Can we document the intentional change?

A note was made in mw_printchange.txt file of csl-corrections repository: https://github.com/sanskrit-lexicon/csl-corrections/commit/b7ccd24d1988e8cda9105adcd18eda3d1c9ba1b0

funderburkjim commented 2 years ago

(- - u u - -) to be changed as (- - ˘ ˘ - -)

This occurs under <L>82334<pc>435,3<k1>tanumaDyA

This note from readme.txt of issue137:

NOTE:
1. (- - u u - -) to be changed as (- - ˘ ˘ - -)
  Instead change to (¯ ¯ ˘ ˘ ¯ ¯), as used 48 times elsewhere for meter
 i.e., I used the unicode macron (\u00af) for long.
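The substitution in this note can be written as a small function. This is a sketch with assumptions: the name `normalize_meter` is hypothetical, the breve is taken to be U+02D8, and the pattern only touches parenthesized groups made up solely of `-`, `u` and spaces.

```python
import re

# A parenthesized group containing nothing but '-', 'u' and spaces.
METER_GROUP = re.compile(r"\(([-u ]+)\)")


def normalize_meter(text):
    """Rewrite '-' as macron (U+00AF) and 'u' as breve (U+02D8) in meter groups."""

    def repl(m):
        body = m.group(1).replace("-", "\u00af").replace("u", "\u02d8")
        return "(" + body + ")"

    return METER_GROUP.sub(repl, text)
```

Since only one such instance is reported here, a pattern this narrow is mainly useful for confirming that no further cases exist.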

Andhrabharati commented 2 years ago

@funderburkjim

Found some interesting points reg. AND/OR grouping elements!

There are 6 single <L> elements in AND groups (36310, 37336, 45103, 59037, 72383, 80300)

and a whopping 100+ single <L> elements in OR groups! (5295.1, 5963, 6230, 9218, 13040, 13046, 13293, 16421, 16441, 19425, 21437, 21529, 29168, 29633, 29831, 46491, 46738, 49740, 52477, 53475, 57080, 58684.12, 62547, 71080, 91798, 95320, 96358, 97425, 98426, 99465, 99624, 110675, 115003, 115399, 116989, 120532, 129300, 135725, 139457, 141203, 144737, 145504, 148500, 148573, 154262, 157214, 158837, 159177, 166051, 167026, 169540, 180428, 183881, 186088, 186226, 186289, 186645, 188644, 188650, 188663, 191027, 192063, 193598, 194958, 195242, 195996, 196890, 200007, 203319, 203480, 205118, 205142, 205180, 205211, 206064, 206364, 206466, 208147, 210505, 210902, 213989, 216907, 219834, 220857, 223679, 231039, 231079, 237161, 239670, 239811, 239992, 245058, 246345, 247829, 247858, 247867, 248288, 250081, 250879, 252557, 252708, 256290, 259869, 260454, 261644, 262061)

Noticed that these are mostly with accent differences or hyphenation differences.

Would you like to correct this point, as you feel appropriate?

Andhrabharati commented 2 years ago

Also there are 6 <L> entries whose body portion is ending as 'or' (1962, 9981, 96042, 156088, 169950, 171580) and one entry with body-ending as 'of' (which is a typo for 'or') (4074)

And there is one entry with body-ending as 'and' (95389)

These should be combined with the following entries appropriately and then to be made as "proper" grouped entries.
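Entries of this kind could be found with a scan like the one below. It is a sketch resting on assumptions about the file layout that this thread does not spell out: each entry is taken to start with a metaline beginning `<L>`, followed by body lines, and to end with a line `<LEND>`; trailing empty `<info .../>` tags are ignored. All names are hypothetical.

```python
import re

# Body text ending in a connective that suggests a continuation.
TRAILING_CONNECTIVE = re.compile(r"\b(or|and|of)\s*$")
# Trailing empty <info .../> tags, which are not part of the visible body.
TRAILING_INFO = re.compile(r"(\s*<info[^>]*/>)+\s*$")
# The L-number at the start of a metaline.
L_NUMBER = re.compile(r"<L>([^<]+)")


def entries_ending_in_connective(lines):
    """Return (L-number, connective) for entries whose body ends in or/and/of."""
    hits, current, body = [], None, []
    for line in lines:
        if line.startswith("<L>"):
            current, body = L_NUMBER.match(line).group(1), []
        elif line.startswith("<LEND>"):  # assumed end-of-entry marker
            text = TRAILING_INFO.sub("", " ".join(body))
            m = TRAILING_CONNECTIVE.search(text)
            if current is not None and m:
                hits.append((current, m.group(1)))
            current = None
        elif current is not None:
            body.append(line.rstrip())
    return hits
```

An ending in 'of' is only a candidate: whether it is a typo for 'or', as in the case mentioned above, has to be decided against the print.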

funderburkjim commented 2 years ago

single L groups

It seems reasonable to retain such markup, as it identifies the headwords which have more than one accent variant.

Do you have a better way to do this markup?

Andhrabharati commented 2 years ago

There are 8 <L> entries ending with <ab>w.r.</ab> for, which need to be combined with the next entries properly. (92603, 95205, 98434, 104490, 107642, 114508, 125521, 131402)

funderburkjim commented 2 years ago

Have handled the additional 16 'L' references mentioned in two previous comments. See 'temp_change_or1' and 'temp_change_or2' in readme.

Handled the w.r. cases with a new info attribute: <info orwr="..."/>.

@Andhrabharati Ok to close this issue?

Andhrabharati commented 2 years ago

@funderburkjim

There are about 2000 more w.r. instances in the text; but it may be alright to close this issue for now (with a final update of iast).

This issue appears to have tackled quite many points at once.

We can come back to MW sometime later, after finishing the long-pending ls-cleaning in the PWG family (PWG, pwk, pwkvn and SCH). [I am thinking of giving out my 'resolutions' for all the 'unidentified' entities in these this time.]

Andhrabharati commented 2 years ago

Speaking of ls-resolutions, you are yet to 'finally' correct the RLM (in the MW) as mentioned recently, as at https://github.com/sanskrit-lexicon/MWS/issues/135#issuecomment-1208133347

Andhrabharati commented 2 years ago

Probably you may also consider changing the remaining [noticed that the count has now come down to 300+ from the earlier 800+] ṉ to ṃ; 3 of which are in the ls strings as napuṉs. and the remaining are in the main text at the s1, ns or ab (expansion) strings.

[The single Zend etym-string aiwyāoṉhana at <L>19258.1 may also be changed as above, as this language is considered as a sister language to Sanskrit.]

Andhrabharati commented 2 years ago

Noticed ~300 â instances, which should've been à within the 100+ <s> strings and in the corresp. meta lines.

Andhrabharati commented 2 years ago

Talking of the caret instances above, got reminded of another issue (#107) that might also be considered in the MW spree now.

Andhrabharati commented 2 years ago

There is one instance (line 295319) where √ is not followed by a space; and one instance (line 365916) where div n="to"/> is not preceded by the <.

image

funderburkjim commented 2 years ago

RLM

tooltip altered. Good find! (headword kAkaciYcika).

funderburkjim commented 2 years ago

change_6, 7, 8

About 5000 lines changed.

These address the points above starting at https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1235047261.

More detailed notes are found in the readme, starting at change_6: Extended Ascii changes in ls.

extended ascii

changes â -> ā, ê -> e, î -> ī, ô -> o, û -> ū, ṉ -> ṃ in 3 places:

Note that no changes were made in the <etym> elements. In particular, aiwyāoṉhana at <L>19258.1 was not changed.
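The character mapping itself is easy to express in code. What follows is a sketch, not the change_6 script: the list of places where the change was applied is not reproduced in this comment, so the `tags` parameter (defaulting to `<ls>` only) is an assumption, as are the names. Restricting the replacement to named elements leaves `<etym>` content alone.

```python
import re

# Extended-ASCII letters and their replacements, as listed above.
CHAR_MAP = str.maketrans({
    "\u00e2": "\u0101",  # â -> ā
    "\u00ea": "e",       # ê -> e
    "\u00ee": "\u012b",  # î -> ī
    "\u00f4": "o",       # ô -> o
    "\u00fb": "\u016b",  # û -> ū
    "\u1e49": "\u1e43",  # ṉ -> ṃ
})


def replace_in_tags(line, tags=("ls",)):
    """Apply CHAR_MAP only to the text content of the given elements."""
    names = "|".join(map(re.escape, tags))
    pattern = re.compile(r"<(%s)\b([^>]*)>(.*?)</\1>" % names)

    def repl(m):
        name, attrs, content = m.group(1), m.group(2), m.group(3)
        return "<%s%s>%s</%s>" % (name, attrs, content.translate(CHAR_MAP), name)

    return pattern.sub(repl, line)
```

Scoping the change by element also avoids touching vowels in `<s>` strings, where a circumflex can carry a different meaning.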

funderburkjim commented 2 years ago

Noticed 300 â instances, which should've been à within the 100+ <s> strings and in the corresp. meta lines.

I'm not sure what was intended here --

Here's latest iast version of mw digitization: temp_mw_issue137_iast_rev2.zip

funderburkjim commented 2 years ago

paren-bracket

Recently noticed many (800+) instances of ([X]). I think @Andhrabharati previously also noticed these as needing change. From a small sample examination of print, I concluded these should be changed to [X]. See change_8.txt for these changes.
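The ([X]) to [X] change can be illustrated with one regular expression. This is a sketch, not the contents of change_8.txt: the function name is hypothetical, and X is assumed to contain no further brackets or parentheses, so nested cases are left untouched.

```python
import re

# '([' ... '])' where the inner text contains no brackets or parentheses.
PAREN_BRACKET = re.compile(r"\(\[([^\[\]()]*)\]\)")


def unwrap_paren_bracket(line):
    """Rewrite ([X]) as [X]."""
    return PAREN_BRACKET.sub(r"[\1]", line)
```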

This ends my remarks regarding change_6 through change_8.

Andhrabharati commented 2 years ago

Glad that you are considering my above suggestions, @funderburkjim !

Looking at the mwauth corrections-

image

I think a good revision/re-look/vetting of all the ls-expansions is required sometime soon. [I was just looking at the ? marked (or unlisted) ls-entries thus far in MW.]

As a glaring example, the Gaṇaratnāv. is not Gaṇaratnamahodadhi, but is Gaṇaratnāvalī!!

It is the "collection of Gaṇas to Pāṇini's gr. based on Gaṇaratnamahodadhi & other gr. & lex. works; composed in 1874 A. D. by Yajñeśvara Bhaṭṭa".

Should this be done now, or after completing the PWG ls-exercise? [Anyway, this should be dealt in another issue, but not here.]

Andhrabharati commented 2 years ago

I think @Andhrabharati previously also noticed these as needing change.

Yes, I had mentioned this earlier.

Andhrabharati commented 2 years ago

Noticed 300 â instances, which should've been à within the 100+ <s> strings and in the corresp. meta lines.

I'm not sure what was intended here --

Here's latest iast version of mw digitization: temp_mw_issue137_iast_rev2.zip

Pl. see under <L>550.2 in mw.txt (as example) metaline <k2> akzitavya^ headline <s> akzitavya^

and the corresp. iast text metaline <k2> akṣitavyâ headline <s> akṣitavyâ

Here is the scan of the portion [now I have a very good scan of MW]

image

Do I make sense now, @funderburkjim ? [There are 108 such places in metalines.]

funderburkjim commented 2 years ago

akzitavya^

OK, now I see your concern.
Using mw.txt (the slp1 version), my count is slightly different: 114 matches for "<k2>.*?\^"

In slp1, the spelling uses the ^ character as an accent. Which accent?
It is svarita: See frontmatter

Next, we have the question of how to represent, in displays, this svarita accent with diacritics. In the printed text, the svarita accent is represented by a backward 'grave' accent, and udAtta by a forward 'acute' accent. There is no anudAtta mentioned or used.

The Cologne displays use a representation where svarita is represented by circumflex diacritic, udAtta by acute accent diacritic, and anudAtta by grave accent diacritic.

The iast version of MW which you are referring to also used the same 'Cologne' representation.

Thus, there is nothing that requires changing. Just remember that in IAST displays of MW, a Sanskrit word with circumflex-diacritic will appear in the MW printed text with a 'grave'-like diacritic.
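The two conventions described in this comment can be put side by side in code. This is only an illustration with hypothetical names, not the Cologne display code: it takes a vowel plus an slp1 accent character (`/` udAtta, `^` svarita, `\` anudAtta) and attaches the combining diacritic each convention uses.

```python
import unicodedata

# Combining marks: acute U+0301, grave U+0300, circumflex U+0302.
SCHEMES = {
    # Cologne displays: udAtta acute, svarita circumflex, anudAtta grave.
    "cologne": {"/": "\u0301", "^": "\u0302", "\\": "\u0300"},
    # MW print: udAtta acute, svarita grave; anudAtta is not used.
    "print": {"/": "\u0301", "^": "\u0300"},
}


def accented_vowel(vowel, slp1_accent, scheme="cologne"):
    """Return the vowel with the diacritic the scheme assigns to the accent."""
    mark = SCHEMES[scheme][slp1_accent]
    # NFC composes base letter and mark into one code point where possible.
    return unicodedata.normalize("NFC", vowel + mark)
```

For svarita on `a`, the "cologne" scheme gives â while the "print" scheme gives à, which is the difference under discussion for `akzitavya^`.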

funderburkjim commented 2 years ago

<srs/> and svarita

In the printed text of MW a Sanskrit word often appears with a 'circumflex' diacritic. But this is NOT an accent. It is a special convention (described on the same front matter page mentioned above) for representing vowel-sandhi. See for instance, aMSAMSa image

The representation in the Cologne digitization uses the empty xml tag <srs/> : <s>aMSA<srs/>MSa</s>

Although MW describes 4 types of circumflex (representing short+short, short + long, etc.), the Cologne digitization does not distinguish among these types.

I have encountered a few cases where a vowel was coded with <srs/> but should have been coded with svarita. It seems likely that there are other such errors in the digitization.

funderburkjim commented 2 years ago

good revision/re-look/vetting of all the ls-expansions

Definitely agree. You are the best person to do this. You could edit the file tooltip.txt and give me the resulting edited file for installation. Agree best to make another issue devoted to discussions arising during the review.

Andhrabharati commented 2 years ago

Thus, there is nothing that requires changing.

I would say otherwise-- There definitely is a need to do something here!

One year ago, (April 2021) while I was at MW work (for Cologne), these 100+ places were all properly converted/rendered as à in the metalines and the resp. headlines, in the IAST file you gave, as also the whole lot (127k) of other à throughout the text, as per the print matter. [The file is dated 4th April 2021] https://github.com/sanskrit-lexicon/MWS/issues/104#issuecomment-817359904

To make you see the difference more clearly, I am giving two examples now (comparing MW and PWG/pwk)--

  1. display of the entry asurya in MW

image

vs. PWG & pwk

image

  1. Display of akṣitavya in MW

image

vs. pwkvn

image

And having à at these places makes the MW data tally with the original (sources) PWG/pwk data.

We don't have to reiterate that much of MW content is based on PWG family data, and they should be rendered in a similar fashion. [There should not be any second thought on this.]

Andhrabharati commented 2 years ago

As a glaring example, the Gaṇaratnāv. is not Gaṇaratnamahodadhi, but is Gaṇaratnāvalī!!

It is the "collection of Gaṇas to Pāṇini's gr. based on Gaṇaratnamahodadhi & other gr. & lex. works; composed in 1874 A. D. by Yajñeśvara Bhaṭṭa".

Just for info-- This work got printed after 100 years, in 1986.

Here is the title page and the list of works referred/cited therein-- image

image

[Probably @gasyoun might be interested to make a note of this info.]

Andhrabharati commented 2 years ago

@funderburkjim

Let's close this misc. corrections issue, with three more small corrections-

<div n="to"/><ab>[A-Z] (1100+ instances) as <div n="vp"/><ab>[A-Z] (2000+ instances presently), as all these denote vp type entities.

¦ , (19 instances) as , ¦ (5700+ instances presently)

... [three dots] (13 instances) as … [horiz. ellipsis] (no instance as of now)
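Two of these three corrections lend themselves to a direct sketch. The code is an assumption-laden illustration with hypothetical names: it covers the `<div n="to"/>` and ellipsis items, and leaves out the `¦ ,` swap because the comment does not show the surrounding spacing precisely enough to write a safe pattern.

```python
import re

# <div n="to"/> directly followed by an <ab> whose text starts with a capital.
DIV_TO_BEFORE_CAPITAL_AB = re.compile(r'<div n="to"/>(?=<ab>[A-Z])')
# Exactly three consecutive periods (not part of a longer run of dots).
THREE_DOTS = re.compile(r"(?<!\.)\.{3}(?!\.)")


def apply_small_corrections(line):
    """Retag the div as 'vp' and replace three dots by a horizontal ellipsis."""
    line = DIV_TO_BEFORE_CAPITAL_AB.sub('<div n="vp"/>', line)
    line = THREE_DOTS.sub("\u2026", line)
    return line
```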

[We can handle the remaining misc. corrections in another issue sometime later.]

funderburkjim commented 2 years ago

pwkvn iast accent

Note IAST output for akzitavya^ in pwkvn: image

And in mw: image

Note the accent representation is the same in IAST.

funderburkjim commented 2 years ago

pwkvn Devanagari accent

For pwkvn: image

For MW: image

Note the Devanagari representation for svarita accent DIFFERS in MW and in pwkvn.

why the difference?

We have CHOSEN to make the Devanagari accent representation in PW, PWK, PWG consistent with the printed form of PWG, etc. Thus, the little vertical line over the vowel is used to represent svarita accent in PW, etc. (Similarly, udAtta is the little superscript devanagari 'u' in PW, etc.)

If you are wanting to compare MW with PW in terms of accents, then you should use either the slp1 representation or the IAST representation.

I still say no change is warranted at this time. If (as I suspect) you still disagree, I suggest you open a new issue devoted to this subject.

Andhrabharati commented 2 years ago

I doubt if you would be correcting these under a new issue, when not convinced about the point here itself; so I do not want to go that way.

If you don't like to bring these 100+ cases (â) within MW in line with the rest of 127k+ cases (à), which are all à in the print, you're the final judge as far as cologne data is concerned.

Thus, I leave the matter for now.

funderburkjim commented 2 years ago

the rest of 127k+ cases (à), which are all à in the print,

126895 matches in 118274 lines for "a/" These are the instances (according to Cologne digitization) of the short vowel 'a' with udAtta accent. These are represented in print with an acute accent (e.g., under headword 'a'): image

And, with output=iast in a Cologne display, they appear as a-with-acute-accent, á, not à image

The a with grave accent (à) would be the Cologne iast representation of "a with anudAtta accent" (slp1 a\), - there are none of these in MW.

funderburkjim commented 2 years ago

Although I don't feel comfortable with changing the representation of svarita accent in Cologne mw displays, I'm not sure my view should prevail. I've opened another issue so the question of accent representation (especially in mw and in the PW family of dictionaries) will remain 'open' for some future consideration.

funderburkjim commented 2 years ago

3rd batch of changes

These mainly from above https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1236366820.

Also corrected several 'madA' entries to 'mada' -- some 'sub-entries' of 'mada' were incorrectly interpreted as feminine.

The details are in change_9.txt file of issue137 directory, and also mentioned in the readme at 'change_9' and following.

Here is the latest iast version of mw: temp_mw_issue137_iast_rev3.zip

Many varied improvements now made to the mw digitization and markup. Thanks to @Andhrabharati for his continued 'fresh look' at mw. Now closing the issue.

Andhrabharati commented 2 years ago

the rest of 127k+ cases (à), which are all à in the print,

It was my gross mistake, using the wrong character here.

funderburkjim commented 2 years ago

accent revision

temp1_mw_10_iast.zip

This is iast version of mw digitization, with accent revised (so svarita accent = grave accent diacritic). For discussion, refer #140.

Andhrabharati commented 2 years ago

Fantastic; now the CDSL MW Roman text matches with the print.

Thanks a lot for relieving my worry, @funderburkjim !

As MW does not mark the Devanagari accents in the book, I am not that bothered about them in MW display and would leave the matter to the discretion of Jim, whether to match MW with PWG family or not.

However my final comment on the matter is to ask the team (@funderburkjim & @gasyoun) to just check the MW RV citations once with the corresponding linked RV text (courtesy: Marcis) and see if they notice any differences in Devanagari accents, and then compare the PWG RV citations with the RV links thereupon. [Probably, my point would be appreciated then.]