Miscellaneous corrections

sanskrit-lexicon / MWS

Monier Monier-Williams, Sir; A Sanskrit-English dictionary. Oxford, 1899


Miscellaneous corrections #137

Closed Andhrabharati closed 2 years ago

Andhrabharati commented 2 years ago

quote marks:

'e<srs/>hi mā yāsīr!' to be changed as ‘e<srs/>hi mā yāsīr!’
the esoteric to be changed as ‘the esoteric’
first father' to be changed as ‘first father’
‘perhaps it is and is not and is not expressible in words' to be changed as ‘perhaps it is and is not and is not expressible in words’
‘a ‘goat’, a derivation in the sense of, goat's flesh’ to be changed as ‘a goat’, a derivation in the sense of ‘goat's flesh’
‘the ‘<s1>Śabara</s1>s’ food’ to be changed as ‘the <s1>Śabara</s1>s' food’

apostrophe:

<s>-gate' hani</s> to be changed as <s>-gate 'hani</s>
60 years ' cycle to be changed as 60 years' cycle

multiplication mark:

the x (U+0078) at 1 instance of [0-9]x [0-9] and 121 instances of [0-9] x [0-9] could be changed as the multiplication mark × (U+00D7).
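A minimal sketch of how this replacement could be scripted, assuming the two patterns quoted above are the only contexts to touch. The function name `fix_multiplication` is hypothetical and not part of any Cologne script.

```python
import re

# ASCII x between digits: "3 x 4" (space on both sides) or "3x 4" (no space before).
# The lookbehind/lookahead keep the digits themselves out of the match.
MULT = re.compile(r"(?<=[0-9])( ?)x (?=[0-9])")

def fix_multiplication(line: str) -> str:
    """Replace x (U+0078) between digits with the multiplication mark (U+00D7)."""
    return MULT.sub(lambda m: m.group(1) + "\u00d7 ", line)
```

Words such as "box 4" are left alone because the x must directly follow a digit or a digit plus one space.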

prosody marking:

(- - u u - -) to be changed as (- - ˘ ˘ - -)

miscellaneous:

1880 instances of space at the line-ending to be deleted.
1 instance of \t (tab) at the line-ending to be deleted.
2 instances of * to be deleted.
1 instance of ** to be deleted.

Andhrabharati commented 2 years ago

number followed by letter:

0f to be changed as of
0law-book to be changed as law-book
3oth to be changed as 30th
4<s1 to be changed as 4 <s1
4tb to be changed as 4th
5<s1 to be changed as 5 <s1
5jewels to be changed as 5 jewels
6o to be changed as 60
11sth to be changed as 11th [2 places]
ccl. 2. to be changed as col. 2. [2 places]
p.802col.2 to be changed as p.802 col.2
p.1118col.1. to be changed as p.1118 col.1.
p.1200col.3. to be changed as p.1200 col.3.

need for uniformity in punctuation marking:

’. 63 instances and .’ 207 instances
’, 13628 instances and ,’ 33 instances
’; 782 instances and ;’ 1 instance
[1-3]\. <ab>sg 892 instances and [1-3] <ab>sg 12 instances
[1-3]\. <ab>du 139 instances and [1-3] <ab>du 1 instance
[1-3]\. <ab>pl 839 instances and [1-3] <ab>pl 23 instances
p\. [0-9]+ 4632 instances and p\.[0-9]+ 468 instances
<ab>p\.</ab> [0-9]+ 102 instances and <ab>p\.</ab>[0-9]+ 1 instance
col\. [0-9]+ 1565 instances and col\.[0-9]+ 2 instances
<ab>col\.</ab> [0-9]+ 61 instances and <ab>col\.</ab>[0-9]+ 604 instances
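Counts like those above can be reproduced with a short script. This is only a sketch: the regexes are copied from the list, while the helper name `count_variants` and the choice of three pairs are assumptions made for illustration. Note that a bare `p\.` also matches inside longer abbreviations, so the counts may need manual checking.

```python
import re

# (majority form, minority form) taken from the list above.
PAIRS = [
    (r"p\. [0-9]+", r"p\.[0-9]+"),
    (r"col\. [0-9]+", r"col\.[0-9]+"),
    (r"[1-3]\. <ab>sg", r"[1-3] <ab>sg"),
]

def count_variants(text, pairs=PAIRS):
    """Return (pattern_a, count_a, pattern_b, count_b) for each pair."""
    return [
        (a, len(re.findall(a, text)), b, len(re.findall(b, text)))
        for a, b in pairs
    ]
```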

Andhrabharati commented 2 years ago

missing space before '=':

(62609): some=
(95618): <s>-liliśire</s>)=
(100550): <ls>AV.</ls>= ‘diarrhoea’.
(195423): Kollam 336=
(511483): (also)= <s>bhū</s>
(686740): <ab>mfn.</ab>= <s>°vat</s>
(769991): <s>sana</s>=

Andhrabharati commented 2 years ago

Mis-matched pairing of [...] :

(38354): , <ls>Pāṇ.</ls>] [<ls>Pāṇ.</ls>]
(85683): [<ls>L.</ls> [<ls>L.</ls>]
(85686): [<ls>L.</ls> [<ls>L.</ls>]
(85689): [<ls>L.</ls> [<ls>L.</ls>]
(169602): <s>dhuni</s>, j <s>dhuni</s>]

Andhrabharati commented 2 years ago

Mis-matched pairing of (...) :

(15609): <s>-devatā</s>) <s>-devatā</s>
(15612): <s>-devatā</s>) <s>-devatā</s>
(29427): ([<ls>MaitrS.</ls>; <ls>VS.</ls>] [<ls>MaitrS.</ls>; <ls>VS.</ls>]
(29430): ([<ls>MaitrS.</ls>; <ls>VS.</ls>] [<ls>MaitrS.</ls>; <ls>VS.</ls>]
(40536): <s>-pūrvaka</s>) <s>-pūrvaka</s>
(40545): <s>-pūrvaka</s>) <s>-pūrvaka</s>
(41022): [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.]) ([<ls>ŚBr.</ls>] [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.] [<ls>ŚBr.</ls>]
(41025): [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.]) ([<ls>ŚBr.</ls>] [<ls>RV. x, 152, 2</ls>; <ls>AV.</ls> &c.] [<ls>ŚBr.</ls>]
(95025): <ab>B.</ab>) (<ab>B.</ab>)
(100964): (<s>ās</s> ¦ ¦ ; this is a word ending, mostly being 'removed' throughout
(114152): <ls>Ragh.</ls>) <ls>Ragh.</ls> ; print correction
(131776): (<ab>B.</ab> (<ab>B.</ab>)
(275105): (? for <s>dhaṅka-m</s> (? for <s>dhaṅka-m°</s>)
(290174): a) kind of drama a kind of drama
(312310): (<ls n="MBh.">ii, 983</ls>-<ls n="MBh.">1203</ls> (<ls n="MBh.">ii, 983</ls>-<ls n="MBh.">1203</ls>)
(378395): or <s>°lā<srs/>bda</s> (or <s>°lā<srs/>bda</s>
(378395): (which begins on the 20th October, <ab>A.D.</ab> 879. (which begins on the 20th October, <ab>A.D.</ab> 879.)
(378398): or <s>°lā<srs/>bda</s> (or <s>°lā<srs/>bda</s>
(378398): (which begins on the 20th October, <ab>A.D.</ab> 879. (which begins on the 20th October, <ab>A.D.</ab> 879.)
(378401): or <s>°lā<srs/>bda</s> (or <s>°lā<srs/>bda</s>
(378401): (which begins on the 20th October, <ab>A.D.</ab> 879. (which begins on the 20th October, <ab>A.D.</ab> 879.)
(436419): <ab>wk.</ab>) <ab>wk.</ab>
(467381): <ab n="praise">pr°</ab>’ <ab n="praise">pr°</ab>’)
(544670): (<ls>HPariś.</ls> (<ls>HPariś.</ls>)
(544673): <lex>n.</lex>) <lex>n.</lex>
(690877): (<ab>Sch.</ab> (<ab>Sch.</ab>)
(714228): <ab n="Germany">G°</ab>) <ab n="Germany">G°</ab>
(821440): <ls>RV. x, 133</ls> <ls>RV. x, 133</ls>)
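Mismatches of this kind, for both [...] and (...), can be flagged with a simple stack check. The sketch below is an illustration only: the name `unbalanced` is hypothetical, it looks at one line at a time, and it will also flag legitimate text such as list markers of the form "a)", so every hit still needs to be checked against the print.

```python
CLOSERS = {")": "(", "]": "["}

def unbalanced(line: str) -> bool:
    """Return True if ( ) or [ ] are not properly paired within the line."""
    stack = []
    for ch in line:
        if ch in "([":
            stack.append(ch)
        elif ch in CLOSERS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != CLOSERS[ch]:
                return True
    # Anything left on the stack was opened but never closed.
    return bool(stack)
```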

Andhrabharati commented 2 years ago

Issues crept into temp_mw_3_iast, not in the prev. temp_mw_6_iast

space before ';'

(204651): ¦ inserted, interpolated, <ls>R. ii, <ab>ch.</ab> 96</ls> <ab>Sch.</ab>; <ls>Naiṣ. xxii, 48</ls> <ab>Sch.</ab><info lex="inh"/> has become ¦ inserted, interpolated, ; -------------------------------------------------------

all the missed matter after the comma is to be filled up!

(516867): <lex>f.</lex> (A.) has become <lex>f.</lex> (<ls>A.</ls> ;[Apte dictionary])

;[Apte dictionary] to be removed which was my comment

space before ')'

(144769): <ab>Gr.</ab> 969) has become <ls>Gr. 969</ls> )

space to be deleted before the closing brace.

Andhrabharati commented 2 years ago

Root symbol (√) and <s> tag:

There are 4625 √ <s> instances and 360 <s>√ instances.

Shouldn't all be with same sequence-- √ either preceding (outside) or following (inside) the <s> tag?

As there are 2257 cases of type √ <hom>1.</hom> <s>, we can conclude that it should always precede.

But then, there are 2 cases of </hom> √ to consider.
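The four counts discussed above can be gathered in one pass. A minimal sketch, assuming the text is held in a single string; `root_tag_counts` is a hypothetical helper name.

```python
def root_tag_counts(text: str) -> dict:
    """Count the placements of the root symbol relative to <s> and <hom> tags."""
    return {
        "outside": text.count("\u221a <s>"),       # root sign precedes the <s> tag
        "inside": text.count("<s>\u221a"),         # root sign is inside the <s> tag
        "before_hom": text.count("\u221a <hom>"),  # root sign precedes a <hom> tag
        "after_hom": text.count("</hom> \u221a"),  # root sign follows a <hom> tag
    }
```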

Andhrabharati commented 2 years ago

Capital letter following a small letter in a tagged entry, where it shouldn't be so:

(168816): <lex>f (A)n.</lex> <lex>f (<s>ā</s>)n.</lex>
(265210): <i>jallālu 'ddIn</i> <i>jallālu 'ddīn</i>
(269458): <etym>gIvēnu</etym> <etym>gīvēnu</etym>
(300555): <s1>YamaYamī</s1> <s1>Yama-Yamī</s1>
(303738): <s1>ŚrI</s1> <s1>Śrī</s1>
(328281): <s1>LakṣmI</s1> <s1>Lakṣmī</s1>
(328284): <s1>LakṣmI</s1> <s1>Lakṣmī</s1>
(476884): <etym>fSu</etym> <etym>fshu</etym>
(585671): <etym>rathaestA</etym> <etym>rathaestā</etym>
(585677): <etym>rathaestA</etym> <etym>rathaestā</etym>
(628613): <etym>virSús</etym> <etym>virshùs</etym>

[Note. I had deleted all the slp1 strings in my file, for convenience sake.]
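A search of this kind could be sketched as follows. The tag list and the name `suspicious_caps` are assumptions for illustration. The check only looks at directly adjacent characters, so a case like `f (A)n.` in the list above would not be caught by it.

```python
import re

# Tagged spans to inspect; the tag list is an assumption based on the cases above.
TAGGED = re.compile(r"<(s1|etym|i)>(.*?)</\1>")

def suspicious_caps(line: str) -> list:
    """Return tagged spans in which a lowercase letter is directly followed by an uppercase one."""
    hits = []
    for m in TAGGED.finditer(line):
        body = m.group(2)
        if any(a.islower() and b.isupper() for a, b in zip(body, body[1:])):
            hits.append(m.group(0))
    return hits
```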

Andhrabharati commented 2 years ago

Number before a <s> tag, either indicating a missing <hom> tag or a typo:

[0-9] <s>
(60681): 1 <s>ali</s> <hom>1.</hom> <s>ali</s>
(227757): 2 <s>gir</s> <hom>2.</hom> <s>gir</s>
(227757): 2 <s>gīrṇá</s> <hom>2.</hom> <s>gīrṇá</s>
(249605): 1. 2. 3 <s>cit</s> <hom>1.</hom> <hom>2.</hom> <hom>3.</hom> <s>cit</s>
(352745): 1. and 2 <s>navya</s> <hom>1.</hom> and <hom>2.</hom> <s>navya</s>
(353175): 2 <s>-áka</s> <hom>2.</hom> <s>-áka</s>
(441714): 1 <s>mi</s> <hom>1.</hom> <s>mi</s>
(469506): 1 <s>prā<srs/>ṅ-nyāya</s> <hom>1.</hom> <s>prā<srs/>ṅ-nyāya</s>

[0-9], <s>
(65237): 4, <s>liyat</s> <s>-līyat</s>
(205493): 2, <s>kṣúdh</s> <hom>2.</hom> <s>kṣúdh</s>
(357525): 1, <s>náva</s> <hom>1.</hom> <s>náva</s>
(415453): 2, <s>pat</s> 2, <ls>Pat.</ls>
(419111): 2, <s>as</s> <hom>2.</hom> <s>as</s>
(425563): 1. and 2, <s>puri</s> <hom>1.</hom> and <hom>2.</hom> <s>puri</s>
(464659): 1, <s>si</s> <hom>1.</hom> <s>si</s>
(475781): 5, <s>i</s> <hom>5.</hom> <s>i</s>
(507691): 1, <s>bhī</s> <hom>1.</hom> <s>bhī</s>
(555959): 1, <s>mura</s> <hom>1.</hom> <s>mura</s>
(665267): 1, <s>ru</s> <hom>1.</hom> <s>ru</s>
(708977): 2, <s>śad</s> <hom>2.</hom> <s>śad</s>
(752772): 1. 2, <s>vṛ</s> <hom>1.</hom> <hom>2.</hom> <s>vṛ</s>
(753242): 2, <s>saṃ-vedya</s> <hom>2.</hom> <s>saṃ-vedya</s>
(791067): 7, <s>sa</s> <hom>7.</hom> <s>sa</s>
(801680): 7, <s>sa</s> <hom>7.</hom> <s>sa</s>
(803606): 1, <s>sam-udra</s> <hom>1.</hom> <s>sam-udra</s>

[0-9]\. <s>
(7485): 1. <s>ajá</s> <hom>1.</hom> <s>ajá</s>
(7485): 1. <s>ajana</s> <hom>1.</hom> <s>ajana</s>
(12602): 1. <s>kṛ</s> <hom>1.</hom> <s>kṛ</s>
(13338): 3. <s>á-diti</s> <hom>3.</hom> <s>á-diti</s>

... LIST CONTINUES (~600 lines to check manually)

(35207): 1: <s>kṛ</s> <hom>1.</hom> <s>kṛ</s>
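Candidate corrections for the common case can be generated mechanically and then reviewed by hand. The sketch below is an assumption-laden illustration (the name `propose_hom` is hypothetical): it only wraps a single digit standing directly before `<s>`, so sequences like "1. and 2" get only the last number wrapped, and typos such as (65237) or (415453) above would be wrongly converted unless they are excluded manually.

```python
import re

# One digit (not part of a longer number), optional . , or :, a space, then <s>.
NUM_BEFORE_S = re.compile(r"(?<![0-9])([0-9])[.,:]? (?=<s>)")

def propose_hom(line: str) -> str:
    """Wrap a bare homonym number before <s> in a <hom> tag (proposal for manual review)."""
    return NUM_BEFORE_S.sub(r"<hom>\1.</hom> ", line)
```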

funderburkjim commented 2 years ago

correction/change work done.

About 13000+ lines were altered.

The work was done in the issue137 directory.

I aimed to include all the items mentioned by Andhrabharati. In addition, several additional changes were made with the objective of bringing certain details of the digitization into better conformity with the printed text. No doubt there are other similar changes that will be made as such differences are noticed; let these be discussed in future issues.

punctuation at end of quoted text.

This is one instance where I think there is a good reason for the digitization to vary from the printed text. The printed text invariably puts punctuation (comma, period, semicolon) BEFORE (inside) the closing quote of quoted text. But I have changed the digitization to uniformly put the punctuation AFTER (outside) the closing quote. For instance:

OLD  (agrees with print):
<s>aMSu—Dara</s> ¦ <lex>m.</lex> ‘bearer of rays,’ the sun, <ls>L.</ls>
NEW
<s>aMSu—Dara</s> ¦ <lex>m.</lex> ‘bearer of rays’, the sun, <ls>L.</ls>

The reason for the change is that the ending comma, etc. is not part of the quote, but rather separates the quote from other semantic chunks. Note: in case of period, there are a very small number of cases where an ending period IS part of the quoted text, and has thus been left inside, for example:

[The title <s>AcArya</s> affixed to names of learned men is rather like our ‘Dr.’; <ab>e.g.</ab> <s>rAGavA<srs/>cArya</s>, &c.]
[<ab>fr.</ab> √ <hom>1.</hom> <s>kf</s>, ‘= <s>kurvARa</s>, <s>kartf</s>, &c.’, <ls>Sāy.</ls>]
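The change described in this comment can be sketched as a single substitution. This is not the script used in the issue137 directory; `move_punct_outside` is a hypothetical name, and the default deliberately leaves periods alone so that cases like ‘Dr.’ can be reviewed by hand.

```python
import re

CLOSE = "\u2019"  # right single quotation mark used as the closing quote

def move_punct_outside(line: str, marks: str = ",;") -> str:
    """Move punctuation from just inside the closing quote to just outside it."""
    pattern = "([" + re.escape(marks) + "])" + CLOSE
    return re.sub(pattern, CLOSE + r"\1", line)
```

Passing `marks=",;."` would move periods as well, at the cost of also touching the few quotes where the period belongs to the quoted text.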

funderburkjim commented 2 years ago

iast version

This is consistent with the slp1 commit of csl-orig/v02/mw.txt.

temp_mw_issue137_iast.zip

funderburkjim commented 2 years ago

Before closing this issue, I'll wait a couple of days to deal with errors or omissions in the way I handled the mw cleanup items of this issue.

My intention then is to take a break from mw changes, and return attention to the ongoing ls-cleanup of PW and PWG.

gasyoun commented 2 years ago

(- - u u - -) to be changed as (- - ˘ ˘ - -)

@Andhrabharati sure?

gasyoun commented 2 years ago

The printed text invariably puts punctuation (comma, period, semicolon) BEFORE (inside) the closing quote of quoted text. But I have changed to uniformly put the punctuation AFTER (outside) the closing quote

@funderburkjim can we document it in a .txt readme, so as not to forget where there is such a CHANGE by intention?

funderburkjim commented 2 years ago

error correction

Discovered two errors, and corrected. See 'correct two errors. modify change_5.txt' section of issue137 readme.txt for details.

Also revised iast version: temp_mw_issue137_iast_rev.zip

funderburkjim commented 2 years ago

Can we document the intentional change?

A note was made in mw_printchange.txt file of csl-corrections repository: https://github.com/sanskrit-lexicon/csl-corrections/commit/b7ccd24d1988e8cda9105adcd18eda3d1c9ba1b0

funderburkjim commented 2 years ago

(- - u u - -) to be changed as (- - ˘ ˘ - -)

This occurs under <L>82334<pc>435,3<k1>tanumaDyA

This note from readme.txt of issue137:

NOTE:
1. (- - u u - -) to be changed as (- - ˘ ˘ - -)
  Instead change to (¯ ¯ ˘ ˘ ¯ ¯), as used 48 times elsewhere for meter
 i.e., I used the unicode macron (\u00af) for long.

Andhrabharati commented 2 years ago

@funderburkjim

Found some interesting points reg. AND/OR grouping elements!

There are 6 single <L> elements in AND groups (36310, 37336, 45103, 59037, 72383, 80300)

and a whopping 100+ single <L> elements in OR groups! (5295.1, 5963, 6230, 9218, 13040, 13046, 13293, 16421, 16441, 19425, 21437, 21529, 29168, 29633, 29831, 46491, 46738, 49740, 52477, 53475, 57080, 58684.12, 62547, 71080, 91798, 95320, 96358, 97425, 98426, 99465, 99624, 110675, 115003, 115399, 116989, 120532, 129300, 135725, 139457, 141203, 144737, 145504, 148500, 148573, 154262, 157214, 158837, 159177, 166051, 167026, 169540, 180428, 183881, 186088, 186226, 186289, 186645, 188644, 188650, 188663, 191027, 192063, 193598, 194958, 195242, 195996, 196890, 200007, 203319, 203480, 205118, 205142, 205180, 205211, 206064, 206364, 206466, 208147, 210505, 210902, 213989, 216907, 219834, 220857, 223679, 231039, 231079, 237161, 239670, 239811, 239992, 245058, 246345, 247829, 247858, 247867, 248288, 250081, 250879, 252557, 252708, 256290, 259869, 260454, 261644, 262061)

Noticed that these are mostly with accent differences or hyphenation differences.

Would you like to correct this point, as you feel appropriate?

Andhrabharati commented 2 years ago

Also there are 6 <L> entries whose body portion is ending as 'or' (1962, 9981, 96042, 156088, 169950, 171580) and one entry with body-ending as 'of' (which is a typo for 'or') (4074)

And there is one entry with body-ending as 'and' (95389)

These should be combined with the following entries appropriately and then to be made as "proper" grouped entries.
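Entries of this kind can be located with a check on the end of each body. The sketch below assumes the entries are already available as (L-number, body) pairs; how they are read from mw.txt is left out, and `dangling_entries` is a hypothetical name.

```python
import re

# Body ends in a conjunction (or the typo 'of') that points at the next entry.
DANGLING = re.compile(r"\b(or|and|of)\s*$")

def dangling_entries(entries):
    """Return (L-number, final word) for bodies that end in 'or', 'and' or 'of'."""
    found = []
    for lnum, body in entries:
        m = DANGLING.search(body)
        if m:
            found.append((lnum, m.group(1)))
    return found
```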

funderburkjim commented 2 years ago

single L groups

It seems reasonable to retain such markup, as it identifies the headwords which have more than one accent variant.

Do you have a better way to do this markup?

Andhrabharati commented 2 years ago

There are 8 <L> entries ending with <ab>w.r.</ab> for, which need to be combined with the next entries properly. (92603, 95205, 98434, 104490, 107642, 114508, 125521, 131402)

funderburkjim commented 2 years ago

Have handled the additional 16 'L' references mentioned in two previous comments. See 'temp_change_or1' and 'temp_change_or2' in readme.

Handled the w.r. cases with a new info attribute: <info orwr="..."/>.

@Andhrabharati Ok to close this issue?

Andhrabharati commented 2 years ago

@funderburkjim

There are about 2000 more w.r. instances in the text; but it may be alright to close this issue for now (with a final update of iast).

This issue appears to have tackled quite many points at once.

We can come back to MW sometime later, after finishing the long-pending ls-cleaning in the PWG family (PWG, pwk, pwkvn and SCH). [I am thinking of giving out my 'resolutions' for all the 'unidentified' entities in these this time.]

Andhrabharati commented 2 years ago

Speaking of ls-resolutions, you are yet to 'finally' correct the RLM (in the MW) as mentioned recently, as at https://github.com/sanskrit-lexicon/MWS/issues/135#issuecomment-1208133347

Andhrabharati commented 2 years ago

Probably you may also consider changing the remaining [noticed that the count has now come down to 300+ from the earlier 800+] ṉ to ṃ; 3 of which are in the ls strings as napuṉs. and the remaining are in the main text at the s1, ns or ab (expansion) strings.

[The single Zend etym-string aiwyāoṉhana at <L>19258.1 may also be changed as above, as this language is considered as a sister language to Sanskrit.]

Andhrabharati commented 2 years ago

Noticed ~300 â instances, which should've been à within the 100+ <s> strings and in the corresp. meta lines.

Andhrabharati commented 2 years ago

Talking of the caret instances above, got reminded of another issue (#107) that might also be considered in the MW spree now.

Andhrabharati commented 2 years ago

There is one instance (line 295319) where √ is not followed by a space; and one instance (line 365916) where div n="to"/> is not preceded by the <.

funderburkjim commented 2 years ago

RLM

tooltip altered. Good find! (headword kAkaciYcika).

funderburkjim commented 2 years ago

change_6, 7, 8

About 5000 lines changed.

These address the points above starting at https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1235047261.

More detailed notes are found in the readme, starting at change_6: Extended Ascii changes in ls.

extended ascii

changes â -> ā, ê -> e, î -> ī, ô -> o, û -> ū, ṉ -> ṃ in 3 places:

ls elements (e.g. <ls>Divyâd.</ls> -> <ls>Divyād.</ls>)
tooltips for mw ls: see csl-pywork revision 8af48bf (link above)
- tooltip abbreviations need to be consistent with <ls> in mw.txt
s1 elements, for example <s1 slp1="mAMsarohiRI">Māṉsarohiṇī</s1> -> <s1 slp1="mAMsarohiRI">Māṃsarohiṇī</s1>

Note that no changes were made in the <etym> elements. In particular, aiwyāoṉhana at <L>19258.1 was not changed.
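As an illustration of the mapping above, restricted to `<ls>` and `<s1>` elements and leaving `<etym>` untouched, a sketch might look like this. It is not the script from the issue137 directory; the function name is hypothetical, and the tooltip file is not handled.

```python
import re

# The six character changes listed above, written as code points.
EXT_ASCII = str.maketrans({
    "\u00e2": "\u0101",  # a-circumflex -> a-macron
    "\u00ea": "e",       # e-circumflex -> e
    "\u00ee": "\u012b",  # i-circumflex -> i-macron
    "\u00f4": "o",       # o-circumflex -> o
    "\u00fb": "\u016b",  # u-circumflex -> u-macron
    "\u1e49": "\u1e43",  # n with line below -> m with dot below
})

ELEM = re.compile(r"<(ls|s1)\b([^>]*)>(.*?)</\1>")

def fix_extended_ascii(line: str) -> str:
    """Apply the mapping to the text of <ls> and <s1> elements only; attributes are kept as-is."""
    def repl(m):
        tag, attrs, body = m.group(1), m.group(2), m.group(3)
        return "<" + tag + attrs + ">" + body.translate(EXT_ASCII) + "</" + tag + ">"
    return ELEM.sub(repl, line)
```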

funderburkjim commented 2 years ago

Noticed 300 â instances, which should've been à within the 100+ <s> strings and in the corresp. meta lines.

I'm not sure what was intended here --

Here's latest iast version of mw digitization: temp_mw_issue137_iast_rev2.zip

funderburkjim commented 2 years ago

paren-bracket

Recently noticed many (800+) instances of ([X]). I think @Andhrabharati previously also noticed these as needing change. From a small sample examination of print, I concluded these should be changed to [X]. See change_8.txt for these changes.
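A sketch of the ([X]) to [X] rewrite, assuming X itself contains no further parentheses or square brackets. The actual changes are recorded in change_8.txt; the pattern and the name `strip_outer_parens` below are only an illustration.

```python
import re

# "([" ... "])" where the inner text has no nested ( ) [ ].
PAREN_BRACKET = re.compile(r"\(\[([^\[\]()]*)\]\)")

def strip_outer_parens(line: str) -> str:
    """Rewrite ([X]) as [X]."""
    return PAREN_BRACKET.sub(r"[\1]", line)
```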

This ends my remarks regarding change_6 through change_8.

Andhrabharati commented 2 years ago

Glad that you are considering my above suggestions, @funderburkjim !

Looking at the mwauth corrections-

I think a good revision/re-look/vetting of all the ls-expansions is required sometime sooner. [I was just looking at the ? marked (or unlisted) ls-entries thus far in MW.]

As a glaring example, the Gaṇaratnāv. is not Gaṇaratnamahodadhi, but is Gaṇaratnāvalī!!

It is the "collection of Gaṇas to Pāṇini's gr. based on Gaṇaratnamahodadhi & other gr. & lex. works; composed in 1874 A. D. by Yajñeśvara Bhaṭṭa".

Should this be done now, or after completing the PWG ls-exercise? [Anyway, this should be dealt in another issue, but not here.]

Andhrabharati commented 2 years ago

I think @Andhrabharati previously also noticed these as needing change.

Yes, I had mentioned this earlier.

Andhrabharati commented 2 years ago

Noticed 300 â instances, which should've been à within the 100+ <s> strings and in the corresp. meta lines.

I'm not sure what was intended here --

Here's latest iast version of mw digitization: temp_mw_issue137_iast_rev2.zip

Pl. see under <L>550.2 in mw.txt (as example) metaline <k2> akzitavya^ headline <s> akzitavya^

and the corresp. iast text metaline <k2> akṣitavyâ headline <s> akṣitavyâ

Here is the scan of the portion [now I have a very good scan of MW]

Do I make sense now, @funderburkjim ? [There are 108 such places in metalines.]

funderburkjim commented 2 years ago

akzitavya^

OK, now I see your concern.
Using mw.txt (the slp1 version), my count is slightly different: 114 matches for "<k2>.*?\^"

In slp1, the spelling uses the ^ character as an accent. Which accent?
It is svarita: See frontmatter

Next, we have the question of how to represent, in displays, this svarita accent with diacritics. In the printed text, the svarita accent is represented by a backward 'grave' accent, and 'udAtta' by a forward 'acute' accent. There is no anudAtta mentioned or used.

The Cologne displays use a representation where svarita is represented by circumflex diacritic, udAtta by acute accent diacritic, and anudAtta by grave accent diacritic.

The iast version of MW which you are referring to also used the same 'Cologne' representation.

Thus, there is nothing that requires changing. Just remember that in IAST displays of MW, a Sanskrit word with circumflex-diacritic will appear in the MW printed text with a 'grave'-like diacritic.
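The two conventions described above can be put side by side in a small sketch. It handles only the slp1 accent characters (/ for udAtta, ^ for svarita, \ for anudAtta) and does not transliterate the letters themselves; the table and function names are hypothetical.

```python
import unicodedata

ACUTE, CIRCUMFLEX, GRAVE = "\u0301", "\u0302", "\u0300"  # combining diacritics

# Cologne display convention as described above.
COLOGNE = {"/": ACUTE, "^": CIRCUMFLEX, "\\": GRAVE}
# Printed MW as described above: udAtta = acute, svarita = grave, no anudAtta.
PRINT_MW = {"/": ACUTE, "^": GRAVE}

def accent_marks(text: str, table=COLOGNE) -> str:
    """Replace slp1 accent characters with combining diacritics and compose them."""
    out = "".join(table.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", out)
```

With the same slp1 input `a^`, the Cologne table yields a-circumflex while the print-style table yields a-grave, which is the difference under discussion.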

funderburkjim commented 2 years ago

`<srs/>` and svarita

In the printed text of MW a Sanskrit word often appears with a 'circumflex' diacritic. But this is NOT an accent. It is a special convention (described on the same front matter page mentioned above) for representing vowel-sandhi. See for instance, aMSAMSa

The representation in the Cologne digitization uses the empty xml tag <srs/> : <s>aMSA<srs/>MSa</s>

Although MW describes 4 types of circumflex (representing short+short, short + long, etc.), the Cologne digitization does not distinguish among these types.

I have encountered a few cases where a vowel was coded with <srs/> but should have been coded with svarita. It seems likely that there are other such errors in the digitization.

funderburkjim commented 2 years ago

good revision/re-look/vetting of all the ls-expansions

Definitely agree. You are the best person to do this. You could edit the file tooltip.txt and give me the resulting edited file for installation. Agree best to make another issue devoted to discussions arising during the review.

Andhrabharati commented 2 years ago

Thus, there is nothing that requires changing.

I would say otherwise-- There definitely is a need to do something here!

One year ago, (April 2021) while I was at MW work (for Cologne), these 100+ places were all properly converted/rendered as à in the metalines and the resp. headlines, in the IAST file you gave, as also the whole lot (127k) of other à throughout the text, as per the print matter. [The file is dated 4th April 2021] https://github.com/sanskrit-lexicon/MWS/issues/104#issuecomment-817359904

To make you see the difference more clearly, I am giving two examples now (comparing MW and PWG/pwk)--

display of the entry asurya in MW

vs. PWG & pwk

Display of akṣitavya in MW

vs. pwkvn

And having à at these places makes the MW data tally with the original (sources) PWG/pwk data.

We don't have to reiterate that much of MW content is based on PWG family data, and they should be rendered in a similar fashion. [There should not be any second thought on this.]

Andhrabharati commented 2 years ago

As a glaring example, the Gaṇaratnāv. is not Gaṇaratnamahodadhi, but is Gaṇaratnāvalī!!

It is the "collection of Gaṇas to Pāṇini's gr. based on Gaṇaratnamahodadhi & other gr. & lex. works; composed in 1874 A. D. by Yajñeśvara Bhaṭṭa".

Just for info-- This work got printed after 100 years, in 1986.

Here is the title page and the list of works referred/cited therein--

[Probably @gasyoun might be interested to make a note of this info.]

Andhrabharati commented 2 years ago

@funderburkjim

Let's close this misc. corrections issue, with three more small corrections-

<div n="to"/><ab>[A-Z] (1100+ instances) as <div n="vp"/><ab>[A-Z] (2000+ instances presently), as all these denote vp type entities.

¦ , (19 instances) as , ¦ (5700+ instances presently)

... [three dots] (13 instances) as … [horiz. ellipsis] (no instance as of now)

[We can handle the remaining misc. corrections in another issue sometime later.]
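Two of the three corrections above lend themselves to a direct substitution, sketched below under a hypothetical name. The `¦ ,` case is left out because the intended spacing after the swap is not stated here.

```python
import re

DIV_TO = re.compile(r'<div n="to"/>(?=<ab>[A-Z])')
THREE_DOTS = re.compile(r"(?<!\.)\.\.\.(?!\.)")  # exactly three dots

def small_fixes(line: str) -> str:
    """Retag 'to' divs that precede a capitalised <ab>, and use a real ellipsis."""
    line = DIV_TO.sub('<div n="vp"/>', line)
    line = THREE_DOTS.sub("\u2026", line)
    return line
```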

funderburkjim commented 2 years ago

pwkvn iast accent

Note IAST output for akzitavya^ in pwkvn:

And in mw:

Note the accent representation is the same in IAST.

funderburkjim commented 2 years ago

pwkvn Devanagari accent

For pwkvn:

For MW:

Note the Devanagari representation for svarita accent DIFFERS in MW and in pwkvn.

why the difference?

We have CHOSEN to make the Devanagari accent representation in PW, PWK, PWG consistent with the printed form of PWG, etc. Thus, the little vertical line over the vowel is used to represent svarita accent in PW, etc. (Similarly, udAtta is the little superscript devanagari 'u' in PW, etc.)

If you are wanting to compare MW with PW in terms of accents, then you should use either the slp1 representation or the IAST representation.

I still say no change is warranted at this time. If (as I suspect) you still disagree, I suggest you open a new issue devoted to this subject.

Andhrabharati commented 2 years ago

I doubt if you would be correcting these under a new issue, when not convinced about the point here itself; so I do not want to go that way.

If you don't like to bring these 100+ cases (â) within MW in line with the rest of 127k+ cases (à), which are all à in the print, you're the final judge as far as cologne data is concerned.

Thus, I leave the matter for now.

funderburkjim commented 2 years ago

the rest of 127k+ cases (à), which are all à in the print,

126895 matches in 118274 lines for "a/". These are the instances (according to Cologne digitization) of the short vowel 'a' with udAtta accent. These are represented in print with an acute accent (e.g., under headword 'a'):

And, with output=iast in a Cologne display, they appear as a-with-acute-accent, á, not à

The a with grave accent (à) would be the Cologne iast representation of "a with anudAtta accent" (slp1 a\), - there are none of these in MW.

funderburkjim commented 2 years ago

Although I don't feel comfortable with changing the representation of svarita accent in Cologne mw displays, I'm not sure my view should prevail. I've opened another issue so the question of accent representation (especially in mw and in the PW family of dictionaries) will remain 'open' for some future consideration.

funderburkjim commented 2 years ago

3rd batch of changes

These mainly from above https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1236366820.

Also corrected several 'madA' entries to 'mada' -- some 'sub-entries' of 'mada' were incorrectly interpreted as feminine.

The details are in change_9.txt file of issue137 directory, and also mentioned in the readme at 'change_9' and following.

Here is the latest iast version of mw: temp_mw_issue137_iast_rev3.zip

Many varied improvements now made to the mw digitization and markup. Thanks to @Andhrabharati for his continued 'fresh look' at mw. Now closing the issue.

Andhrabharati commented 2 years ago

the rest of 127k+ cases (à), which are all à in the print,

It is my gross mistake, using a wrong character here.

funderburkjim commented 2 years ago

accent revision

temp1_mw_10_iast.zip

This is iast version of mw digitization, with accent revised (so svarita accent = grave accent diacritic). For discussion, refer #140.

Andhrabharati commented 2 years ago

Fantastic; now the CDSL MW Roman text matches with the print.

Thanks a lot for relieving my worry, @funderburkjim !

As MW does not mark the Devanagari accents in the book, I am not that bothered about them in MW display and would leave the matter to the discretion of Jim, whether to match MW with PWG family or not.

However my final comment on the matter is to ask the team (@funderburkjim & @gasyoun) to just check the MW RV citations once with the corresponding linked RV text (courtesy: Marcis) and see if they notice any differences in Devanagari accents, and then compare the PWG RV citations with the RV links thereupon. [Probably, my point would be appreciated then.]
