Some stray errors in mw data

Andhrabharati commented 1 week ago

Searched for "/" within <s>…</s> strings in the iast file, and got

(1603): <s>/kṣa—dhur</s> <k2>akṣa—dhur ;; "/" for "a"
(53835): <s>ayas—m/yā<srs/>di</s> <k2>ayas—mayādi ;; "/" for "a"
(126144): <s>up/riṣṭād—vātá</s> <k2>upariṣṭād—vātá ;; "/" for "a"
(128751): <s>upa-sth/na—sāhasrī</s> <k2>upa-sthāna—sāhasrī ;; "/" for "ā"
(357380): <s>nāvy/</s> <k2>nāvyā ;; "/" for "ā"
(467080): <s>pra-°hār/ya</s> <k2>pra-hā́rya ;; "ār/" for "ā́r"
(553166): <s>mukha—/taḥ-kāram</s> <k2>mukha—/taḥ-kāram ;; no "/" in print at all; k2 should also change
(696441): <s>vyathā—r/hita</s> <k2>vyathā—rahita ;; "/" for "a"
(729608): <s>śukr/—danta</s> <k2>śukra—danta ;; "/" for "a"
(731560): <s>śunā-s/°ryâ</s> <k2>śunā-sīryâ ;; "/" for "ī"
(762092): <s>s/khi—vigraha</s> <k2>sakhi—vigraha ;; "/" for "a"
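For anyone wanting to repeat this kind of search, here is a minimal Python sketch. It is not the method actually used above; the function name `find_stray_slashes` and the sample lines are assumptions made for illustration. It ignores the "/" that belongs to nested markup such as `<srs/>`, so only slashes in the Sanskrit text itself are reported.

```python
import re

S_TAG = re.compile(r"<s>(.*?)</s>")   # one <s>…</s> string
INNER_TAG = re.compile(r"<[^<>]*>")   # nested markup such as <srs/>

def find_stray_slashes(lines):
    """Yield (line_number, s_text) for each <s>…</s> string containing '/'."""
    for lineno, line in enumerate(lines, start=1):
        for match in S_TAG.finditer(line):
            text = match.group(1)
            # Drop nested tags first so their own '/' is not reported.
            if "/" in INNER_TAG.sub("", text):
                yield lineno, text

# Hypothetical sample lines, modelled on the entries listed above.
sample = [
    "<s>/kṣa—dhur</s> <k2>akṣa—dhur",
    "<s>ayas—m/yā<srs/>di</s>",
    "<s>akṣa<srs/>dhur</s>",
]
hits = list(find_stray_slashes(sample))
for lineno, text in hits:
    print(lineno, text)
```

The third sample line is skipped because its only "/" sits inside `<srs/>`.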

Recall that I recently mentioned possible differences between the k2 field (in the metaline) and the HW (in the header portion), while discussing differences between the k1 and k2 fields.

This search also exposed [at 731560] an error in the mw transcoder file that Jim gave me earlier (for my own use): it renders "◌̂" (U+0302, combining circumflex) instead of "◌̀" (U+0300, combining grave) for the grave accent. There are 748 such places in the full text data.
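A count like this can be reproduced roughly as sketched below. The sketch assumes that every circumflex in the text is an accent mark, which may not hold for all of the data, and the function name `count_circumflex` is made up for illustration. NFD normalization is used so that precomposed letters such as "â" (U+00E2) and explicit base + U+0302 sequences are counted alike.

```python
import unicodedata

COMBINING_CIRCUMFLEX = "\u0302"

def count_circumflex(text):
    """Count circumflex accents, whether precomposed or combining."""
    # NFD splits "â" (U+00E2) into "a" + U+0302, so one count covers both.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.count(COMBINING_CIRCUMFLEX)

# k2 of entry 731560 as quoted above: the final "â" is precomposed U+00E2.
print(count_circumflex("\u015bun\u0101-s\u012bry\u00e2"))
```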

Interestingly, the web display shows these correctly with a grave accent in Roman, though it shows a wrong character in Devanagari!

funderburkjim commented 1 week ago

stray errors

corrections done. See change_1.txt.

Andhrabharati commented 1 week ago

This issue can be closed now.

funderburkjim commented 1 week ago

the question re devanagari, iast

For reference, I made a display showing the unicode characters for the iast and devanagari of slp1 SunA-sI°rya^. See unicode_testout.txt.
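A listing of this kind can be produced with Python's standard `unicodedata` module. This is only a sketch of the idea, not the script behind unicode_testout.txt; the function name `describe` is an assumption.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' line per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]

# The last syllable of the word, once with circumflex and once with grave.
for line in describe("ry\u00e2") + describe("rya\u0300"):
    print(line)
```

Comparing such listings for the iast and devanagari output makes it easy to see exactly which accent code point each transcoder emits.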

The devanagari display, using siddhanta font:

Here is the iast:

All the above looks ok to me. @Andhrabharati are you seeing a problem?

Or is it a problem in the iast-deva conversion that I prepared for you sometime?

Andhrabharati commented 1 week ago

About Devanagari accent

Here is what PWG (L-100369) has for the same word--

See what slp1 (as in LIES) mandates--

I was "dragged" into this again [recall that I had earlier made a brief mention of it and left it to Jim's discretion; I did not pursue the matter further at the time, as MW chose not to have accents in devanagari at all], after looking at Peter Scharf's mw_printchange 98 (dated Apr 14, 2014),

The svarita is represented by a vertical stroke

while I was marking the print-changes "within" the mw.txt file itself (similar to what I had done earlier in case of "sup" and "rev" markings).

Now, about Roman accent

It appears that two versions of the transcoders are "floating around" in the CDSL "domain": one with IAST accents (using the combining circumflex for svarita), and another with ISO15919 accents (using the combining grave for svarita). This is bound to cause great confusion among users.

Jim MUST take corrective action on this, and adopt what he concluded back then.

With the grave accent (in the roman transliteration) and the vertical stroke (in devanagari), every bit of MW and PWG/pwk "Sanskrit strings" would be tallying with each other! [A perfect sync to celebrate!!!]

Andhrabharati commented 1 week ago

Here is what the "problematic" IAST style transcoder slp1_roman.xml has--

<e> <s>SKT</s> <in>a^</in> <out>\u00e2</out> </e>
<e> <s>SKT</s> <in>i^</in> <out>\u00ee</out> </e>
<e> <s>SKT</s> <in>u^</in> <out>\u00fb</out> </e>
<e> <s>SKT</s> <in>f^</in> <out>\u1e5b\u0302</out> </e>
<e> <s>SKT</s> <in>x^</in> <out>\u1e37\u0302</out> </e>
<e> <s>SKT</s> <in>A^</in> <out>\u0101\u0302</out> </e>
<e> <s>SKT</s> <in>I^</in> <out>\u012b\u0302</out> </e>
<e> <s>SKT</s> <in>U^</in> <out>\u016b\u0302</out> </e>
<e> <s>SKT</s> <in>F^</in> <out>\u1e5d\u0302</out> </e>
<e> <s>SKT</s> <in>X^</in> <out>\u1e39\u0302</out> </e>
<e> <s>SKT</s> <in>e^</in> <out>\u00ea</out> </e>
<e> <s>SKT</s> <in>o^</in> <out>\u00f4</out> </e>
<e> <s>SKT</s> <in>E^</in> <out>aî</out> </e>
<e> <s>SKT</s> <in>O^</in> <out>aû</out> </e>
<e> <s>SKT</s> <in>M^</in> <out>\u1e41\u0302</out> </e>
<e> <s>SKT</s> <in>H^</in> <out>\u1e25\u0302</out> </e>

Andhrabharati commented 1 week ago

All the accent related lines could be replaced by three lines, applying to any and every vowel (or ubhaya, i.e. M and H) letter.


<e> <s>SKT</s> <in>/</in> <out>\u0301</out> </e>
<e> <s>SKT</s> <in>^</in> <out>\u0300</out> </e>
<e> <s>SKT</s> <in>\</in> <out>\u0331</out> </e>
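To illustrate how three generic rules could cover every vowel, here is a minimal Python sketch. It is not the CDSL transcoder; the `BASE` table is a small illustrative subset and `slp1_to_roman` is a hypothetical name. Each accent mark becomes a combining character placed after whatever letter precedes it, and NFC normalization then composes pairs such as "a" + U+0300 into "à" where a precomposed form exists.

```python
import unicodedata

# Illustrative subset of base letters only (NOT the full slp1_roman.xml).
BASE = {
    "A": "\u0101", "I": "\u012b", "U": "\u016b",   # ā ī ū
    "f": "\u1e5b", "F": "\u1e5d",                   # ṛ ṝ
    "S": "\u015b", "z": "\u1e63",                   # ś ṣ
    "M": "\u1e41", "H": "\u1e25",                   # ṁ ḥ
}

# The three generic accent rules proposed above.
ACCENT = {"/": "\u0301", "^": "\u0300", "\\": "\u0331"}

def slp1_to_roman(text):
    """Transcode slp1 to Roman using per-mark (not per-vowel) accent rules."""
    out = []
    for ch in text:
        if ch in ACCENT:
            out.append(ACCENT[ch])       # combining mark follows its vowel
        else:
            out.append(BASE.get(ch, ch))  # unmapped characters pass through
    return unicodedata.normalize("NFC", "".join(out))

print(slp1_to_roman("SunA-sI°rya^"))
```

With this approach, adding a new base letter never requires adding matching accented entries.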

Andhrabharati commented 1 week ago

I would also request Jim to consider changing the transcoding of slp1 "L" from "ł" (\u0142) to "ḻ" (\u1e3b).

There is no authoritative iast substitute for slp1 L. … … …

We choose \u0142 Latin small letter L with stroke … … …

<e> <s>SKT</s> <in>L</in> <out>\u0142</out> </e>

I would point out that this "ḻ" has long been present ("hidden"?!) in the mw data (at 5 places inside s1 strings), whereas he chose the new character "ł" only fairly recently. AFAIK, "ł" has never been used in any Indic (Sanskrit) text printed so far; it is used only in European languages.

And it is not out of place to mention that the IAST does specify the character I had shown--

from the snippet I had posted earlier

and from wiki
