BUR Devanagari-IAST comparison

funderburkjim commented 8 years ago

The Burnouf dictionary also gives both a Devanagari and an IAST form for each headword. So it should be possible to compare the two, as was done with BEN in #287, as a means of checking both spellings.

funderburkjim commented 8 years ago

BUR uses 'w' in his IAST for the more usual 'v'; e.g.

Probably we should consider this a 'feature' of BUR rather than a bug, and take this quirk into account when doing the Devanagari-IAST comparison (i.e., not consider IAST 'hwal' a spelling error).

[When a comparison program is written, other peculiarities of BUR's IAST may emerge; if so, they should similarly be treated as non-errors.]

Do others concur?

gasyoun commented 8 years ago

not consider IAST 'hwal' a spelling error

Fully agree.

funderburkjim commented 8 years ago

The preparatory work has been done and is in the issue-296prep directory.

645 discrepancies were identified (see hwchk_iast1.org; view it as raw text or in Emacs).

In our digitization, bur.txt, the IAST appears in a variant of the AS (Anglicized Sanskrit) coding (usually letters and digits). Here are the main differences from the usual AS coding conventions (the 'out' values are the SLP1 transliteration of the Devanagari):

<e><s>INIT</s><in>r2</in><out>f</out></e>
<e><s>INIT</s><in>ç</in><out>S</out></e>
<e><s>INIT</s><in>s2</in><out>z</out></e>
<e><s>INIT</s><in>x</in><out>kz</out></e>
<e><s>INIT</s><in>ao</in><out>O</out></e>
<e><s>INIT</s><in>ae</in><out>E</out></e>
<e><s>INIT</s><in>n4</in><out>M</out></e>
<e><s>INIT</s><in>ch</in><out>C</out></e>
<e><s>INIT</s><in>w</in><out>v</out></e>
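Since the rules are written as `<e>` entries, they can be loaded mechanically rather than hand-copied into a script. The Python sketch below parses the nine entries above into an `in` → `out` mapping using the standard-library `xml.etree.ElementTree`. The names `BUR_RULES` and `parse_rules` are hypothetical, and it assumes each entry sits on its own line; it is not the transcoder actually used for this issue.

```python
import xml.etree.ElementTree as ET

# The BUR-specific AS -> SLP1 rules quoted above, one <e> entry per line.
BUR_RULES = """\
<e><s>INIT</s><in>r2</in><out>f</out></e>
<e><s>INIT</s><in>ç</in><out>S</out></e>
<e><s>INIT</s><in>s2</in><out>z</out></e>
<e><s>INIT</s><in>x</in><out>kz</out></e>
<e><s>INIT</s><in>ao</in><out>O</out></e>
<e><s>INIT</s><in>ae</in><out>E</out></e>
<e><s>INIT</s><in>n4</in><out>M</out></e>
<e><s>INIT</s><in>ch</in><out>C</out></e>
<e><s>INIT</s><in>w</in><out>v</out></e>
"""


def parse_rules(text: str) -> dict[str, str]:
    """Map each <in> string to its <out> string (hypothetical helper)."""
    rules: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = ET.fromstring(line)
        # Assumption: only entries in the INIT state are relevant here.
        if entry.findtext("s") != "INIT":
            continue
        rules[entry.findtext("in")] = entry.findtext("out")
    return rules


if __name__ == "__main__":
    for src, dst in parse_rules(BUR_RULES).items():
        print(f"{src!r:8} -> {dst!r}")
```

Keeping the table as data means new quirks found during case examination can be added as further entries without touching the code.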

Two other conversions were noticed while examining the cases:

<e><s>INIT</s><in>l2i</in><out>x</out></e>
<e><s>INIT</s><in>l2</in><out>L</out></e>
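These two rules overlap: 'l2' is a prefix of 'l2i', so a rule applier must try the longer key first or 'l2i' would come out as 'L' followed by 'i'. A minimal greedy longest-match transcoder in Python is sketched below. The table `BUR_TO_SLP1` and the function `transcode` are hypothetical; the table holds only the BUR-specific rules quoted in this thread, so a real converter would also need the standard AS rules, and here any character without a rule is simply passed through.

```python
# Hypothetical rule table: only the BUR-specific rules quoted in this thread.
# The standard AS rules are NOT included; unmatched characters pass through.
BUR_TO_SLP1 = {
    "r2": "f", "ç": "S", "s2": "z", "x": "kz", "ao": "O",
    "ae": "E", "n4": "M", "ch": "C", "w": "v",
    "l2i": "x", "l2": "L",
}


def transcode(word: str, rules: dict[str, str] = BUR_TO_SLP1) -> str:
    """Convert a BUR AS-style string to SLP1 using greedy longest-match."""
    max_len = max(map(len, rules))
    out = []
    i = 0
    while i < len(word):
        # Try the longest possible key first, so 'l2i' wins over 'l2'.
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            out.append(word[i])  # no rule applies: keep the character as-is
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transcode("kl2ip"))  # 'l2i' -> 'x', giving SLP1 'kxp'
    print(transcode("hwal"))   # BUR's 'w' -> 'v', giving 'hval'
```

Longest-match is a common choice for this kind of table because the result no longer depends on the order in which the rules happen to be listed.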

Over time, I'll examine hwchk_iast1_edit.org, mark the needed corrections in it, and then harvest the results as standard-form corrections. If anyone wants to do some of these, let me know so we can divide the work.

funderburkjim commented 8 years ago

All these cases have now been examined, and the corrections installed.

Here are some stats:

IAST-p  135  corrections to IAST judged to be due to an error in the printed text
DEVA-p   97  corrections to Devanagari judged to be due to an error in the printed text
IAST-n   60  false positives of one kind or another (no change made)
IAST-t  224  corrections to IAST judged to be due to typist error
DEVA-t  129  corrections to Devanagari judged to be due to typist error

TOTAL   645
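The five categories combine along two axes (which spelling was corrected, and whether the error was in the print or the typing). The short Python sketch below, using the counts from the stats above, checks that they add up to 645 and derives the subtotals along each axis. The names `STATS` and `subtotal` are illustrative only.

```python
# Counts copied from the stats above.
STATS = {
    "IAST-p": 135,  # IAST corrected, print error
    "DEVA-p": 97,   # Devanagari corrected, print error
    "IAST-n": 60,   # no change (false positive)
    "IAST-t": 224,  # IAST corrected, typist error
    "DEVA-t": 129,  # Devanagari corrected, typist error
}


def subtotal(stats: dict[str, int], codes: set[str]) -> int:
    """Sum the counts for the given category codes."""
    return sum(stats[c] for c in codes)


total = sum(STATS.values())
iast_fixed = subtotal(STATS, {"IAST-p", "IAST-t"})
deva_fixed = subtotal(STATS, {"DEVA-p", "DEVA-t"})
print_errors = subtotal(STATS, {"IAST-p", "DEVA-p"})
typist_errors = subtotal(STATS, {"IAST-t", "DEVA-t"})
no_change = STATS["IAST-n"]

print(f"total={total} iast={iast_fixed} deva={deva_fixed} "
      f"print={print_errors} typist={typist_errors} no_change={no_change}")
# prints: total=645 iast=359 deva=226 print=232 typist=353 no_change=60
```

So roughly 60% of the 585 real corrections were typist errors and 40% were errors in the printed text.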

The stats above are slightly off, because about 8-10 of the cases had already been corrected, probably in the course of the N-gram corrections.
The no-change cases were generally of two kinds:
- The 'l2' (l with under-dot) in IAST was misinterpreted. The choice is between the vocalic 'l' of 'kxp' (SLP1 coding) and the retroflex 'L' (whose Devanagari form looks like an infinity sign). The IAST-to-SLP1 conversion didn't work here, which may be due to either (a) ambiguous use of l-dot in the text or (b) a mistake in the transcoding.
- In words whose Devanagari spelling uses visarga (like duHKa or niHsaMkalpa), BUR consistently uses an 's' in IAST to represent the visarga. This appears to me to be an ambiguity in the IAST which the IAST-to-SLP1 transcoding cannot (and should not be able to) resolve. Nevertheless, it was not considered an error that should be changed.
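If the comparison were rerun, both kinds of no-change case could be filtered out instead of being re-listed. The Python sketch below is one hypothetical way to do that: `tolerant_match` compares the Devanagari SLP1 with the SLP1 obtained from BUR's IAST, and accepts a mismatch only where it is one of the known ambiguities (visarga written as 's', or vocalic 'x' vs retroflex 'L'). The names and the equal-length restriction are assumptions made for simplicity, not a description of the program actually used.

```python
# Hypothetical tolerance table of (Devanagari SLP1, IAST-derived SLP1) pairs
# that reflect known ambiguities in BUR's IAST rather than real errors.
KNOWN_AMBIGUITIES = {
    ("H", "s"),  # BUR writes visarga as 's' in IAST
    ("x", "L"),  # l-dot read as retroflex L, Devanagari has vocalic l
    ("L", "x"),  # l-dot read as vocalic l, Devanagari has retroflex L
}


def tolerant_match(deva: str, iast: str) -> bool:
    """True if the two SLP1 spellings agree except at known-ambiguous positions.

    Simplifying assumption: both spellings have the same length, so
    characters can be compared position by position.
    """
    if len(deva) != len(iast):
        return False
    return all(d == i or (d, i) in KNOWN_AMBIGUITIES for d, i in zip(deva, iast))


if __name__ == "__main__":
    print(tolerant_match("duHKa", "dusKa"))              # visarga as 's': True
    print(tolerant_match("niHsaMkalpa", "nissaMkalpa"))  # visarga as 's': True
    print(tolerant_match("kxp", "kLp"))                  # l-dot ambiguity: True
    print(tolerant_match("deva", "devA"))                # real mismatch: False
```

Anything this check rejects would still be listed for manual examination, so it only removes noise and never silently accepts a new kind of discrepancy.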
The no-change cases are at the bottom of the corrections_nochange file.
The print errors are at the bottom of the bur_printchange file.
The hwchk_iast1_edit.org file is the best source for all the items.

gasyoun commented 8 years ago

consistently uses an 's' in IAST to represent the visarga. This appears to me to be an ambiguity in the IAST which the IAST-SLP1 transcoding cannot (nor should not be able to) resolve. Yet, it was not considered to be an error that should be changed.

Fully agree with your choice. It's an oddity; good that it's documented now.

sanskrit-lexicon / CORRECTIONS

BUR Devanagari-IAST comparison #296