sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

mw72.txt deviations from AS #313

Closed by drdhaval2785 5 years ago

drdhaval2785 commented 8 years ago

Uses 'sh' for 'S', 'r2i' for 'f', 'c4' for 'c', 'n6' for 'M', 'n3' for 'Y', 'n4' for 'N', 'lr2i' for 'x', 'l2' for 'L'.

funderburkjim commented 8 years ago

The file which governs the conversion from AS to slp1 for the headwords in MW72 is as_slp1.xml.

At the bottom is a list of non-standard AS forms. Pretty much what you listed above.

<!-- mw72 modifications -->
<e><s>INIT</s><in>n3</in><out>Y</out></e>
<e><s>INIT</s><in>n4</in><out>N</out></e> <!-- in orig, n.¤ -->
<e><s>INIT</s><in>c4</in><out>c</out></e>
<e><s>INIT</s><in>c4h</in><out>C</out></e>
<e><s>INIT</s><in>m2</in><out>M</out></e>  <!-- not special to mw72 -->
<e><s>INIT</s><in>n6</in><out>M</out></e>

<e><s>INIT</s><in>r2i</in><out>f</out></e>
<e><s>INIT</s><in>r2i1</in><out>F</out></e>
<e><s>INIT</s><in>lr2i</in><out>x</out></e>
<e><s>INIT</s><in>lr2i1</in><out>X</out></e>
<e><s>INIT</s><in>sh</in><out>z</out></e>

<!-- Dec 16, 2015. -->
<e><s>INIT</s><in>l2</in><out>L</out></e>
<e><s>INIT</s><in>l2h</in><out>|</out></e>
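As a rough illustration of how entries like these can be applied, here is a minimal Python sketch. It is not the actual transcoder used for MW72: it assumes every rule lives in the single INIT state, wraps the fragment in a hypothetical <fsm> root so it parses, and uses greedy longest-match replacement. The helper names load_rules and convert are made up for this example, and the standard AS mappings that also live in as_slp1.xml are left out.

```python
import xml.etree.ElementTree as ET

# The mw72-specific entries quoted above, wrapped in a hypothetical <fsm>
# root element (an assumption) so the fragment is one well-formed document.
RULES_XML = """
<fsm>
<e><s>INIT</s><in>n3</in><out>Y</out></e>
<e><s>INIT</s><in>n4</in><out>N</out></e>
<e><s>INIT</s><in>c4</in><out>c</out></e>
<e><s>INIT</s><in>c4h</in><out>C</out></e>
<e><s>INIT</s><in>m2</in><out>M</out></e>
<e><s>INIT</s><in>n6</in><out>M</out></e>
<e><s>INIT</s><in>r2i</in><out>f</out></e>
<e><s>INIT</s><in>r2i1</in><out>F</out></e>
<e><s>INIT</s><in>lr2i</in><out>x</out></e>
<e><s>INIT</s><in>lr2i1</in><out>X</out></e>
<e><s>INIT</s><in>sh</in><out>z</out></e>
<e><s>INIT</s><in>l2</in><out>L</out></e>
<e><s>INIT</s><in>l2h</in><out>|</out></e>
</fsm>
"""


def load_rules(xml_text):
    """Read every <e> entry into an {input: output} dictionary."""
    root = ET.fromstring(xml_text)
    return {e.findtext("in"): e.findtext("out") for e in root.iter("e")}


def convert(text, rules):
    """Replace rule inputs with their outputs, trying longer inputs first.

    Characters that match no rule are copied through unchanged.
    """
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule starts at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


rules = load_rules(RULES_XML)
print(convert("kr2ita", rules))  # kfta
print(convert("c4ha", rules))    # Ca
```

Trying longer inputs first matters because several inputs are prefixes of others (c4 and c4h, r2i and r2i1, l2 and l2h); a shortest-first scan would turn c4h into c followed by a stray h.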

You might find that similar files for other dictionaries are relevant for your current work.

One thing which we are now probably in a position to do is to do these odd conversions once and for all in the reference digitization (e.g. in mw72.txt), so we don't have to keep remembering these transient details.

drdhaval2785 commented 8 years ago

Jim, I will hug you tight, if you can kill these differences once and for all. All Sanskrit in SLP1 please.

gasyoun commented 8 years ago

"One thing which we are now probably in a position to do is to do these odd conversions once and for all in the reference digitization"

Yes. Some details might get lost in UTF-8 (as I remember from exploring this field in 2014), but to hell with them if they cause so many issues.

drdhaval2785 commented 5 years ago

https://github.com/sanskrit-lexicon/MW72/issues/3 killed the differences.