sanskrit-lexicon / csl-devanagari

Convert SLP1 data from csl-orig into Devanagari for easy proofreading.

CAE (issue #3)

Open · drdhaval2785 opened this issue 3 years ago

drdhaval2785 commented 3 years ago

अ/क instead of a/ka. Need to handle accents better.

Andhrabharati commented 3 years ago

Why not adopt, in toto, the conversion mapping used by @funderburkjim (including accents)?

drdhaval2785 commented 3 years ago

http://www.sanskrit-lexicon.uni-koeln.de/talkMay2008/SLP1.pdf shows that /, ^ and \ mark udātta, svarita and anudātta, respectively.
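Given that convention, a converter first has to work out which vowel each mark belongs to. A minimal Python sketch, assuming (as in `a/ka`) that a mark always directly follows the vowel it accents; `find_accents` is a hypothetical helper, not part of any existing package:

```python
import re

# SLP1 accent marks, as described in SLP1.pdf.
ACCENTS = {"/": "udātta", "^": "svarita", "\\": "anudātta"}

# SLP1 vowel letters: a A i I u U f F x X e E o O.
# Assumption: an accent mark directly follows the vowel it belongs to.
VOWEL_ACCENT = re.compile(r"([aAiIuUfFxXeEoO])([/^\\])")

def find_accents(slp1: str) -> list[tuple[int, str, str]]:
    """Return (index of vowel, vowel, accent name) for each accented vowel."""
    return [
        (m.start(), m.group(1), ACCENTS[m.group(2)])
        for m in VOWEL_ACCENT.finditer(slp1)
    ]

print(find_accents("a/ka"))  # [(0, 'a', 'udātta')]
```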

drdhaval2785 commented 3 years ago

I am trying the indic-transliteration package (https://pypi.org/project/indic-transliteration/) to convert between SLP1, IAST and Devanagari. If you can tell me which Unicode code points to use for the udātta, svarita and anudātta accents, I can update the package so that everyone who uses it benefits.
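The opening comment suggests that the accent marks currently pass through conversion unchanged (`अ/क`). If a base SLP1-to-Devanagari transliterator behaves that way, accents could be added in a post-processing pass that swaps each literal mark for a combining code point. A minimal sketch; the code points are assumptions (U+0951 and U+0952 are Unicode's DEVANAGARI STRESS SIGN UDATTA and ANUDATTA), svarita is deliberately left unmapped, and the authoritative values should come from `slp1_deva.xml`:

```python
# ASSUMED mapping from SLP1 accent marks to Devanagari combining marks.
# Verify against slp1_deva.xml before use. Svarita ("^") is not mapped,
# so it is left in the text unchanged.
ASSUMED_DEVA_ACCENTS = {
    "/": "\u0951",   # udātta (assumption)
    "\\": "\u0952",  # anudātta (assumption)
}

def apply_accents(deva: str, mapping: dict[str, str] = ASSUMED_DEVA_ACCENTS) -> str:
    """Replace literal SLP1 accent marks left in Devanagari output.

    Assumes the base transliterator copied each mark through in place,
    right after the syllable it accents (e.g. 'अ/क' for SLP1 'a/ka').
    """
    return "".join(mapping.get(ch, ch) for ch in deva)

print(ascii(apply_accents("अ/क")))  # '\u0905\u0951\u0915'
```

Note that this replaces every "/" and "\" in the string, so it is only safe on text where those characters are never used for anything other than accents.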

I vaguely remember a discussion along these lines in PW / PWG, but not its exact location. @Andhrabharati may be able to help me find it.

Andhrabharati commented 3 years ago

@drdhaval2785

I guess this is what you were looking for: https://github.com/sanskrit-lexicon/PWG/issues/5

In the same issue, @funderburkjim has shared the file he used for SLP1-to-Devanagari conversion: https://github.com/sanskrit-lexicon/PWG/issues/5#issuecomment-894523564

funderburkjim commented 2 years ago

The representation of Devanagari accents is 'stable'. The slp1_deva.xml transcoding file (in csl-websanlexicon/v02/makotemplates/web/utilities/transcoder/) shows the Unicode code points.
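To list the code points a transcoder file actually uses, its entries can be read directly. A minimal sketch, assuming `slp1_deva.xml` uses the same `<e><s>…</s><in>…</in><out>…</out></e>` layout, with `\uXXXX` escapes in `<out>`, as the `slp1_roman.xml` entries quoted in this thread. The `<root>` wrapper is a placeholder, and the sample data is just those two quoted IAST entries:

```python
import re
import xml.etree.ElementTree as ET

# Sample in the entry format quoted in this thread; <root> is a placeholder,
# not necessarily the real file's top-level element.
SAMPLE = r"""<root>
<e> <s>SKT</s> <in>a/</in> <out>\u00e1</out> </e>
<e> <s>SKT</s> <in>f/</in> <out>\u1e5b\u0301</out> </e>
</root>"""

def decode_escapes(text: str) -> str:
    """Turn literal \\uXXXX sequences into the characters they name."""
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

def list_code_points(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (SLP1 input, ['U+XXXX', ...]) for every <e> entry."""
    root = ET.fromstring(xml_text)
    result = []
    for e in root.iter("e"):
        src = e.findtext("in", default="")
        out = decode_escapes(e.findtext("out", default=""))
        result.append((src, [f"U+{ord(c):04X}" for c in out]))
    return result

for src, cps in list_code_points(SAMPLE):
    print(src, " ".join(cps))
# a/ U+00E1
# f/ U+1E5B U+0301
```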

The corresponding IAST transcoder file is slp1_roman.xml. The IAST representation of accents used in slp1_roman.xml may be 'unstable', in the following sense.

For some vowels and accents, a 'preformed' letter with the accent built in is used, e.g. u00e1:

<e> <s>SKT</s> <in>a/</in> <out>\u00e1</out> </e>

For vowels that have no preformed accented form, however, a 'combining' accent (acute, grave, circumflex) is appended instead, e.g. u0301:

<e> <s>SKT</s> <in>f/</in> <out>\u1e5b\u0301</out> </e>
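The two cases differ only in encoding, not in meaning. Unicode treats the preformed `á` (U+00E1) as canonically equivalent to `a` + U+0301, while `ṛ́` has no preformed code point at all. Python's standard `unicodedata` module shows this:

```python
import unicodedata

preformed = "\u00e1"       # á, as used for SLP1 a/
combining = "a\u0301"      # a + COMBINING ACUTE ACCENT
r_acute = "\u1e5b\u0301"   # ṛ + COMBINING ACUTE ACCENT, as used for SLP1 f/

# NFD splits the preformed letter; NFC recombines it.
print(unicodedata.normalize("NFD", preformed) == combining)  # True
print(unicodedata.normalize("NFC", combining) == preformed)  # True

# No preformed ṛ-with-acute exists, so NFC still leaves two code points.
print(len(unicodedata.normalize("NFC", r_acute)))  # 2
```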

Perhaps the use of preformed accents (like u00e1) should be changed in favor of combining accents. Reason: the Devanagari and IAST accent formations would then be conceptually similar (append a particular combining Unicode code point).
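Under that proposal, every accented IAST vowel would be built the same way as in Devanagari: base letter plus one appended combining mark. A minimal sketch; the SLP1-to-IAST vowel table is the standard one, and the acute for `/` matches the `slp1_roman.xml` entries quoted above, but pairing grave with `\` and circumflex with `^` is an assumption, as is placing the mark on the first letter of `ai`/`au`:

```python
# Standard SLP1 vowels and their IAST equivalents (escapes keep each
# non-ASCII letter a single precomposed code point).
SLP1_VOWELS = {
    "a": "a", "A": "\u0101",  # ā
    "i": "i", "I": "\u012b",  # ī
    "u": "u", "U": "\u016b",  # ū
    "f": "\u1e5b",            # ṛ
    "F": "\u1e5d",            # ṝ
    "x": "\u1e37",            # ḷ
    "X": "\u1e39",            # ḹ
    "e": "e", "E": "ai", "o": "o", "O": "au",
}

# Acute for "/" follows slp1_roman.xml; the other two pairings are assumptions.
COMBINING = {"/": "\u0301", "\\": "\u0300", "^": "\u0302"}

def accented_iast(vowel: str, mark: str) -> str:
    """IAST for an SLP1 vowel + accent mark, always using a combining accent."""
    base = SLP1_VOWELS[vowel]
    # Append the mark after the first letter, so ai/au carry it on the 'a'.
    return base[0] + COMBINING[mark] + base[1:]

print(ascii(accented_iast("a", "/")))  # 'a\u0301'
print(ascii(accented_iast("f", "/")))  # '\u1e5b\u0301'
```

Applying NFC only at display time would still give `á` wherever a preformed letter exists, so storing the combining form need not change what readers see.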

I have found no guidance regarding 'standard practice' for IAST accents.

Changing to combining accents in Cologne IAST displays would be straightforward: change the slp1_roman.xml file (in csl-websanlexicon and in csl-apidev).
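Because the entries are regular, replacement entries could be generated rather than typed by hand. A minimal sketch that writes lines in the `<e>` format quoted above, escaping non-ASCII output as lowercase `\uXXXX`; `make_entry` is a hypothetical helper, and the caller supplies the accented output string:

```python
def escape_non_ascii(text: str) -> str:
    """Write non-ASCII characters as \\uXXXX, as in the quoted <out> values."""
    return "".join(c if ord(c) < 128 else f"\\u{ord(c):04x}" for c in text)

def make_entry(slp1_in: str, out: str) -> str:
    """Hypothetical helper: one transcoder entry in the quoted <e> layout."""
    return f"<e> <s>SKT</s> <in>{slp1_in}</in> <out>{escape_non_ascii(out)}</out> </e>"

# a/ with a combining acute instead of the preformed u00e1
print(make_entry("a/", "a\u0301"))
# <e> <s>SKT</s> <in>a/</in> <out>a\u0301</out> </e>
```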

However, I think some IAST within individual xxx.txt digitizations contains 'hard-coded' preformed accents. Should those be changed too? Also, combining accents are hard to type, and I am uncertain how well editors display them. For these reasons, I have refrained from using combining accents in IAST unless forced to by the absence of a particular preformed accented letter.
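If hard-coded preformed accents in the digitizations were to be converted, it could be done mechanically while keeping other diacritics (macron, dot below) precomposed. A minimal sketch using only the standard `unicodedata` module; treating acute, grave and circumflex as the 'accents' is an assumption based on the marks mentioned above:

```python
import unicodedata

# Marks treated as accents (assumption): acute, grave, circumflex.
ACCENT_MARKS = {"\u0301", "\u0300", "\u0302"}

def to_combining_accents(text: str) -> str:
    """Rewrite preformed accented letters as base letter + combining accent(s).

    Trailing accent marks in a character's NFD form are split off; whatever
    remains is recomposed with NFC, so e.g. the macron in ā stays precomposed.
    Characters without a trailing accent mark pass through unchanged.
    """
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        i = len(decomposed)
        while i > 1 and decomposed[i - 1] in ACCENT_MARKS:
            i -= 1
        if i < len(decomposed):
            out.append(unicodedata.normalize("NFC", decomposed[:i]) + decomposed[i:])
        else:
            out.append(ch)
    return "".join(out)

print(ascii(to_combining_accents("\u00e1gni")))  # 'a\u0301gni'
```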