Sanskrit coding conventions

funderburkjim commented 6 years ago

The purpose of this issue is to document the conventions used in various dictionaries to represent Sanskrit words.

Exactly how to construct such documentation is unclear at the moment, so the approach will be free-form, aiming to get the relevant facts recorded. Later, the facts gathered here may be reorganized into a more useful reference.

funderburkjim commented 6 years ago

STC

Sanskrit words are coded exclusively using a version of IAST (Latin alphabet with diacritics).

The version of IAST is the same as modern IAST, with one exception: ç is used instead of ś.

The current Cologne digitization removes this exception.

Accents are not used in Sanskrit words.

There are a small number of cases where a 'short-long' vowel is indicated in the text:

The digitization currently represents these as ī¤ (i.e. the special character ¤ is appended to the long vowel.)

Usually, Sanskrit text is identified as being either in bold text (as the citation form of an entry) or in italic text.
Regular text is used for French words. However there are numerous instances of Sanskrit words appearing in regular text. e.g.

As the example suggests, these spellings also follow standard IAST (with the one exception mentioned above).
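
The two STC adjustments above could be sketched as follows. This is only an illustration, not the Cologne code: the function name is made up, and the choice of U+0306 COMBINING BREVE for the 'short-long' marker is an assumption. Since ç is also an ordinary French letter, the function should only be applied to spans already identified as Sanskrit.

```python
COMBINING_BREVE = "\u0306"  # assumed target for the 'short-long' marker


def normalize_stc_sanskrit(text: str) -> str:
    """Normalize one Sanskrit span from STC (hypothetical helper)."""
    # Printed ç / Ç stand for modern ś / Ś.
    text = text.replace("\u00e7", "\u015b").replace("\u00c7", "\u015a")
    # The digitization appends ¤ (U+00A4) to a long vowel that may also be short;
    # here the marker is swapped for a combining breve over that vowel.
    return text.replace("\u00a4", COMBINING_BREVE)
```

For example, `normalize_stc_sanskrit("çiva")` gives `śiva`.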

funderburkjim commented 6 years ago

With STC as one example, does anyone have comments regarding the material mentioned in the summary?

gasyoun commented 6 years ago

The digitization currently represents these as ī¤

We can have it as a combined Unicode character as well, no need to leave it as it was.

cam

comments regarding the material mentioned in the summary?

No, but I would describe it in a tabular form, I guess or make a checklist, otherwise, too many variations will be hard to observe.

funderburkjim commented 6 years ago

Cambria and Charis SIL

Are those fonts? If so, they certainly represent the short/long in a nice way.

... tabular form ...

Let me do a couple more 'free form', then maybe a tabular pattern will suggest itself.

funderburkjim commented 6 years ago

MW

The conventions for the printed IAST of MW are mostly described in The dictionary order of the nagari letters.

While most of the printed conventions agree with modern IAST, there are a few differences. The transcoder file (iast_iast1.xml) summarizes the differences from the printed table:

<e> <s>INIT</s> <in>ṡ</in> <out>ś</out> </e> 
<e> <s>INIT</s> <in>Ṡ</in> <out>Ś</out> </e> 
<e> <s>INIT</s> <in>sh</in> <out>ṣ</out> </e> 
<e> <s>INIT</s> <in>Sh</in> <out>Ṣ</out> </e> 
<e> <s>INIT</s> <in>ḷ</in> <out>ḻ</out> </e> <!-- only Iḷā -> -->
<e> <s>INIT</s> <in>ṛi</in> <out>ṛ</out> </e> 
<e> <s>INIT</s> <in>Ṛi</in> <out>Ṛ</out> </e> 

<!-- There are no instances of these conversions
  in Cologne digitization of MW99 -->
<e> <s>INIT</s> <in>ṛī</in> <out>ṝ</out> </e> 
<e> <s>INIT</s> <in>lṛi</in> <out>ḷ</out> </e> 
<e> <s>INIT</s> <in>lṛī</in> <out>ḹ</out> </e>
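
To show the effect of these rules, here is a rough Python sketch. It is not how the transcoder itself works: it ignores the `<s>INIT</s>` state field and simply applies the in/out pairs in a single pass, trying longer inputs first. The single pass matters because some outputs are also inputs (lṛi becomes ḷ, and ḷ on its own becomes ḻ). The names `RULES` and `printed_to_modern` are made up for the example.

```python
import re
import unicodedata


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# (in, out) pairs copied from the <e> entries above.
RULES = {
    "ṡ": "ś", "Ṡ": "Ś",
    "sh": "ṣ", "Sh": "Ṣ",
    "ḷ": "ḻ",
    "ṛi": "ṛ", "Ṛi": "Ṛ",
    "ṛī": "ṝ",
    "lṛi": "ḷ", "lṛī": "ḹ",
}
_RULES = {_nfc(k): _nfc(v) for k, v in RULES.items()}

# Longest inputs first, so that 'lṛi' is tried before 'ṛi'.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(_RULES, key=len, reverse=True))
)


def printed_to_modern(text: str) -> str:
    """Rewrite printed MW spellings to modern IAST in one left-to-right pass."""
    return _PATTERN.sub(lambda m: _RULES[m.group(0)], _nfc(text))
```

With these rules, `printed_to_modern("kṛishṇa")` yields `kṛṣṇa`, and the output ḷ produced from lṛi is not converted a second time to ḻ.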

Remaining differences from modern IAST

ṉ for ṃ From the table, MW considers ṉ to be 'true anusvAra'. Only instances are before h or a sibilant sśṣ. But the rule for usage of ṉ is more complicated, as there are many instances of ṃ before hsśṣ.
- Just to emphasize, Malten's original AS coding distinguished ṉ and ṃ, and the conversion to unicode has retained this distinction. Thus, the remaining ṉ instances are non-standard IAST.
- However, the coding of headwords in Malten's original does not retain the ṉ-ṃ distinction:
- Example: <H1>100{haMsa}1{haMsa4} -- haMsa4 would have been coded as han6sa4 if the distinction had been made.
circumflexed vowels remain: â Â î û ê ô are retained; their modern IAST equivalents are ā Ā ī ū e o
- The circumflex indicates that the long vowel occurs as the result of vowel sandhi, normally in compounds such as Gaṇêśa.

`<srs/>` tag

In Sanskrit text coded as slp1 within the <s> tag, the circumflexion of vowels is indicated by an <srs/> tag following the vowel. For instance, <s>gaRe<srs/>Sa</s>.

In current displays, this <srs/> tag is ignored, even when the user has requested IAST output for Sanskrit text.

It is likely possible to alter the transcoding rules for MW to display <s>gaRe<srs/>Sa</s> as gaṇêśa (with the circumflex) when IAST output is generated.
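
A minimal sketch of what such a rule might do, written in Python rather than in the transcoder's own format. The SLP1-to-IAST table is the standard one; the assumption that `<srs/>` can follow A, I, U, e or o (giving â, î, û, ê, ô) is extrapolated from the one example above, and the function name is invented.

```python
import re
import unicodedata

# Standard SLP1 -> IAST letters (accent marks and other markup are not handled).
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au", "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}
# Assumed circumflex forms for a vowel followed by <srs/>.
CIRCUMFLEX = {"A": "â", "I": "î", "U": "û", "e": "ê", "o": "ô"}

# Either one character followed by <srs/>, or one plain character.
_TOKEN = re.compile(r"(.)<srs/>|(.)", re.DOTALL)


def slp1_to_iast_with_srs(slp1: str) -> str:
    """Convert the content of an <s> element, honouring <srs/> tags."""
    out = []
    for m in _TOKEN.finditer(slp1):
        if m.group(1) is not None:
            ch = m.group(1)
            out.append(CIRCUMFLEX.get(ch, SLP1_TO_IAST.get(ch, ch)))
        else:
            ch = m.group(2)
            out.append(SLP1_TO_IAST.get(ch, ch))
    return unicodedata.normalize("NFC", "".join(out))
```

Here `slp1_to_iast_with_srs("gaRe<srs/>Sa")` returns `gaṇêśa`, while the same word without the tag returns `gaṇeśa`.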

`<shortlong/>` tag

In Sanskrit text coded as slp1 within the <s> tag, the <shortlong/> tag after a vowel indicates that the vowel may be either short or long. There are 208 such instances. As with <srs/>, this tag plays no role in displays but could be converted in IAST display output to display the vowel+macron+combining-breve.

In Sanskrit words coded as IAST, this vowel+macron+combining-breve representation is used; there are very few instances of this.
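
As a sketch of the vowel+macron+combining-breve idea, the following handles only the tagged vowel and leaves the rest of the SLP1 string alone. It assumes the tagged vowel is one of a, A, i, I, u, U (whether the digitization writes the short or the long letter before the tag is not checked here), and the function name is hypothetical.

```python
import re

COMBINING_BREVE = "\u0306"

# Long IAST vowel for each SLP1 simple vowel, whether coded short or long.
LONG_IAST = {
    "a": "\u0101", "A": "\u0101",  # ā
    "i": "\u012b", "I": "\u012b",  # ī
    "u": "\u016b", "U": "\u016b",  # ū
}


def render_shortlong(slp1_fragment: str) -> str:
    """Replace vowel + <shortlong/> with long vowel + combining breve."""
    return re.sub(
        r"([aAiIuU])<shortlong/>",
        lambda m: LONG_IAST[m.group(1)] + COMBINING_BREVE,
        slp1_fragment,
    )
```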

Accents

In Sanskrit text coded as slp1 within the <s> tag, an accented vowel is indicated by a character following the vowel (/ for udatta, \ for anudatta, and ^ for svarita).

The transcoding scheme represents these in appropriate ways when (a) Devanagari or IAST output is requested, and (b) display controls have requested that accents be shown. Otherwise, the accents are ignored in output.
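
The 'accents ignored' case and a simple accent listing can be sketched like this. The helper names are invented, and the input is assumed to be plain SLP1 text with any tags already removed (otherwise the slash in a tag such as `<srs/>` would be mistaken for an udatta mark).

```python
import re

# Accent characters as described above.
ACCENT_NAMES = {"/": "udatta", "\\": "anudatta", "^": "svarita"}
_ACCENT = re.compile(r"[/\\^]")


def strip_accents(slp1: str) -> str:
    """Drop accent marks, as when the display does not show accents."""
    return _ACCENT.sub("", slp1)


def list_accents(slp1: str) -> list[tuple[str, str]]:
    """Return (preceding letter, accent name) for each accent mark."""
    return [
        (m.group(1), ACCENT_NAMES[m.group(2)])
        for m in re.finditer(r"(.)([/\\^])", slp1)
    ]
```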

gasyoun commented 6 years ago

If so, they certainly represent the short/long in a nice way.

Yes, and a very good one https://software.sil.org/charis/

'free form', then maybe a tabular pattern will suggest itself.

Sure.

coding of headwords in Malten's original does not retain the ṉ-ṃ distinction

So the distinction is inside articles, but not in headwords?

could be converted in IAST display output to display the vowel+macron+combining-breve.

Possible in Charis.

funderburkjim commented 6 years ago

the distinction is inside articles, but not in headwords?

Yes - that's a pretty accurate summary.

funderburkjim commented 6 years ago

We can have it as a combined Unicode character as well, no need to leave it as it was.

This has now been done. For example, hw=hrdIka

funderburkjim commented 6 years ago

with one exception: ç is used instead of ś. The current Cologne digitization removes this exception.

@sanskritisampada and I discovered that there were still many ç in Sanskrit words. We think that at least most of these have today been replaced with ś.

gasyoun commented 6 years ago

This has now been done.

Looks good.

funderburkjim commented 6 years ago

BURNOUF IAST

Burnouf represents Sanskrit headwords in Devanagari, italicized Sanskrit in his own brand of IAST, and non-italic Sanskrit (proper names) in this same brand of IAST. This snip illustrates the three categories.

In the current status of the Burnouf digitization, the Devanagari has been converted to SLP1 transliteration ({#SLP1#} in bur.txt, and <s>SLP1</s> in bur.xml). The italic Sanskrit has been converted to modern IAST. However, the non-italic Sanskrit proper names have not been converted to modern IAST; with @sanskritisampada 's help to identify the non-italic Sanskrit words, these will also soon be converted to modern IAST.

Burnouf's IAST conventions

The print conventions are described in the Tableau de transcription; there are a few variances from this description in the actual text.

short vowels agree with modern IAST: a,i,u,ṛ. But ḷi is used instead of ḷ.
The long form of the vowels uses circumflex, rather than macron: â, î, û, ṛ̂, ḷî
The 'short' diphthongs also use a circumflex: ê , ô.
The 'long' diphthongs are printed as æ ꜵ (modern ai, au)
In the gutturals, cerebrals, dentals, and labials,
- the unaspirated form agrees with modern IAST: k, g, ṭ, ḍ, t, d, p, b.
- aspiration is indicated by an acute accent, rather than by modern 'h': e.g. k', g', etc.
- guttural nasal in print may be ẏ instead of ṅ; other 3 are same as modern: ṇ , n , m
Palatals are all different: c', j' unaspirated; c̃, j̃ aspirated; the nasal is the modern ñ.
Semivowels are modern y, r, l, v
- 'v' is according to table:
- but print also uses 'w' for some words:
Sibilants: ç , ś, instead of modern ś, ṣ and s.
- In non-italic proper names, at least sometimes 'sh' is used for modern 'ṣ' such as Vishnu.
- Note in Vishnu, the n is also a plain 'n', instead of ṇ
Aspirate 'h'.

Also, 'x' for modern conjunct kṣ:

That's Burnouf's system in summary, as it appears to me as of this writing.

gasyoun commented 6 years ago

few variances from this description in the actual text.

Well done, as usual.

Burnouf's system in summary

It's a mix somehow similar to what we see nowadays in India - all systems involved at once, in a single word even sometimes.

funderburkjim commented 6 years ago

MW72 IAST conventions

Monier-Williams provides a table of nāgarī letters with indo-romanic equivalents. He also writes an opinionated multi-page treatise Alphabet and System of Transliteration .

previous discussion

In conjunction with previous work on converting MW72 conventions to modern IAST in the Cologne digitizations, several issue comments have dealt with this dictionary's conventions:

Sanskrit words

The printed text uses Devanagari for 'major' headwords. These are transcoded in the Cologne digitization to SLP1: {#dfzad#}. After the Devanagari, the text shows in italics the indo-romanic spelling.
'Minor' headwords appear only in italic font with indo-romanic spelling, and are capitalized.
- The Cologne digitization includes these minor headwords as headwords.
'Compound' headwords also appear only in italic font with indo-romanic spelling, and are capitalized.
- The Cologne digitization currently does not recognize these compound headwords as headwords.
Sanskrit words in italic text appear in indo-romanic spelling, such as 'cf. *upalā*'
Many Sanskrit proper names appear within entries but in non-italic font, and these also have the indo-romanic spelling; for instance Upalā.

In the current MW72 digitization , all the text identified as being in indo-romanic spelling has been converted to modern IAST.

Points of agreement between mw72's indo-romanic letters and modern IAST

Simple Vowels: a, ā, i, ī, u, ū,
All diphthongs: e, ai, o, au
Visarga: ḥ
Gutturals: k, kh, g, gh
Palatals: j, jh
All cerebrals: ṭ, ṭh, ḍ, ḍh, ṇ
All dentals: t, th, d, dh, n
All labials: p, ph, b, bh, m
All semivowels: y, r, l,
- Note: ḷ and ḷh are also used for the vedic variants; these are not mentioned in our source for IAST
Sibilants : ś , s (palatal, dental) and 'h'

Points of difference between mw72's indo-romanic letters and modern IAST

Vowels: ṛi, ṛī, lṛi, lṛī used in text vs. modern IAST ṛ, ṝ, ḷ, ḹ
Anusvara: ṉ and ṃ used in text; modern form ṃ is used in digitization (except in the digitization of the preface).
Gutturals : n-with-middle-dot (no unicode equivalent) used for modern ṅ :
Palatals: ć, ćh and ṅ in text, vs. modern c, ch, and ñ.
Sibilant: sh for cerebral sibilant, vs. modern ṣ.

misc comments:

º (\u00ba, MASCULINE ORDINAL INDICATOR) indicates that the rest of a word is to be supplied.
- Perhaps should use ° (\u00b0) DEGREE SIGN, as in MW(1899). There is inconsistency across dictionaries in this detail of the digitization.
MW72 does not use accents in Sanskrit words, although MW99 does
MW72 does not use the circumflex to indicate long vowels occurring via sandhi, as MW99 does.

gasyoun commented 6 years ago

does not recognize these minor headwords as headwords

But not so in MW, right?

Perhaps should use °

Agree.

funderburkjim commented 6 years ago

GRA

Grassmann's dictionary uses a version of IAST, with accents, to represent Sanskrit words (as well as cognate words in other languages). The details regarding his brand of IAST and its conversion to modern IAST in the Cologne digitizations are well described in #199, specifically in this link and this link; no further comments are needed here.

funderburkjim commented 6 years ago

PWG

Sanskrit words are generally shown in Devanagari (coded as SLP1 in the digitization); accents are used.

However, as discussed in #195, the printed text uses Latin letters with diacritics in

literary source abbreviations within the body of the dictionary
in the list of abbreviations appearing in the foreword
in 'widely spaced' text within the body (these are marked as <is>X</is> in digitization).

A summary comparing PWG's IAST and modern IAST:

short vowels agree with modern IAST: a,i,u,ṛ.
The ḷ vowel unused, either short or long
The long form of the vowels uses circumflex, rather than macron: â, î, û, (no long ṛ found)
Visarga uses modern ḥ ; anusvara uses m̃ instead of modern ṃ.
The diphthongs are e , o, ai, au (same as modern)
In the gutturals, cerebrals, dentals, and labials,
- the unaspirated form agrees with modern IAST: k, g, ṭ, ḍ, t, d, p, b.
- the aspirated forms add an h, as does modern IAST
- guttural nasal is ñ instead of ṅ; other 3 are same as modern: ṇ , n , m
Palatals are all different: ḱ, ḱh, ǵ, ǵh (no instances) and ń are used for modern c, ch, j, jh, ñ
These semivowels are modern: r, l, v ; j is used for modern semivowel 'y'.
Sibilants: ç, sh, s vs. modern ś , ṣ, s
ḷ is used for the Vedic consonant. (3 times Iḷâ, one time Aiḷa)
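
A sketch of the PWG-to-modern mapping implied by this summary, in Python. It covers lowercase letters only, the names are made up, and it is not the conversion actually used for the digitization. The point of interest is that the replacements chain (j becomes y while ǵ becomes j; ñ becomes ṅ while ń becomes ñ), so they have to be applied in one pass rather than one after another.

```python
import re
import unicodedata


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# Printed PWG letter -> modern IAST, from the summary above (lowercase only).
PWG_TO_MODERN = {
    "â": "ā", "î": "ī", "û": "ū",   # long vowels
    "m\u0303": "ṃ",                  # m + combining tilde (anusvara)
    "ñ": "ṅ",                        # guttural nasal
    "ḱ": "c", "ǵ": "j", "ń": "ñ",    # palatals; ḱh, ǵh follow from ḱ, ǵ
    "j": "y",                        # semivowel
    "ç": "ś", "sh": "ṣ",             # sibilants
}
_RULES = {_nfc(k): _nfc(v) for k, v in PWG_TO_MODERN.items()}
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(_RULES, key=len, reverse=True))
)


def pwg_to_modern(text: str) -> str:
    """Convert PWG-style Latin spelling to modern IAST in a single pass."""
    return _PATTERN.sub(lambda m: _RULES[m.group(0)], _nfc(text))
```

For instance, `pwg_to_modern("jaǵus")` gives `yajus`; applying the j and ǵ rules sequentially would instead turn both letters into y.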

There are also differences in the way the PWG text represents Devanagari accents, in comparison to the Unicode Vedic extensions. See this documentation of PWG accents.

funderburkjim commented 6 years ago

PW

The description above for PWG is applicable to the representation of Sanskrit words in PW. The only variance I noticed was that there are no instances in PW of ḷ for the Vedic consonant.

We've tackled the peculiarities of IAST in PW previously:

183
PWK#14

a source for corrections

There are about 4500 words coded as <is>X</is> in the digitization; X is supposed to be modern IAST spelling of a Sanskrit word. There are some errors that could be corrected (e.g. Maṅguśrī -> Mañjuśrī ). The list of error candidates could be reduced by eliminating the words which are Sanskrit headwords in, say, MW (e.g. Āṅgirasa). This would be a good task for someone to undertake.

gasyoun commented 6 years ago

4500 words coded as X

And no smart way to lessen the list, right?

funderburkjim commented 6 years ago

Wrong - there is a smart way to lessen the list. We can lessen the list by removing cases like Āṅgirasa, which is an MW headword. When someone commits to working on this task, I can lessen the list in this way. Also, a list of links to PW instances of each remaining word can be generated, to make the lookup process more efficient.
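
The reduction step could look roughly like this. It is a sketch only: the function and argument names are hypothetical, and both lists are assumed to be in modern IAST already (the MW headwords would first have to be transcoded from SLP1).

```python
import unicodedata


def _key(word: str) -> str:
    # Compare case-insensitively and independently of Unicode composition.
    return unicodedata.normalize("NFC", word).lower()


def remaining_candidates(is_words, known_headwords):
    """Return the <is> words that are not attested as headwords elsewhere."""
    known = {_key(w) for w in known_headwords}
    return sorted({w for w in is_words if _key(w) not in known})
```

With `is_words = ["Āṅgirasa", "Maṅguśrī"]` and `known_headwords = ["āṅgirasa"]`, only `Maṅguśrī` remains as an error candidate.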

drdhaval2785 commented 6 years ago

Ok. After a long time, I guess I will start working on this. Let me know the modalities.

funderburkjim commented 6 years ago

@drdhaval2785 Good news!

Which part are you interested in at the moment?

drdhaval2785 commented 6 years ago

Corrections in PW as mentioned in this thread.
Common script creation (not only for display scripts, but also maintenance scripts).

funderburkjim commented 6 years ago

pw corrections offloaded to https://github.com/sanskrit-lexicon/CORRECTIONS/issues/419

funderburkjim commented 6 years ago

PD

Sanskrit headwords appear in both Devanagari and IAST forms. Within the body of the text, both forms also appear. Accents may be present in either Devanagari or IAST forms.

From #203, the only variance from what we consider standard IAST was that the anusvara of the printed text uses ṁ (m with dot above) whereas modern IAST uses ṃ (m with dot below). The digitization uses the modern form, ṃ.

There are a few unusual features appearing in the Devanagari spellings of the text, according to the comments of #203.

funderburkjim commented 6 years ago

BHS

The Edgerton Buddhist Hybrid Sanskrit Dictionary represents Sanskrit words (as well as words in Pali, and other languages) with Latin alphabet with diacritics; there is no Devanagari. The diacritics for Sanskrit words agree with what we are taking as modern IAST (see #201).

funderburkjim commented 6 years ago

WIL

Wilson uses both Devanagari and his own brand of IAST to represent Sanskrit words.

WIL IAST conventions

Based on work done in conversion of Cologne digitization from AS to modern IAST; no known description of Wilson's IAST system by the author. There is also difficulty in interpreting the scanned images due to printing quality.

short vowels agree with modern IAST: a,i,u; but rĭ (or ri) instead of ṛ.
The ḷ vowel is unused or rare.
The long form of the vowels uses circumflex and/or acute accents: â or á, î or í, û or ú, (no long ṛ found in IAST form)
Visarga uses modern ḥ ; anusvara uses m̃ instead of modern ṃ.
The short diphthongs are é and ó instead of modern e , o; the long diphthongs are ai, au (same as modern)
gutturals k, kh, g, gh same as modern; no IAST guttural nasal found
palatals ch, ch'h , j, jh (for modern c, ch, j, jh); 'n' is used as palatal nasal when preceding a palatal: e.g. 'nj' in place of modern 'ñj'
cerebral consonants indicated by a trailing apostrophe (or sometimes an acute accent): t', t'h, d', d'h and n' in place of ṭ, ṭh, ḍ, ḍh, ṇ
dentals agree with modern: t, th, d, dh, n
labials are p, p'h, b, b'h and m compared to modern p, ph, b, bh, m
Semivowels are modern: r, l, v, y,
Sibilants are s', sh, and s in place of modern ś, ṣ and s.

The Cologne digitization is based on the 1832 edition. @SergeA I recall that you referenced a different edition (1819?) in some prior work, but could not find the pdf -- do you recall the link to this other edition?

SergeA commented 6 years ago

http://reader.digitale-sammlungen.de/en/fs1/object/display/bsb10932200_00005.html http://reader.digitale-sammlungen.de/en/fs1/object/display/bsb10495525_00005.html Wilson 1st ed. 1819 (click PDF-Download > Ja/Yes; 4-digit pin-code ; Weiter/Go > after few time the link for the file will appear)

SergeA commented 6 years ago

the long diphthongs are are ai, ao (same as modern)

Did you mean au ?

funderburkjim commented 6 years ago

Did you mean au ?

Yep! corrected.

Download working on WIL. Thanks!

funderburkjim commented 6 years ago

YAT

Yates mostly uses Devanagari to represent Sanskrit words; but there are many Sanskrit words (mostly proper names) appearing within entries in Yates' version of IAST. As with Wilson, there is no explanation from the author of his IAST system, so the following summary is based on observations made during the conversion of the diacritic spelling from the AS (letter-number) coding to modern IAST in unicode encoding.

The AS coding appears in only 628 words (quite a small number).

Long vowels are represented with an acute accent: á, í, ú instead of ā, ī, ū.
The short diphthongs are é and ó instead of modern e , o;
Vocalic r uses ṛ (only 1 instance)
Cerebral consonants are represented in modern form : ṭ, ṭh, ḍ, ḍh , ṇ

Incompleteness of IAST conversion

The conversion work done thus far only addressed words originally coded as having a diacritic expressed in the AS form.

There are also Sanskrit words spelled with the Latin alphabet in Yates which do not happen to have diacritics. For example : Shiva, Vishnu, Krishna. A modest improvement to the Yates digitization would be the identification of such Sanskrit words and then their conversion to modern IAST.
The previous work incorrectly spelled 6 words with the cerebral sibilant as ṣh instead of ṣ. These also should be corrected.
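
The Yates conventions listed above, plus the ṣh correction, might be sketched as follows. The function name is invented and the blanket replacement of ṣh is an assumption made for the example; the affected words would need checking individually. The acute is swapped for a macron only on a, i, u, since letters such as ś also decompose to a base letter plus combining acute.

```python
import re
import unicodedata

ACUTE, MACRON = "\u0301", "\u0304"


def yates_to_modern(word: str) -> str:
    """Convert Yates-style diacritics to modern IAST (illustrative only)."""
    word = unicodedata.normalize("NFC", word)
    # é and ó are plain e and o in modern IAST.
    for old, new in (("\u00e9", "e"), ("\u00f3", "o"),
                     ("\u00c9", "E"), ("\u00d3", "O")):
        word = word.replace(old, new)
    # á, í, ú mark long vowels: replace the combining acute with a macron,
    # but only directly after a, i or u.
    decomposed = unicodedata.normalize("NFD", word)
    decomposed = re.sub("([aiuAIU])" + ACUTE, r"\1" + MACRON, decomposed)
    word = unicodedata.normalize("NFC", decomposed)
    # Slip from the earlier conversion: ṣh written for the cerebral sibilant.
    return word.replace("\u1e63h", "\u1e63")
```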

funderburkjim commented 6 years ago

WIL 1819 IAST reference

Page 59 of the 1819 edition has a table describing Wilson's IAST system, which he describes as 'following the system of Sir Wm. Jones' .

The empirical summary of the above comment seems to agree in most respects, though I see a few differences. Assuming the 1832 edition upon which the Cologne digitization was based uses the same IAST system (a reasonable assumption), this table from 1819 edition could be of interest if someone wanted to more fully investigate the original printed form.

Incidentally, the entire 50+ pages of the preface of the 1819 edition looks interesting.

funderburkjim commented 6 years ago

SHS

SHS seems to be just a copy (mostly word for word) of WIL. So all the comments about Sanskrit word representation in Wilson are applicable to Shabda Sagara.

As with YAT, there are numerous IAST corrections which could be made to SHS -- the printing seems to be casual in applying rules for diacritics in such words.

Is SHS 'needed'?

I compared two entries (guru, and Gawa) as they appear in WIL and SHS. And they were almost identical, down to the last period or comma. This even though SHS shows a publication date of 1900, and WIL (2nd ed.) of 1832.

Are there some improvements that SHS brings to WIL ?

funderburkjim commented 6 years ago

GST

Goldstücker's dictionary is also an extension of Wilson's dictionary, although containing only 6000+ headwords through aByAhita. It generally follows Wilson's conventions for representing Sanskrit words, both in Devanagari and in Wilson's version of IAST.

Based on the empirical evidence of the conversion from GST's IAST to modern IAST, Goldstücker's IAST conventions are:

short vowels: a, i, u, ŕi
long vowels: á, í, ú, [no long ŕi found]
diphthongs : e, o, ai, au [same as modern - a difference from Wilson]
anusvara m', visarga h'
gutturals: k, kh, g, gh, n
palatals: ch, chh, j, jh, n
cerebrals: t' , t'h, d', d'h, n'
dentals: t, th, d, dh, n
labials: p, ph, b, bh, m
semivowels: y, r, l, v or w
sibilants: ś, sh, s and h

funderburkjim commented 6 years ago

SKD, VCP

These two dictionaries use only Devanagari to represent Sanskrit words.

funderburkjim commented 6 years ago

BEN

Benfey's dictionary displays headwords in both Devanagari and a version of IAST. Most of the Sanskrit words within entries are in his IAST, but in entries for roots, verbal prefixes appear in Devanagari.

No explanation of the IAST conventions used in Benfey has been found in the printed text. The following summary was developed empirically during the process of converting Benfey's IAST to agree with modern IAST.

short vowels agree with modern IAST: a,i,u; but ṛi instead of ṛ. But in abbreviations of references, the 'i' is dropped, as Bhartṛ. abbreviation for Bhartṛihari. ḷi is used for vocalic ḷ in kḷip; the long form doesn't appear.
ḷ also used for the Vedic consonant ळ.
The long forms of the vowels are â , î , û , ṛî. NOTE: the conversion from AS to modern did not handle ṛî (r2i10) properly; this needs to be fixed.
Visarga uses modern ḥ ; anusvara uses m̄ (m-tilde) instead of modern ṃ.
Diphthongs same as modern IAST: e, o, ai, au.
gutturals k, kh, g, gh, ṅ same as modern
palatals ch, chh , j, jh, ń (for modern c, ch, j, jh, ñ);
cerebral consonants same as modern: ṭ, ṭh, ḍ, ḍh, ṇ
dentals agree with modern: t, th, d, dh, n
labials agree with modern: p, ph, b, bh, m
Semivowels are modern: y, r, l, v
Sibilants are ç, sh, and s in place of modern ś, ṣ and s.
h is h

funderburkjim commented 6 years ago

BOP

Generally Sanskrit words are presented in Devanagari within Bopp Glossarium Sanscritum; this includes headwords.

However, some words appearing in Latin alphabet have letters with diacritics. As mentioned in #202, no attempt has been made to 'modernize' the spelling of such words; the main reason is that they are likely Latinate forms of Sanskrit words, rather than Sanskrit words spelled with Latin alphabet.

gasyoun commented 6 years ago

Shiva, Vishnu, Krishna. A modest improvement to the Yates digitization would be the identification of such Sanskrit words and then their conversion to modern IAST.

Let it remain as such.

SHS seems to be just a copy (mostly word for word) of WIL.

Now that's an interesting fact.

Are there some improvements that SHS brings to WIL ?

Hard to say, nobody in India uses it. Wilson is checked as the very first one rarely, but SHS not even seldom.

Goldstücker's dictionary is also an extension of Wilson's dictionary

Right, such it was intended. Same as the book I've reprinted https://www.ozon.ru/context/detail/id/140949762/

handle ṛî (r2i10) properly; this needs to be fixed.

Easy to fix?

Latinate forms of Sanskrit words

So let them be. Bopp is interesting as MW takes his etymologies from Bopp.

funderburkjim commented 6 years ago

the book at 'www.ozon.ru'

Using Google translate, here is the first sentence of the description in English:

The Sanskrit dictionary compiled by G.G.Vilson at the beginning of the XIX century (1832) was based 
not on the processing of the texts themselves, but on the use of medieval autochthonous manuals.

This sounds interesting, but I don't understand the part translated as but on the use of medieval autochthonous manuals.

Can you elaborate?

funderburkjim commented 6 years ago

ACC

The printed text uses Devanagari for the headwords. A version of IAST is used within entries to write the Sanskrit names of works and authors. This IAST system is shown in the preface and repeated here:

a ā i ī u ū
ṛi ṛī e ai o au
k kh g gh n̄
c ch j jh ñ
ṭ ṭh ḍ ḍh ṇ
t th d dh n
p ph b bh m
y r l v
ç sh s h

It is quite close to the modern IAST conventions. The differences are:

ṛi -> ṛ
ṛī -> ṝ (no instances)
n̄ -> ṅ
ç -> ś
sh -> ṣ

funderburkjim commented 6 years ago

AE

Sanskrit words are almost always represented in Devanagari. Based on the few instances where Latin alphabet with diacritics represent Sanskrit words, the system differs from modern IAST as follows:

long vowels use circumflex rather than macron: â, î, û -> ā. ī , ū
ri -> ṛ
ch -> c, chh -> ch
italic letters used for cerebrals: n -> ṇ, t -> ṭ,
sh -> ṣ
s represents either ś or dental s

These instances have been only partially converted to modern IAST thus far. Since there are so few occurrences, finishing the conversion to modern IAST is not a particular concern.

funderburkjim commented 6 years ago

AP

Apte's Practical Sanskrit-English dictionary ('57) uses both Devanagari and a version of IAST to represent Sanskrit words. I find no table listing the IAST conventions, but empirical evidence suggests that the text's IAST is almost the same as what we are calling modern IAST. The only variances I notice are:

ṛi is usually used for ṛ, as in Kṛiṣṇa; the digitization has changed these to modern, as Kṛṣṇa.
- but sometimes, esp. in literary source abbreviations such as Ṛv, there is no 'i' in the text.
anusvara is represented as ṁ, rather than modern ṃ.

funderburkjim commented 6 years ago

AP90

Apte's Practical Sanskrit-English dictionary (1890) uses both Devanagari and a version of IAST to represent Sanskrit words. The version of IAST is quite peculiar, and in fact uses two different systems. The only relevant comment I've found in the front matter is

One can see both systems in this snip from the top of page 2 (the first letter of Adiparvan is italicized, which corresponds to the fact that the vowel is long; two other instances of long-a are seen with the circumflex.)

Here is the IAST system as it exists in the text; this explanation is derived from empirical observation, aided by the careful (AS) coding of Thomas' original digitization. All of these variations have been converted (I think!) to modern IAST in the current version of the digitization.

short vowels agree with modern IAST: a,i,u;
The long forms of the vowels are â , î , û ; also á, à, í, ì, ú, ù and the italic forms a, i, u.
ṛi for ṛ
Visarga uses h instead of ḥ
anusvara uses a plain m or n instead of ṃ.
Diphthongs same as modern IAST: e, o, ai, au.
gutturals k, kh, g, gh; same as modern. nasal plain 'n' instead of ṅ
palatals ch, chh , j, jh, n (for modern c, ch, j, jh, ñ);
cerebral consonants same as modern: ṭ, ṭh, ḍ, ḍh, ṇ; but also with italic forms: t, th, d, dh, n
dentals agree with modern: t, th, d, dh, n
labials agree with modern: p, ph, b, bh, m
Semivowels are modern: y, r, l, v
Sibilants are s (italic), sh, and s in place of modern ś, ṣ and s.
h is h

gasyoun commented 6 years ago

Can you elaborate?

Wilson made a dictionary that was not based on what he read or found, but what Amarakosha found.

uses two different systems

Worst case ever.

careful (AS) coding of Thomas' original digitization.

How much work and love he put in it!

funderburkjim commented 6 years ago

How much work and love Thomas put in it! 👍

BOR

Borooah English-Sanskrit Dictionary primarily uses Devanagari to represent Sanskrit words, but uses a version of IAST to represent Sanskrit words in about 500+ lines of the text.

The details of the IAST conventions are as follows, based upon empirical observations in the course of conversion to modern IAST spellings:

short vowels agreeing with modern IAST: a,i,u;
'ri' for vowel ṛ.
Long vowels use macrons: ā, ī, ū
Visarga is ḥ
anusvara - no instances
Diphthongs same as modern IAST: e, o, ai, au.
gutturals k, kh, g, gh same as modern. plain n for ṅ
palatals ch, chh , j, jh, n (for modern c, ch, j, jh, ñ);
cerebral consonants same as modern: ṭ, ṭh, ḍ, ḍh, ṇ (sometimes plain n)
dentals agree with modern: t, th, d, dh, n
labials agree with modern: p, ph, b, bh, m
Semivowels are modern: y, r, l, v
Sibilants are sh or ś, sh, and s in place of modern ś, ṣ and s.
h is h

funderburkjim commented 6 years ago

CAE

Cappeller Sanskrit-English Dictionary primarily uses Devanagari to represent Sanskrit words, but uses a version of IAST to represent Sanskrit words in about 1500+ lines of the text. Devanagari uses udatta and svarita accents.

The details of the IAST conventions are as follows, based upon empirical observations in the course of conversion to modern IAST spellings:

short vowels agreeing with modern IAST: a,i,u,ṛ (but sometimes ri for ṛ)
Long vowels use macrons: ā, ī, ū
Visarga is ḥ
anusvara same as modern ṃ; but sometimes 'ṅ'
Diphthongs same as modern IAST: e, o, ai, au.
gutturals k, kh, g, gh; same as modern. n̄ (n-macron) for guttural nasal.
palatals c, ch , j, jh, ñ -- same as modern
cerebral consonants same as modern: ṭ, ṭh, ḍ, ḍh, ṇ
dentals agree with modern: t, th, d, dh, n
labials agree with modern: p, ph, b, bh, m
Semivowels are modern: y, r, l, v
Sibilants are ç instead of modern ś , and modern ṣ and s.
h is h

funderburkjim commented 6 years ago

CCS

Cappeller Sanskrit Wörterbuch primarily uses Devanagari to represent Sanskrit words, but uses a version of IAST to represent Sanskrit words in about 1500+ lines of the text. Devanagari uses udatta and svarita accents.

The details of the IAST conventions are as follows, based upon empirical observations in the course of conversion to modern IAST spellings:

short vowels agreeing with modern IAST: a,i,u,ṛ (but sometimes ri for ṛ)
Long vowels use circumflex: â, î, û
Visarga is ḥ
anusvara same as modern ṃ; but sometimes 'ṅ'
Diphthongs same as modern IAST: e, o, ai, au.
gutturals k, kh, g, gh; same as modern. n̄ (n-macron) for guttural nasal.
palatals c, ch , j, jh, ñ -- same as modern
cerebral consonants same as modern: ṭ, ṭh, ḍ, ḍh, ṇ
dentals agree with modern: t, th, d, dh, n
labials agree with modern: p, ph, b, bh, m
Semivowels are modern: y, r, l, v
Sibilants are ç, sh, s instead of modern ś, ṣ and s.
h is h

funderburkjim commented 6 years ago

IEG

Indian Epigraphical Glossary uses only Latin alphabet with diacritics to represent Sanskrit words. Entries contain words from other Indian languages, also coded with Latin alphabet with diacritics. The text does not distinguish Sanskrit words by any typographical means.

The details of the IAST conventions for Sanskrit words are as follows, based upon empirical observations in the course of conversion to modern IAST spellings. Probably because of the 1966 publication date, the IAST conventions of the text are close to modern standards for Sanskrit words. The author describes his system of transliteration.

Several headwords are non-Sanskrit words; nonetheless, the digitization transcodes these to SLP1 so they will be comparable to other dictionaries with Sanskrit headwords.

The only certain difference from modern IAST for Sanskrit words is anusvara.

short vowels agreeing with modern IAST: a,i,u,ṛ. Also ḷ
Long vowels use macrons, agreeing with modern IAST.
Visarga is ḥ
anusvara is ṁ instead of modern ṃ
Diphthongs same as modern IAST: e, o, ai, au.
- Note: ĕ and ŏ are shown in the 'system of transliteration' as short; it is believed that these appear in non-Sanskrit words only.
gutturals k, kh, g, gh, ṅ
palatals c, ch , j, jh, ñ -- same as modern
cerebral consonants same as modern: ṭ, ṭh, ḍ, ḍh, ṇ
dentals agree with modern: t, th, d, dh, n
labials agree with modern: p, ph, b, bh, m
Semivowels are modern: y, r, l, v
Sibilants are modern ś ,ṣ and s.
h is h

funderburkjim commented 6 years ago

INM

Index to the Names in the Mahabharata uses only Latin alphabet with diacritics to represent Sanskrit words.

The headwords are alphabetized in accordance with Latin alphabet.

The details of the IAST conventions for Sanskrit words are as follows, based upon empirical observations in the course of conversion to modern IAST spellings.

The only differences from modern IAST conventions are, I think, in the sibilants, where ç and sh are used instead of ś and ṣ.

funderburkjim commented 6 years ago

KRM

KṚDANTARŪPAMĀLĀ uses only Devanagari, coded as SLP1, in the body of the text.

The preface material is also digitized, is in English, and contains some Sanskrit words represented in the Latin alphabet with diacritics. No definite analysis was made of the modernity of the IAST convention in this preface material.

funderburkjim commented 6 years ago

MCI

Mehendale Mahabharata Cultural Index uses Latin alphabet with diacritics to represent Sanskrit words (according to digitization, only two instances of Devanagari are present).

Based on empirical evidence gathered during the conversion from original AS coding to Unicode, the only divergence of the IAST system used in the text from the current modern standard is the use of ṁ for anusvara rather than the modern ṃ.
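
For a dictionary whose only divergence is the anusvara, the adjustment is a one-character replacement. A small sketch (the function name is invented) that also copes with ṁ stored as m followed by a combining dot above:

```python
import unicodedata


def modernize_anusvara(text: str) -> str:
    """Replace dot-above anusvara (ṁ, Ṁ) with modern dot-below (ṃ, Ṃ)."""
    # NFC turns 'm' + U+0307 COMBINING DOT ABOVE into the single letter ṁ.
    text = unicodedata.normalize("NFC", text)
    return text.replace("\u1e41", "\u1e43").replace("\u1e40", "\u1e42")
```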
