Closed funderburkjim closed 5 years ago
Sanskrit words are coded exclusively using a version of IAST (Latin alphabet with diacritics).
The version of IAST is the same as modern IAST, with one exception: ç is used instead of ś.
The current Cologne digitization removes this exception.
Accents are not used in Sanskrit words.
There are a small number of cases where a 'short-long' vowel is indicated in the text:
The digitization currently represents these as ī¤ (i.e., the special character ¤ is appended to the long vowel).
Usually, Sanskrit text is identified as being either in bold text (as the citation form of an entry) or
in italic text.
Regular text is used for French words. However, there are numerous instances of Sanskrit words
appearing in regular text, e.g.
As the example suggests, these spellings also follow standard IAST (with the one exception mentioned above).
With STC as one example, does anyone have comments regarding the material mentioned in the summary?
The digitization currently represents these as ī¤
We can have it as a combined Unicode character as well, no need to leave it as it was.
comments regarding the material mentioned in the summary?
No, but I would describe it in tabular form, I guess, or make a checklist; otherwise, too many variations will be hard to keep track of.
Cambria and Charis SIL
Are those fonts? If so, they certainly represent the short/long in a nice way.
... tabular form ...
Let me do a couple more 'free form', then maybe a tabular pattern will suggest itself.
The conventions for the printed IAST of MW are mostly described in The dictionary order of the nagari letters.
While most of the printed conventions agree with modern IAST, there are a few differences. The transcoder file (iast_iast1.xml) summarizes the differences from the printed table:
<e> <s>INIT</s> <in>ṡ</in> <out>ś</out> </e>
<e> <s>INIT</s> <in>Ṡ</in> <out>Ś</out> </e>
<e> <s>INIT</s> <in>sh</in> <out>ṣ</out> </e>
<e> <s>INIT</s> <in>Sh</in> <out>Ṣ</out> </e>
<e> <s>INIT</s> <in>ḷ</in> <out>ḻ</out> </e> <!-- only Iḷā -> -->
<e> <s>INIT</s> <in>ṛi</in> <out>ṛ</out> </e>
<e> <s>INIT</s> <in>Ṛi</in> <out>Ṛ</out> </e>
<!-- There are no instances of these conversions
in Cologne digitization of MW99 -->
<e> <s>INIT</s> <in>ṛī</in> <out>ṝ</out> </e>
<e> <s>INIT</s> <in>lṛi</in> <out>ḷ</out> </e>
<e> <s>INIT</s> <in>lṛī</in> <out>ḹ</out> </e>
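To make the effect of these rules concrete, here is a minimal Python sketch that applies the ten `<in>`/`<out>` pairs listed above to a printed spelling. It is not the Cologne transcoder itself: the function name `printed_to_modern` and the longest-match-first, single-pass strategy are assumptions made for this illustration.

```python
import re
import unicodedata

# <in> -> <out> pairs copied from the iast_iast1.xml excerpt above.
RULES = {
    "ṡ": "ś", "Ṡ": "Ś",
    "sh": "ṣ", "Sh": "Ṣ",
    "ḷ": "ḻ",
    "ṛi": "ṛ", "Ṛi": "Ṛ",
    "ṛī": "ṝ",
    "lṛi": "ḷ", "lṛī": "ḹ",
}
# Normalize to NFC so precomposed and combining spellings compare equal.
RULES = {unicodedata.normalize("NFC", k): unicodedata.normalize("NFC", v)
         for k, v in RULES.items()}

# Try longer inputs first so that 'lṛi' wins over 'ṛi' (assumed strategy).
PATTERN = re.compile("|".join(
    re.escape(k) for k in sorted(RULES, key=len, reverse=True)))


def printed_to_modern(text):
    """Rewrite a printed MW spelling into modern IAST in a single pass."""
    text = unicodedata.normalize("NFC", text)
    return PATTERN.sub(lambda m: RULES[m.group(0)], text)
```

With these rules, `printed_to_modern("Kṛishṇa")` gives `Kṛṣṇa`. Because the substitution is a single pass, the ḷ produced from `lṛi` is not converted again to ḻ. A plain `sh` that really is s followed by h would also be rewritten, so a sketch like this is only safe on text already known to be in the printed MW convention.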
h or a sibilant sśṣ. But the rule for usage of ṉ is more complicated, as there are many instances of ṃ before hsśṣ.
ṉ instances are non-standard IAST.
The coding of headwords in Malten's original does not retain the ṉ-ṃ distinction: <H1>100{haMsa}1{haMsa4} -- haMsa4 would have been coded as han6sa4 if the distinction had been made.
â Â î û ê ô are retained; their modern IAST equivalents are ā Ā ī ū e o
<srs/> tag
In Sanskrit text coded as slp1 within the <s> tag, the circumflexion of vowels is indicated by an <srs/> tag following the vowel. For instance, <s>gaRe<srs/>Sa</s>.
In current displays, this <srs/> tag is ignored, even when the user has requested IAST output for Sanskrit text.
It is likely possible to alter the transcoding rules for MW to display <s>gaRe<srs/>Sa</s> as gaṇêśa (with the circumflex) when IAST output is generated.
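One way such a display rule could work is sketched below in Python. This is only an illustration, not the existing transcoding code: the function name `slp1_to_iast_with_srs` is hypothetical, the table is the ordinary SLP1 to IAST letter mapping, and the circumflex table is limited to the five vowels mentioned above (â î û ê ô).

```python
import re
import unicodedata

# Ordinary SLP1 -> IAST letter mapping.
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
    "M": "ṃ", "H": "ḥ",
}
# Circumflex form used when <srs/> follows the vowel (assumed to cover
# only the five vowels listed in the thread).
CIRCUMFLEX = {"A": "â", "I": "î", "U": "û", "e": "ê", "o": "ô"}

# One SLP1 character, optionally followed by the <srs/> marker.
TOKEN = re.compile(r"(.)(<srs/>)?", re.DOTALL)


def slp1_to_iast_with_srs(slp1):
    """Convert the content of an <s> element, honoring <srs/> markers."""
    out = []
    for m in TOKEN.finditer(slp1):
        letter, srs = m.group(1), m.group(2)
        if srs and letter in CIRCUMFLEX:
            out.append(CIRCUMFLEX[letter])
        else:
            # Unknown characters pass through; a stray <srs/> is dropped.
            out.append(SLP1_TO_IAST.get(letter, letter))
    return unicodedata.normalize("NFC", "".join(out))
```

Called on the content of the example above, `slp1_to_iast_with_srs("gaRe<srs/>Sa")` yields gaṇêśa, while the same text without the marker yields gaṇeśa.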
<shortlong/> tag
In Sanskrit text coded as slp1 within the <s> tag, the <shortlong/> tag after a vowel indicates that the vowel may be either short or long. There are 208 such instances.
As with <srs/>, this tag plays no role in displays but could be converted in IAST display output to display the vowel+macron+combining-breve.
In Sanskrit words coded as IAST, this vowel+macron+combining-breve representation is used; there are very few instances of this.
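A rough Python sketch of the vowel+macron+combining-breve idea follows. The function name `render_shortlong` is hypothetical, and two things are assumed rather than taken from the digitization: that the tagged vowel is one of a/i/u, and that either the short or the long SLP1 letter may precede the tag.

```python
import re

COMBINING_BREVE = "\u0306"

# Macron (long) IAST form for each SLP1 simple vowel, short or long:
# U+0101 a-macron, U+012B i-macron, U+016B u-macron.
LONG_FORM = {
    "a": "\u0101", "A": "\u0101",
    "i": "\u012b", "I": "\u012b",
    "u": "\u016b", "U": "\u016b",
}

SHORTLONG = re.compile(r"([aAiIuU])<shortlong/>")


def render_shortlong(slp1):
    """Replace vowel + <shortlong/> with macron vowel + combining breve."""
    return SHORTLONG.sub(
        lambda m: LONG_FORM[m.group(1)] + COMBINING_BREVE, slp1)
```

Only the tagged vowel is rewritten; the rest of the string stays in SLP1 and would still need the normal SLP1 to IAST conversion.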
In Sanskrit text coded as slp1 within the <s> tag, an accented vowel is indicated by a character following the vowel (/ for udatta accent, \ for anudatta, and ^ for svarita).
The transcoding scheme represents these in appropriate ways when (a) Devanagari or IAST output is requested, and (b) display controls have requested that accents be shown. Otherwise, the accents are ignored in output.
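For the case where accents are not shown, the markers simply need to be dropped. The Python sketch below illustrates that, plus a small helper that lists which vowels carry which accent. Both function names are hypothetical, and this is not the actual transcoding code.

```python
import re

# Accent markers that follow a vowel in SLP1 text, as described above.
ACCENT_NAMES = {"/": "udatta", "\\": "anudatta", "^": "svarita"}

ACCENT = re.compile(r"[/\\^]")
ACCENTED_VOWEL = re.compile(r"([aAiIuUfFxXeEoO])([/\\^])")


def strip_accents(slp1):
    """Drop accent markers, as when accents are not requested in output."""
    return ACCENT.sub("", slp1)


def list_accents(slp1):
    """Return (vowel, accent name) pairs for each marked vowel."""
    return [(m.group(1), ACCENT_NAMES[m.group(2)])
            for m in ACCENTED_VOWEL.finditer(slp1)]
```

Note that `strip_accents` should only be given the text content of an `<s>` element: applied to raw markup it would also remove the slash in tags such as `<srs/>`.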
If so, they certainly represent the short/long in a nice way.
Yes, and a very good one https://software.sil.org/charis/
'free form', then maybe a tabular pattern will suggest itself.
Sure.
coding of headwords in Malten's original does not retain the ṉ-ṃ distinction
So the distinction is inside articles, but not in headwords?
could be converted in IAST display output to display the vowel+macron+combining-breve.
Possible in Charis.
the distinction is inside articles, but not in headwords?
Yes - that's a pretty accurate summary.
We can have it as a combined Unicode character as well, no need to leave it as it was.
This has now been done. For example, hw=hrdIka
with one exception: ç is used instead of ś. The current Cologne digitization removes this exception.
@sanskritisampada and I discovered that there were still many ç in Sanskrit words. We think that at least most of these have today been replaced with ś.
This has now been done.
Looks good.
Burnouf represents Sanskrit headwords in Devanagari, italicized Sanskrit in his own brand of IAST, and non-italic Sanskrit (proper names) in this same brand of IAST. This snip illustrates the three categories.
In the current status of the Burnouf digitization, the Devanagari has been converted to SLP1 transliteration ({#SLP1#} in bur.txt, and <s>SLP1</s> in bur.xml). The italic Sanskrit has been converted to modern IAST. However, the non-italic Sanskrit proper names have not been converted to modern IAST; with @sanskritisampada's help to identify the non-italic Sanskrit words, these will also soon be converted to modern IAST.
The print conventions are described in the Tableau de transcription; there are a few variances from this description in the actual text.
ḷi is used instead of ḷ.
ẏ instead of ṅ; other 3 are same as modern: ṇ, n, ms.
ṇ
Also, 'x' for modern conjunct kṣ:
That's Burnouf's system in summary, as it appears to me as of this writing.
few variances from this description in the actual text.
Well done, as usual.
Burnouf's system in summary
It's a mix somehow similar to what we see nowadays in India - all systems involved at once, in a single word even sometimes.
Monier-Williams provides a table of nāgarī letters with indo-romanic equivalents. He also writes an opinionated multi-page treatise, Alphabet and System of Transliteration.
In conjunction with previous work on converting MW72 conventions to modern IAST in the Cologne digitizations, several issue comments have dealt with this dictionary's conventions:
*upalā*
'Upalā
.
In the current MW72 digitization, all the text identified as being in indo-romanic spelling has been converted to modern IAST.
sh for cerebral sibilant, vs. modern ṣ.
does not recognize these minor headwords as headwords
But not so in MW, right?
Perhaps should use °
Agree.
The Grassmann dictionary uses a version of IAST, with accents, to represent Sanskrit words (as well as cognate words in other languages). The details regarding his brand of IAST and its conversion to modern IAST in the Cologne digitizations are well described in #199, specifically in this link and this link; no further comments needed here.
Sanskrit words are generally shown in Devanagari (coded as SLP1 in the digitization); accents are used.
However, as discussed in #195, the printed text uses Latin letters with diacritics in some places (<is>X</is> in digitization).
A summary comparing PWG's IAST and modern IAST:
ḷ vowel unused, either short or long
j is used for modern semivowel 'y'.
ḷ is used for the Vedic consonant. (3 times Iḷâ, one time Aiḷa)
There are also differences in the way the PWG text represents Devanagari accents, in comparison to the Unicode Vedic extensions. See this documentation of PWG accents.
The description above for PWG is applicable to the representation of Sanskrit words in PW.
The only variance I noticed was that there are no instances in PW of ḷ for the Vedic consonant.
We've tackled the peculiarities of IAST in PW previously:
There are about 4500 words coded as <is>X</is> in the digitization; X is supposed to be modern IAST spelling of a Sanskrit word. There are some errors that could be corrected (e.g. Maṅguśrī -> Mañjuśrī). The list of error candidates could be reduced by eliminating the words which are Sanskrit headwords in, say, MW (e.g. Āṅgirasa). This would be a good task for someone to undertake.
4500 words coded as <is>X</is>
And no smart way to lessen the list, right?
Wrong - there is a smart way to lessen the list. We can lessen the list by removing cases like Āṅgirasa, which is an MW headword. When someone commits to working on this task, I can lessen the list in this way. Also, a list of links to PW instances of each remaining word can be generated, to make the lookup process more efficient.
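The reduction described here amounts to a set difference. A minimal Python sketch follows; the function name `reduce_candidates` and the tiny word lists in the usage note are hypothetical, and it assumes the `<is>X</is>` words and the MW headwords are available in the same spelling (for example, both in IAST), which in practice would need a transcoding step.

```python
import unicodedata


def _nfc(word):
    # Normalize so precomposed and combining diacritics compare equal.
    return unicodedata.normalize("NFC", word)


def reduce_candidates(candidates, mw_headwords):
    """Keep candidates that are not MW headwords, in order, without repeats."""
    known = {_nfc(w) for w in mw_headwords}
    seen = set()
    remaining = []
    for word in map(_nfc, candidates):
        if word in known or word in seen:
            continue
        seen.add(word)
        remaining.append(word)
    return remaining
```

For instance, with candidates `["Maṅguśrī", "Āṅgirasa"]` and a headword list containing `Āṅgirasa`, only `Maṅguśrī` remains for manual checking.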
Ok. After a long time, I guess I will start working on this. Let me know the modalities.
@drdhaval2785 Good news!
Which part are you interested in at the moment?
pw corrections offloaded to https://github.com/sanskrit-lexicon/CORRECTIONS/issues/419
Sanskrit headwords appear in both Devanagari and IAST forms. Within the body of the text, both forms also appear. Accents may be present in either Devanagari or IAST forms.
From #203, the only variance from what we consider standard IAST was that the anusvara of the printed text uses ṁ (m with dot above) whereas modern IAST uses ṃ (m with dot below). The digitization uses the modern form, ṃ.
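The ṁ to ṃ change is a one-character substitution. A small Python sketch of it is below; the function name `modernize_anusvara` is hypothetical, and it assumes the text being converted contains only Sanskrit in this convention, so that every m with dot above really is an anusvara.

```python
import unicodedata

# U+1E41 (m with dot above) -> U+1E43 (m with dot below), plus the capitals
# U+1E40 -> U+1E42.
ANUSVARA = str.maketrans({"\u1e41": "\u1e43", "\u1e40": "\u1e42"})


def modernize_anusvara(text):
    """Replace printed m-dot-above with the modern IAST m-dot-below."""
    # NFC first, so that 'm' + combining dot above is also caught.
    return unicodedata.normalize("NFC", text).translate(ANUSVARA)
```

The NFC step matters because the same printed letter may have been typed either as a single precomposed character or as `m` followed by a combining dot above.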
There are a few unusual features appearing in the Devanagari spellings of the text, according to the comments of #203.
The Edgerton Buddhist Hybrid Sanskrit Dictionary represents Sanskrit words (as well as words in Pali and other languages) in the Latin alphabet with diacritics; there is no Devanagari. The diacritics for Sanskrit words agree with what we are taking as modern IAST (see #201).
Wilson uses both Devanagari and his own brand of IAST to represent Sanskrit words.
WIL IAST conventions
Based on work done in conversion of Cologne digitization from AS to modern IAST; no known description of Wilson's IAST system by the author. There is also difficulty in interpreting the scanned images due to printing quality.
ḷ vowel unused, rare
The Cologne digitization is based on the 1832 edition. @SergeA I recall that you referenced a different edition (1819?) in some prior work, but could not find the pdf -- do you recall the link to this other edition?
http://reader.digitale-sammlungen.de/en/fs1/object/display/bsb10932200_00005.html http://reader.digitale-sammlungen.de/en/fs1/object/display/bsb10495525_00005.html Wilson 1st ed. 1819 (click PDF-Download > Ja/Yes; 4-digit pin-code ; Weiter/Go > after a short time the link for the file will appear)
the long diphthongs are ai, ao (same as modern)
Did you mean au ?
Did you mean au ?
Yep! corrected.
Download working on WIL. Thanks!
Yates mostly uses Devanagari to represent Sanskrit words; but there are many Sanskrit words (mostly proper names) appearing within entries in Yates' version of IAST. As with Wilson, there is no explanation from the author of his IAST system, so the following summary is based on observations made during the conversion of the diacritic spelling from the AS (letter-number) coding to modern IAST in unicode encoding.
The AS coding appears in only 628 words (quite a small number).
The conversion work done thus far only addressed words originally coded as having a diacritic expressed in the AS form.
ṣh instead of ṣ. These also should be corrected.
Page 59 of the 1819 edition has a table describing Wilson's IAST system, which he describes as 'following the system of Sir Wm. Jones'.
The empirical summary of the above comment seems to agree in most respects, though I see a few differences. Assuming the 1832 edition upon which the Cologne digitization was based uses the same IAST system (a reasonable assumption), this table from 1819 edition could be of interest if someone wanted to more fully investigate the original printed form.
Incidentally, the entire 50+ pages of the preface of the 1819 edition looks interesting.
SHS seems to be just a copy (mostly word for word) of WIL. So all the comments about Sanskrit word representation in Wilson are applicable to Shabda Sagara.
As with YAT, there are numerous IAST corrections which could be made to SHS -- the printing seems to be casual in applying rules for diacritics in such words.
I compared two entries (guru, and Gawa) as they appear in WIL and SHS. And they were almost identical, down to the last period or comma. This even though SHS shows a publication date of 1900, and WIL (2nd ed.) of 1832.
Are there some improvements that SHS brings to WIL ?
Goldstücker's dictionary is also an extension of Wilson's dictionary, although containing only 6000+ headwords through aByAhita. It generally follows Wilson's conventions for representing Sanskrit words, both in Devanagari and in Wilson's version of IAST.
Based on the empirical evidence of the conversion from GST's IAST to modern IAST, Goldstücker's IAST conventions are:
These two dictionaries use only Devanagari to represent Sanskrit words.
Benfey's dictionary displays headwords in both Devanagari and a version of IAST. Most of the Sanskrit words within entries are in his IAST, but in entries for roots, verbal prefixes appear in Devanagari.
No explanation of the IAST conventions used in Benfey has been found in the printed text. The following summary was developed empirically during the process of converting Benfey's IAST to agree with modern IAST.
Bhartṛ. abbreviation for Bhartṛihari.
ḷi is used for vocalic ḷ in kḷip; the long form doesn't appear.
ḷ also used for the Vedic consonant ळ.
Generally Sanskrit words are presented in Devanagari within Bopp Glossarium Sanscritum; this includes headwords.
However, some words appearing in Latin alphabet have letters with diacritics. As mentioned in #202, no attempt has been made to 'modernize' the spelling of such words; the main reason is that they are likely Latinate forms of Sanskrit words, rather than Sanskrit words spelled with Latin alphabet.
Shiva, Vishnu, Krishna. A modest improvement to the Yates digitization would be the identification of such Sanskrit words and then their conversion to modern IAST.
Let it remain as such.
SHS seems to be just a copy (mostly word for word) of WIL.
Now that's an interesting fact.
Are there some improvements that SHS brings to WIL ?
Hard to say, nobody in India uses it. Wilson is checked as the very first one rarely, but SHS not even seldom.
Goldstücker's dictionary is also an extension of Wilson's dictionary
Right, such it was intended. Same as the book I've reprinted https://www.ozon.ru/context/detail/id/140949762/
handle ṛî (r2i10) properly; this needs to be fixed.
Easy to fix?
Latinate forms of Sanskrit words
So let them be. Bopp is interesting as MW takes his etymologies from Bopp.
the book at 'www.ozon.ru'
Using Google translate, here is the first sentence of the description in English:
The Sanskrit dictionary compiled by G.G.Vilson at the beginning of the XIX century (1832) was based
not on the processing of the texts themselves, but on the use of medieval autochthonous manuals.
This sounds interesting, but I don't understand the part translated as 'but on the use of medieval autochthonous manuals'.
Can you elaborate?
The printed text uses Devanagari for the headwords. A version of IAST is used within entries to write the Sanskrit names of works and authors. This IAST system is shown in the preface and repeated here:
a ā i ī u ū
ṛi ṛī e ai o au
k kh g gh n̄
c ch j jh ñ
ṭ ṭh ḍ ḍh ṇ
t th d dh n
p ph b bh m
y r l v
ç sh s h
It is quite close to the modern IAST conventions. The differences are:
Sanskrit words are almost always represented in Devanagari. Based on the few instances where Latin alphabet with diacritics represent Sanskrit words, the system differs from modern IAST as follows:
s
These instances have been only partially converted to modern IAST thus far. Since there are so few occurrences, finishing the conversion to modern IAST is not a particular concern.
Apte's Practical Sanskrit-English dictionary ('57) uses both Devanagari and a version of IAST to represent Sanskrit words. I find no table listing the IAST conventions, but empirical evidence suggests that the text's IAST is almost the same as what we are calling modern IAST. The only variances I notice are:
Kṛṣṇa.
Ṛv, there is no 'i' in the text.
ṁ, rather than modern ṃ.
Apte's Practical Sanskrit-English dictionary (1890) uses both Devanagari and a version of IAST to represent Sanskrit words. The version of IAST is quite peculiar, and in fact uses two different systems. The only relevant comment I've found in the front matter is
One can see both systems in this snip from the top of page 2 (the first letter of Adiparvan is italicized, which corresponds to the fact that the vowel is long; two other instances of long-a are seen with the circumflex).
Here is the IAST system as it exists in the text; this explanation is derived from empirical observation, aided by the careful (AS) coding of Thomas' original digitization. All of these variations have been converted (I think!) to modern IAST in the current version of the digitization.
Can you elaborate?
Wilson made a dictionary that was not based on what he read or found, but what Amarakosha found.
uses two different systems
Worst case ever.
careful (AS) coding of Thomas' original digitization.
How much work and love he put in it!
How much work and love Thomas put in it! 👍
Borooah English-Sanskrit Dictionary primarily uses Devanagari to represent Sanskrit words, but uses a version of IAST to represent Sanskrit words in about 500+ lines of the text.
The details of the IAST conventions are as follows, based upon empirical observations in the course of conversion to modern IAST spellings:
n for ṅ
n)
Cappeller Sanskrit-English Dictionary primarily uses Devanagari to represent Sanskrit words, but uses a version of IAST to represent Sanskrit words in about 1500+ lines of the text. Devanagari uses udatta and svarita accents.
The details of the IAST conventions are as follows, based upon empirical observations in the course of conversion to modern IAST spellings:
Cappeller Sanskrit Wörterbuch primarily uses Devanagari to represent Sanskrit words, but uses a version of IAST to represent Sanskrit words in about 1500+ lines of the text. Devanagari uses udatta and svarita accents.
The details of the IAST conventions are as follows, based upon empirical observations in the course of conversion to modern IAST spellings:
Indian Epigraphical Glossary uses only Latin alphabet with diacritics to represent Sanskrit words. Entries contain words from other Indian languages, also coded with Latin alphabet with diacritics. The text does not distinguish Sanskrit words by any typographical means.
The details of the IAST conventions for Sanskrit words are as follows, based upon empirical observations in the course of conversion to modern IAST spellings. Probably because of the 1966 publication date, the IAST conventions of the text are close to modern standards for Sanskrit words. The author describes his system of transliteration.
Several headwords are non-Sanskrit words; nonetheless, the digitization transcodes these to SLP1 so they will be comparable to other dictionaries with Sanskrit headwords.
The only certain difference from modern IAST for Sanskrit words is anusvara.
Index to the Names in the Mahabharata uses only Latin alphabet with diacritics to represent Sanskrit words.
The headwords are alphabetized in accordance with Latin alphabet.
The details of the IAST conventions for Sanskrit words are as follows, based upon empirical observations in the course of conversion to modern IAST spellings.
The only differences from modern IAST conventions are, I think, in the sibilants, where ç and sh are used instead of ś and ṣ.
KṚDANTARŪPAMĀLĀ uses only Devanagari, coded as SLP1, in the body of the text.
The preface material is also digitized, is in English, and contains some Sanskrit words represented in the Latin alphabet with diacritics. No definite analysis was made of the modernity of the IAST convention in this preface material.
Mehendale Mahabharata Cultural Index uses Latin alphabet with diacritics to represent Sanskrit words (according to digitization, only two instances of Devanagari are present).
Based on empirical evidence gathered during the conversion from original AS coding to Unicode, the only divergence of the IAST system used in the text from the current modern standard is the use of ṁ for anusvara rather than the modern ṃ.
The purpose of this issue is to document the conventions used in various dictionaries to represent Sanskrit words.
Exactly how to construct such a documentation is unclear at the moment. So the approach will be free form, aiming to get the relevant facts mentioned. At some later time the facts gathered here may be reformed into a more useful reference.