Closed funderburkjim closed 3 years ago
Note: In the process of corrections to MW, we have introduced a small number of 'grave accents' à. Based on the discussion above, these should be changed to circumflex: â
<L>23899<pc>137,1<k1>ādya<k2>ādyá<e>2B
<s>ādyá</s> ¦ (for <hom>2.</hom> <s>ādyà</s> See <ab>s.v.</ab>)
<LEND>
<L>176844<pc>875,1<k1>rājanya<k2>rājanyà<e>2
<s>rājanyà</s> ¦ <lex>mf(<s>ā̀</s>)n.</lex> kingly, princely, royal, <ls>RV.</ls> &c. &c.<info lex="m:f#A:n"/>
<LEND>
And I am doing so in the next revision of mw.
Agreed that slp1 for à is â.
Now my point changes thus-
Why do those slp1 characters remain in <k2> and <s> strings?
They should have been converted to (proposed) IAST, shouldn't they?
Supposing that â is the proposed IAST form, why wasn't the à chosen instead?
No, slp1 for à is a^.
The IAST for a^ is â.
How does Katre represent a-svarita ?
Anyway, I should leave it to you, to decide and continue further.
[All such differences would have to be noted separately for our (AB) use.]
How does Katre represent a-svarita ?
Just like the MW print.
In fact, all the books that I've seen have it only thus.
Can you generate a+macron+combining-circumflex?
Also, does Katre represent a-anudatta? If so, is it different from his a-svarita?
PFA the page from Katre-
We can generate any combination of diacs using this link-
http://titus.uni-frankfurt.de/unicode/unicsel/unicself.htm
I am using this for the Greek and other scripts as well.
E40D LATIN SMALL LETTER A WITH MACRON AND CIRCUMFLEX ABOVE 0061 + 0304 + 0302 ā̂
Good link. Thanks: ā̂
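For anyone who wants to check this combination without the web tool, here is a minimal Python sketch using only the standard `unicodedata` module. It builds the character from the three code points listed above (0061 + 0304 + 0302) and shows what NFC normalization leaves behind. Note that the E40D value lies in the Unicode Private Use Area, so it is font-specific; the sketch relies only on the standard combining sequence.

```python
import unicodedata

# a + COMBINING MACRON + COMBINING CIRCUMFLEX ACCENT, as listed above
decomposed = "\u0061\u0304\u0302"

# NFC merges a + macron into one code point; Unicode has no precomposed
# "a with macron and circumflex", so the circumflex stays a combining mark.
composed = unicodedata.normalize("NFC", decomposed)

for ch in composed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0101 LATIN SMALL LETTER A WITH MACRON
# U+0302 COMBINING CIRCUMFLEX ACCENT
```

The practical consequence is that ā̂ is always at least two code points, so any search or transcoding step should normalize both sides the same way before comparing.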
As far as I know, the anudātta is marked with ◌॒ (Unicode: U+0952).
I need to see if Katre has used it anywhere.
This is what Katre has in his Pāṇini’s Aṣṭādhyāyi-
Nothing mentioned about this in his Dictionary of Pāṇini.
alphabet_accent is a reference for the current correspondence between slp1 and IAST.
@Andhrabharati Note especially the vowel+diacritics -- If you cut and paste from these, it will simplify my work in transcoding back to slp1.
About these accents, now I would like to bring to your (@funderburkjim) notice the following-
Peter & Malcolm's Linguistic Issues in Encoding Sanskrit [https://sanskritlibrary.org/Sanskrit/pub/lies_sl.pdf] says thus (pp. 16-17)-
... Of particular importance as regards standardization of the schemes used by European scholars was the Geneva Oriental Congress of 1894 (Wujastyk, 1996). Contemporary schemes for Romanizing Sanskrit are quite similar to those employed in the nineteenth century and are characterized by the following conventions: ...
--------------------------
And App. C shows all these x^ (slp1) as x̀ (Roman).
Hope with this, @funderburkjim would now think of changing the accents as mentioned by the creators of slp1 themselves, which is the way I was using in all my remarks/comments.
--------------------------
Note: The Geneva Oriental Congress of 1894 is where IAST was born.
[Lesson. Wiki has made "easy access" to many articles and much info from many corners of the world; but in too many cases, one has to cross-check them instead of taking them "for granted".]
And now the summary of these accents as seen in the mw_iast file posted by Jim today.
|slp1|IAST|Unicode name|count|
|---|---|---|---|
|`a^`|â|LATIN SMALL LETTER A WITH CIRCUMFLEX|4303|
|`a\`|à|LATIN SMALL LETTER A WITH GRAVE|10|
|`i^`|î|LATIN SMALL LETTER I WITH CIRCUMFLEX|141|
|`i\`|ì|LATIN SMALL LETTER I WITH GRAVE|0|
|`u^`|û|LATIN SMALL LETTER U WITH CIRCUMFLEX|31|
|`u\`|ù|LATIN SMALL LETTER U WITH GRAVE|1|
|`f^`|ṛ̂|LATIN SMALL LETTER R WITH DOT BELOW + COMBINING CIRCUMFLEX ACCENT|0|
|`f\`|ṛ̀|LATIN SMALL LETTER R WITH DOT BELOW + COMBINING GRAVE ACCENT|4|
|`A^`|ā̂|LATIN SMALL LETTER A WITH MACRON + COMBINING CIRCUMFLEX ACCENT|3|
|`A\`|ā̀|LATIN SMALL LETTER A WITH MACRON + COMBINING GRAVE ACCENT|0|
|`e^`|ê|LATIN SMALL LETTER E WITH CIRCUMFLEX|326|
|`e\`|è|LATIN SMALL LETTER E WITH GRAVE|0|
|`o^`|ô|LATIN SMALL LETTER O WITH CIRCUMFLEX|226|
|`o\`|ò|LATIN SMALL LETTER O WITH GRAVE|0|
[Probably some of these could be in non-<s> strings, like <etym> etc.]
Note: The Geneva Oriental Congress of 1894 is where IAST was born.
The IAST we use and the initial 1894 IAST are not identical, though they are close.
There were a few vowel-grave instances in mw_iast.txt; I've changed these to vowel-circumflex in my local version. There are now about 200 instances of svarita accent (represented with vowel-circumflex) in mw_iast.txt, occurring in <s> tags.
The rest of the vowel-circumflex instances in AB's notes above occur in:
- <ls> tags -- these are inherent IAST (not converted to/from slp1); they are believed to be instances of MW's vowel-sandhi usage of circumflex.
- <s1 slp1=".*?">[^<]*â (text within <s1> tags), again representing vowel-sandhi.
- <etym> tags (small number; representation of other languages).

The comments from Peter and Malcolm's book are helpful.
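The breakdown by tag described above can be checked with a short script. This is only a sketch, not the script actually used for mw_iast.txt: the function name `circumflex_counts_by_tag` and the sample line are invented for illustration, and it assumes tags are not nested inside one another.

```python
import re
import unicodedata
from collections import Counter

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT


def circumflex_counts_by_tag(text, tags=("s", "s1", "ls", "etym")):
    """Count circumflex accents inside each kind of tag (hypothetical helper)."""
    # NFD splits precomposed letters (â, ê, ...) into base + combining mark,
    # so precomposed and combining forms are counted alike.
    decomposed = unicodedata.normalize("NFD", text)
    counts = Counter()
    for tag in tags:
        # "<s>" or "<s attr=...>" but not "<s1 ...>" when tag == "s"
        pattern = re.compile(rf"<{tag}(?:\s[^>]*)?>(.*?)</{tag}>", re.S)
        for match in pattern.finditer(decomposed):
            counts[tag] += match.group(1).count(CIRCUMFLEX)
    return counts


# Invented sample line, not taken from mw_iast.txt
sample = ('<s>kanyâ</s> <ls>Kathâs.</ls> '
          '<s1 slp1="x">Râma</s1> <etym>rôle</etym> <s>a</s>')
counts = circumflex_counts_by_tag(sample)
print(dict(counts))
```

Run over the real file, the `s` count would be the svarita candidates, while the `ls`, `s1` and `etym` counts are the ones that should be left alone by any svarita re-transcoding.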
In addition to supporting the vowel-grave IAST representation of svarita accents in MW, notice that there is no distinctive IAST representation for anudatta accents.
Thus, if we apply that algorithm for representing Sanskrit in IAST to a text which has anudatta accent, then we cannot retrieve the original accented text from its IAST form. To my way of thinking, this is a weakness of IAST representation.
In the case of MW, we assume that there are no anudatta accents. With this assumption, we could construct the IAST version using grave-accent to represent Sanskrit svarita accent; and because there are no anudatta accents, reconstruct accurately the slp1 text of mw.txt from mw_iast.txt.
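The round-trip argument can be illustrated with a small Python sketch. It is deliberately restricted: only a handful of vowels and the two accents are handled, every other character passes through unchanged, and the names `to_iast` and `to_slp1` are made up here; it is not the Cologne transcoder. It assumes, as stated above, that the text contains no anudatta, so grave is free to mean svarita.

```python
import unicodedata

# Restricted, illustrative tables (assumption: no anudatta in the text)
VOWELS = {"a": "a", "A": "\u0101", "i": "i", "I": "\u012b",
          "u": "u", "U": "\u016b", "f": "\u1e5b", "e": "e", "o": "o"}
ACCENTS = {"/": "\u0301",   # udatta  -> combining acute
           "^": "\u0300"}   # svarita -> combining grave
REV_VOWELS = {v: k for k, v in VOWELS.items()}
REV_ACCENTS = {v: k for k, v in ACCENTS.items()}


def to_iast(slp1):
    out = [VOWELS.get(ch) or ACCENTS.get(ch) or ch for ch in slp1]
    return unicodedata.normalize("NFC", "".join(out))


def to_slp1(iast):
    # Group each base character with the combining marks that follow it.
    clusters = []
    for ch in unicodedata.normalize("NFD", iast):
        if unicodedata.combining(ch) and clusters:
            clusters[-1].append(ch)
        else:
            clusters.append([ch])
    out = []
    for cluster in clusters:
        accent = "".join(REV_ACCENTS[m] for m in cluster if m in REV_ACCENTS)
        letter = unicodedata.normalize(
            "NFC", "".join(m for m in cluster if m not in REV_ACCENTS))
        out.append(REV_VOWELS.get(letter, letter) + accent)
    return "".join(out)


for word in ("rAjanya^", "Adya/"):
    iast = to_iast(word)
    print(word, "->", iast, "->", to_slp1(iast))
```

If a second slp1 accent were also mapped to the grave, `REV_ACCENTS` could keep only one of the two keys and `to_slp1` could no longer recover the original, which is the weakness described above.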
Given the paucity of instances (200+) of svarita accents under discussion in MW, the manner of IAST representation of svarita in mw_iast.txt does not matter much.
If AB insists, I can change the transcoding of mw_iast.txt so that the slp1-svarita circumflexes are represented in mw_iast.txt by grave accents rather than the current circumflex accents.
I would be more than glad to see that happen. (And also update the alphabet_accent file.)
And probably we can think of "extending" the Cologne version of IAST by using the unicode character I've mentioned as above.
As far as I know, the anudātta is marked with ◌॒ (Unicode: U+0952).
This is what the printed books (that I've seen) use, though it is not in IAST.
In the absence of "normative standards", one can follow the "industry standards"!!
I would be more than glad to see that happen
OK. Will aim for that.
Will do this after incorporating your next 'check the updated work all over again' step in #101.
I missed Jim's reference to Whitney's Grammar above.
I'd just like to say that he apparently "stopped" at the beginning of article 83; he should have gone a little further, to (a) in there!!
Agreed that Peter's description is consistent with Whitney. Whitney also doesn't mention anudAtta.
alphabet_accent1.md has a suggested revision of slp1-iast transcoding. The differences from alphabet_accent.md are:
Assuming we agree on this revised IAST correspondence to SLP1, I'll use it for the next revision of mwtranscode/mw_iast.txt.
we agree on this revised IAST correspondence to SLP1
We sure do.
yes; U0332 is a proper choice, as the point is about having the diac for Roman letters and not for Devanagari.
and this could be applied to all cologne digitisations, not just MW99.
@funderburkjim
Just noticed this towards the end of both the accent files you posted-
|\||łh|LATIN SMALL LETTER L WITH STROKE + LATIN SMALL LETTER H|
Is this so from the beginning? If so, how is | present in the mw files?
Or was it just modified now? If so, the issue discussed in MWS #88 would/should be resolved by this, and this would make it a Cologne version of slp1 now!!
I would now request you to consider changing L and this \ to indicate ḷ and ḷh (or l̥ and l̥h), instead of ł and łh.
Many transliteration tools are designed to produce ḷ or l̥, and any difference would require a special/extra effort to key in those characters, just for Cologne data.
We should travel the way many people do, unless there is a pressing need to do otherwise.
This small document talking about Vedic accents may be of some interest to go through once.
Found another document exclusively talking about Skt. dictionaries and accents.
Rau2017_vedic-accent-in-lexicography.pdf
Does the present sanskrit-lexicon team have any links with this Lazarus project?
And is there a way for accessing this github.com/sanskrit-lexicon/MWS/files/... folder? Looks it might contain many interesting/informative documents like this. Or is that a "Private area"?
Does the present sanskrit-lexicon team have any links with this Lazarus project?
Very minor, still there has been some contact in the past with @fxru
In SLP1, 'L' represents consonant ळ (Unicode Devanagari LLA).
In SLP1, '|' represents conjunct consonant ळ्ह
The above are my understanding.
IAST representation of these are not found in https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration.
However, this source does mention another 'standard' ISO 15919. And mentions that l̥ is used in ISO 15919 to represent 'vocalic l' (= slp1 'x')
l̥ = LATIN SMALL LETTER l + 'COMBINING RING BELOW' (U+0325)
By contrast, IAST uses ḷ = LATIN SMALL LETTER L WITH DOT BELOW
And the same article mentions that ISO 15919 uses the same ḷ (= LATIN SMALL LETTER L WITH DOT BELOW) to represent Devanagari ळ.
At the start of our work with @Andhrabharati, it was decided to make an IAST version, mw_iast.txt, for him. And since it was necessary to be able to convert between the IAST version and the 'native' SLP1 version mw.txt, I had to develop some unambiguous code for the IAST representations of slp1 'L' and '|'.
I chose 'ł' and 'łh' to be IAST representations of slp1 'L' and '|'.
Our Cologne software handles the conversions.
Since there is no standard, these choices are as good as any.
If a standard ever emerges in the future, we can revise.
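As a rough illustration of the 'L' and '|' choice, here is a minimal Python sketch of the two-way conversion. It is not the Cologne software: the function names are invented, only these two characters are handled, and it assumes the slp1 source never contains the sequence `Lh`.

```python
# Illustrative only: slp1 'L' -> ł and slp1 '|' -> łh, as chosen above
TO_IAST = {"L": "\u0142", "|": "\u0142h"}


def lla_to_iast(slp1):
    return "".join(TO_IAST.get(ch, ch) for ch in slp1)


def lla_to_slp1(iast):
    # Replace the longer sequence first, otherwise "łh" would become "Lh".
    return iast.replace("\u0142h", "|").replace("\u0142", "L")


for word in ("aLa", "a|a"):
    iast = lla_to_iast(word)
    print(word, "->", iast, "->", lla_to_slp1(iast))
```

The inverse is exact only under the stated assumption: a hypothetical slp1 string `aLha` would also become `ałha` and come back as `a|a`. The same ordering and ambiguity questions would apply if ḷ and ḷh were adopted instead.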
github.com/sanskrit-lexicon/MWS/files/ folder
This url is not available.
It may be that when you drag a file into a comment, Github puts it into this url.
For example, in #83, @Andhrabharati dragged a file 'changes_0_Andhrabharati.txt' into a comment, and this file is now available (for download only) at the url: https://github.com/sanskrit-lexicon/MWS/files/5750131/changes_0_Andhrabharati.txt
So there is no 'files' directory per se. It's just a convention of Github. We are making use of no 'Private areas'. Everything out in the open for sanskrit-lexicon.
I guess this issue has received enough (and the necessary) attention and discussion, and can be closed now.
This issue is in response to a comment/question from AB in his 'missed' document.
Here's the comment:
The â in k2 is actually not an accident. Here's the reason as I understand it, although my understanding of accents in general is extremely shallow; I am paraphrasing what I understood Peter Scharf to have meant. In MW, there are two kinds of printed accents -- acute and grave. The acute printed accent corresponds to Sanskrit accent type udAtta, and is by far the more common in MW. The grave printed accent corresponds (as I understood it) to Sanskrit accent type 'svarita' (not anudAtta).
In SLP1 transliteration, the three accents are represented by:
Thus, the 'a^' in the SLP1 spellings of 'k2' represent a+svarita, which is, according to Peter as I understood him, the correct interpretation of what MW represents as a+grave.
Note that this seems consistent with the usage described in Whitney Grammar Section 83
The further wrinkle regards the correspondence between SLP1 and IAST. We (I) take the authority for IAST representation of Devanagari to be https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration. In this source, there is no mention of accents. In the Cologne site, we have chosen to represent IAST accents by
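To summarize the svarita accent case in MW: the printed a+grave (à) is represented in SLP1 as a+svarita (a^), and in the Cologne IAST as a+circumflex (â).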