funderburkjim closed this issue 3 months ago.
Just see what LIES has, backed by a vast literary tradition (all the printed texts use this Roman notation, including MW).
I just wanted this implemented in the Cologne digitisations as well. It would also automatically bring the CDSL MW and PWG renderings (when viewed in Devanagari) into agreement, which I have been talking about.
Having a different notation would only create unnecessary confusion among the readers/users.
Also, the LIES page cited above (p. 162) clearly discusses the third accent, anudAtta.
Thus one can clearly see that, in all the Roman-transliterated texts, these three accents are rendered as follows:
* udAtta by an acute accent,
* svarita by a grave accent, and
* anudAtta by an underscore (if this accent is shown at all; otherwise it is mostly ignored).
I would suggest that CDSL also adopts the same standard notations in its renderings.
Just for info: the Devanagari prints mostly ignore the udAtta sign (whereas the Roman prints ignore the anudAtta); rendering it as a (combining diacritic) Devanagari 'u' was introduced by Boehtlingk.
I thought I should spend some time looking at the MW data, to understand why @funderburkjim is hesitant to make any changes to the accents.
Surprisingly, I found the data to be quite jumbled, with many mistakes in accent marking and far more missing marks. It could be cleaned up only by going through the full text again thoroughly (at least for the HWs portion).
On top of that, the MW print's notation of udātta as a Roman acute and svarita as a Roman grave has become acute and circumflex (caret), respectively, in the CDSL text, contrary to the standard practice in the literary texts that "in Latin script transcription, udātta is marked with an acute accent, independent svarita is marked with a grave accent, and other syllables are unaccented, and not marked."
Finally, in the CDSL MW Devanagari rendering, the udAtta (Roman acute) is represented as ◌॑ (the Yajurvedic style of udātta marking) and the svarita (Roman circumflex) as ᳠ (the less frequently used Kashmiri style of Rigvedic independent svarita), mixing things up further, contrary to the standard practice in the literary texts of following the Rigvedic convention that "anudātta is written with a bar below the line (◌॒), svarita with a stroke above the line (◌॑) while udātta is unmarked".
So setting things right will definitely take strong determination, if it is ever taken up.
P.S.
It took much effort to convince Jim (exactly one year ago!) and to bring the CDSL PWG accent rendering into agreement with the printed PWG dictionary, as well as with the vast supporting literature that it cites.
Similarly, I can suggest a simple and workable solution for CDSL MW which, if accepted, would settle things once and for all; otherwise the confusion will keep spreading through the entire user community (unless they are vigilant enough), since CDSL MW is much more widely used.
I would like to hear back from Jim on this. [Perhaps he could consult Peter Scharf and Thomas Malten before making a final call on this.]
I also have a strong disinclination towards making any changes to accent data. The principal reason is that the accents are rendered differently depending on the branch of the Veda. Therefore, there is going to be a mix of accent data, depending on which Vedic branch the quotation is taken from.
I strongly disagree with this: MW simply took all the Vedic citations from PWG or pwk, and Boehtlingk was very attentive and consistent in using accents in all his works.
Even the MW99 print is consistent in having them; only the CDSL MW data has them wrongly typed or missing, and has then wrongly interpreted the notation marks. However, the CDSL PWG typed data is far better so far as these marks are concerned.
I did not expect this many typing errors in MW accents until today, when I decided to look at them closely.
> The principal reason is that the accents are rendered differently depending on the branch of the Veda. Therefore, there is going to be a mix of accent data, depending on which Vedic branch the quotation is taken from.
@drdhaval2785 / @funderburkjim
I did some homework gathering information about this, and concluded that this need not be much cause for worry!
Almost all the Vedic texts follow the RV style (udātta: Roman acute, no Devanagari mark; svarita: Roman grave, Devanagari vertical bar above; anudātta: no Roman mark, Devanagari horizontal bar below), and the majority of citations fall into this category.
The SV (Sāmaveda) has a completely different style, denoting the accents by numerals above the letters (udātta as 1, svarita as 2, anudātta as 3) [written as Roman or Devanagari numerals, depending on the script]; and it does not matter much even if the RV style is applied to SV, since one can mentally correlate the two from the context.
The Krishna (Black) YV is the main conflicting one, marking udātta with a vertical bar and svarita with either three vertical bars, a dot below, or a breve below. Fortunately, not many citations from it appear in the dictionaries we are talking about here. [As with SV, one can also correlate these KYV accents contextually, since udātta and svarita are marked differently rather than interchanged.]
Having summarized this study, I now leave the topic for others to deliberate upon.
Sorry, I missed stating one last point from my side:
I only asked Jim to mark the svarita accent with a Roman grave instead of a Roman circumflex. [The reason being, to the best of my knowledge, there is no printed text having it marked thus.]
And the BEST solution, as far as Devanagari accent markings [with whatever differences exist across the various Vedic branches] are concerned, is to avoid showing them altogether, as MW and Macdonell do!
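A minimal Python sketch of the RV-style convention summarized above, rendering one accented vowel or syllable in either script. It assumes the Devanagari marks are the Unicode stress signs U+0951 (vertical stroke above, used here for svarita) and U+0952 (bar below, for anudātta); the names `RV_STYLE` and `mark` are hypothetical, not CDSL code.

```python
# RV-style accent marks, following the summary above (a sketch, not CDSL code).
# U+0951 is named DEVANAGARI STRESS SIGN UDATTA in Unicode, but in RV-style
# texts this vertical stroke above the letter marks svarita.
RV_STYLE = {
    "udatta":   {"roman": "\u0301", "deva": ""},        # Roman acute; no Devanagari mark
    "svarita":  {"roman": "\u0300", "deva": "\u0951"},  # Roman grave; stroke above
    "anudatta": {"roman": "",       "deva": "\u0952"},  # no Roman mark; bar below
}


def mark(syllable: str, accent: str, script: str) -> str:
    """Append the RV-style combining mark for `accent` in `script` ('roman' or 'deva')."""
    return syllable + RV_STYLE[accent][script]


print(mark("a", "svarita", "roman"))       # 'a' followed by U+0300
print(mark("\u0915", "anudatta", "deva"))  # Devanagari ka followed by U+0952
```

Keeping both scripts in one table makes the asymmetry explicit: each accent is unmarked in exactly one of the two scripts.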
> It could be cleaned up only by going through the full text again thoroughly (at least for the HWs portion).
HW would be enough.
> MW simply took all the Vedic citations from PWG or pwk, and Boehtlingk was very attentive and consistent in using accents in all his works.
Agree
> only the CDSL MW data has them wrongly typed or missing, and has then wrongly interpreted the notation marks. However, the CDSL PWG typed data is far better so far as these marks are concerned.
Yes, you are starting to discuss with arguments. I like you @Andhrabharati
> I did not expect this many typing errors in MW accents until today, when I decided to look at them closely.
Like hundreds of mistakes?
> [The reason being, to the best of my knowledge, there is no printed text having it marked thus.]
@drdhaval2785, why are you against it? A mess is a mess, and if it's a digital mess that has nothing to do with the book, why not clean it up?
Glad you found this. For me, this is a very strong argument in favor of changing the IAST representation of svarita to grave accent.
I am sorry to say this, @funderburkjim --
LIES uses only the ISO 15619 system when discussing Roman transliteration of Sanskrit, not the IAST system.
You can clearly see this on pp. 176-181, with respect to slp1 f and x (where a circle instead of a dot is used under the Roman letters).
Does this make you do a 'turn back'?
[IAST specified circumflex only for the svarita accent.]
See also §1.4, Roman transliteration (pp. 16-18), of LIES.
>> I did not expect this many typing errors in MW accents until today, when I decided to look at them closely.
> Like hundreds of mistakes?
I estimate the count to be more than ten times that, @gasyoun!
I think you mean ISO 15919 (Wikipedia: https://en.wikipedia.org/wiki/ISO_15919).
That article has a section addressing differences from IAST; see also the Wikipedia article on IAST.
AFAIK, neither standard discusses accent representation.
Thus, the fact that LIES prefers the 15919 standard does not have an impact on the current concern over representation of accents.
The current CDSL use of circumflex for svarita must be a choice made by me. My current understanding is that this choice was wrong, and needs to be changed (to grave accent). Similarly, the (rarer) anudAtta representation should use the 'macron below' diacritic.
In the dictionaries, all text X marked up as <s>X</s> is transcoded according to the user's 'output' choice. In the digitizations, X is slp1, with accents represented by a trailing ASCII `/`, `^`, or `\`.
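The markup-driven transcoding described above can be sketched in Python as a regular-expression pass that rewrites only the <s>...</s> spans. This is a hypothetical illustration, not the CDSL display code: `transcode_s_spans` and `toy_transcode` are made-up names, the toy transcoder handles only `A`, `I` and the udAtta mark `/`, and the real displays wrap the result in their own HTML rather than simply dropping the tags as done here.

```python
import re
from typing import Callable

# Non-greedy match so that two <s> spans on one line stay separate.
S_SPAN = re.compile(r"<s>(.*?)</s>")

# Hypothetical toy transcoder: only three slp1 characters are handled.
TOY_TABLE = str.maketrans({"A": "\u0101", "I": "\u012b", "/": "\u0301"})


def toy_transcode(slp1: str) -> str:
    """Stand-in for a real slp1 -> output transcoder."""
    return slp1.translate(TOY_TABLE)


def transcode_s_spans(text: str, transcode: Callable[[str], str]) -> str:
    """Replace each <s>...</s> span with its transcoded content; other text is untouched."""
    return S_SPAN.sub(lambda m: transcode(m.group(1)), text)


print(transcode_s_spans("see <s>a/gnI</s> (fire)", toy_transcode))
```

A real transcoder, such as the XML-driven ones, would take the place of `toy_transcode`; the span-matching step would not need to change.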
The transcoding to IAST, Devanagari, or HK is based on XML transcoding files, with file names like `slp1_deva.xml` and `slp1_roman.xml` (for IAST). For the basic display and friends, these files are in the csl-websanlexicon repository. For simple search (and related) displays, copies of the transcoder files are in the csl-apidev repository.
There is also a transcoding version, `slp1_deva1.xml`, used for the Boehtlingk dictionaries according to getwordClass.php, and similarly for Basic, etc., in the csl-websanlexicon repository.
Thus, technically, changing the display of accents for IAST is accomplished by changing the file `slp1_roman.xml`.
Here are the current transcodings for a-udAtta and f-udAtta (slp1 f = vocalic r):
<e> <s>SKT</s> <in>a/</in> <out>\u00e1</out> </e>
<e> <s>SKT</s> <in>f/</in> <out>\u1e5b\u0301</out> </e>
Note that for 'a', this converts 'a/' to the precomposed Unicode character U+00E1 (LATIN SMALL LETTER A WITH ACUTE). But for vocalic 'r', it uses U+0301 (COMBINING ACUTE ACCENT) to apply the acute accent to ṛ (U+1E5B, LATIN SMALL LETTER R WITH DOT BELOW).
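Whether a transcoder emits precomposed or combining characters can be checked with Python's standard `unicodedata` module. In this small illustration (not part of the CDSL code; `describe` is a hypothetical helper), NFC normalization composes a + U+0301 into the single character U+00E1, whereas ṛ + U+0301 has no precomposed form and stays as two code points.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """Return the Unicode names of the code points in the NFC form of `s`."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFC", s)]


# a + COMBINING ACUTE ACCENT composes to one precomposed letter.
print(describe("a\u0301"))       # prints ['LATIN SMALL LETTER A WITH ACUTE']
# r-dot-below + COMBINING ACUTE ACCENT has no precomposed form.
print(describe("\u1e5b\u0301"))  # prints ['LATIN SMALL LETTER R WITH DOT BELOW', 'COMBINING ACUTE ACCENT']
```

Because NFC maps both spellings of á to the same string, emitting combining characters uniformly loses nothing for 'a', while also covering letters like ṛ that have no precomposed accented form.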
For current purposes, I suggest we use the COMBINING characters for transcoding of all accents to IAST diacritics.
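A minimal Python sketch of this proposal, transcoding slp1 with trailing accent marks directly to IAST. It assumes that `/`, `^` and `\` mark udAtta, svarita and anudAtta respectively (in the order listed above), and it uses the combining characters U+0301, U+0300 and U+0331; `SLP1_TO_IAST`, `ACCENTS` and `slp1_to_iast` are illustrative names, not the contents of `slp1_roman.xml`. The mark is appended after the whole IAST vowel, so for ai/au it lands on the second letter; that placement is also an assumption.

```python
# Hypothetical slp1 -> IAST table (slp1 uses one character per sound).
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au", "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}

# Assumed mapping of the trailing slp1 accent marks to combining diacritics.
ACCENTS = {
    "/": "\u0301",   # udAtta:   COMBINING ACUTE ACCENT
    "^": "\u0300",   # svarita:  COMBINING GRAVE ACCENT
    "\\": "\u0331",  # anudAtta: COMBINING MACRON BELOW
}


def slp1_to_iast(text: str) -> str:
    """Transcode slp1 (with trailing accent marks) to IAST using combining accents."""
    out = []
    for ch in text:
        if ch in ACCENTS:
            # The accent follows its vowel, so the combining mark is simply
            # appended after the vowel that was just emitted.
            out.append(ACCENTS[ch])
        else:
            # Characters outside the table (spaces, punctuation) pass through.
            out.append(SLP1_TO_IAST.get(ch, ch))
    return "".join(out)


print(slp1_to_iast("a/gnim"))  # displays as ágnim (a + U+0301)
```

If stored text should use precomposed letters where they exist, the result can be passed through `unicodedata.normalize("NFC", ...)` without changing its appearance.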
> CDSL PWG typed data is far better so far as these marks are concerned.
Could we use this observation as a programmatic assistant for improving the accuracy of CDSL MW accent markup?
@Andhrabharati If you want to pursue this, let's discuss further in a separate issue.
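One way such a programmatic assistant might work, sketched in Python under stated assumptions: both dictionaries' headwords are available as slp1 strings with the trailing accent marks described above, headwords are matched on their unaccented form, and any pair whose accent marks differ is flagged for manual review. The sample data is invented for illustration only, and `strip_accents` and `flag_mismatches` are hypothetical names.

```python
ACCENT_MARKS = "/^\\"  # assumed slp1 accent marks: udAtta, svarita, anudAtta


def strip_accents(word: str) -> str:
    """Drop slp1 accent marks, giving the key used to match headwords."""
    return "".join(ch for ch in word if ch not in ACCENT_MARKS)


def flag_mismatches(mw: list[str], pwg: list[str]) -> list[tuple[str, str]]:
    """Return (mw_form, pwg_form) pairs that share an unaccented form but differ in accents."""
    pwg_by_key = {strip_accents(w): w for w in pwg}
    flagged = []
    for w in mw:
        other = pwg_by_key.get(strip_accents(w))
        if other is not None and other != w:
            flagged.append((w, other))
    return flagged


# Invented sample data, for illustration only.
mw_headwords = ["ka/ra", "Bava", "mfga/"]
pwg_headwords = ["kara/", "Ba/va", "mfga/"]
print(flag_mismatches(mw_headwords, pwg_headwords))
# prints [('ka/ra', 'kara/'), ('Bava', 'Ba/va')]
```

Real data has homographs that differ only in accent; a dict keyed on the unaccented form keeps only the last of them, so a practical tool would need to group such entries instead. Flagged pairs are candidates for checking against the print, not automatic corrections.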
> I think you mean ISO 15919
Yes, 15619 was a BAD typo on my part.
> For current purposes, I suggest we use the COMBINING characters for transcoding of all accents to IAST diacritics.
> * udAtta: 0301 COMBINING ACUTE ACCENT
> * svarita: 0300 COMBINING GRAVE ACCENT
> * anudAtta: 0331 COMBINING MACRON BELOW
This is exactly what I wanted it to be! Hope it materialises soon.
>> CDSL PWG typed data is far better so far as these marks are concerned.
> Could we use this observation as a programmatic assistant for improving the accuracy of CDSL MW accent markup?
> @Andhrabharati If you want to pursue this, let's discuss further in a separate issue.
@funderburkjim I don't have great expectations of a programmatic approach here, but it could still be one step toward correcting the wrong accents. [I would certainly like to see how you would process it.]
> (Andhrabharati) [IAST specified circumflex only for the svarita accent.]
> (Jim) AFAIK, neither standard discusses accent representation.
Here are the relevant portions of the IAST recommendations, from the Report of the Transliteration Committee (1894) [snippet images not preserved]:
> The current CDSL use of circumflex for svarita must be a choice made by me. My current understanding is that this choice was wrong, and needs to be changed (to grave accent). Similarly, the (rarer) anudAtta representation should use the 'macron below' diacritic.
Hurray.
Interesting that the 1895 document shows svarita as circumflex and anudAtta as grave. I found a transcoder file, `slp1_romanpms.xml`, whose comments attribute it to pms (Peter Scharf). It is quite possible that Peter was aware of the 1895 accent recommendation, which pertains to IAST, and made his choice for 'roman' accordingly; this may also be the source of my choice.
I wonder if there is a comment about accents in ISO 15919. If so, does the comment agree with the LIES choice?
Then the svarita-grave, anudAtta-macron-below convention might be described as 'IAST with the 15919 accent extension'.
> Then the svarita-grave, anudAtta-macron-below convention might be described as 'IAST with the 15919 accent extension'.
This is the best possible 'intermediate' solution for the issue, using IAST for the ordinary letters and ISO for the accents. I would be the happiest person to see this happen.
> I wonder if there is a comment about accents in ISO 15919. If so, does the comment agree with the LIES choice?
References to it are scattered across the web. And yes, LIES clearly states that its choice comes from the ISO standard.
The official ISO standard is not publicly available; one has to buy it. If Jim particularly wants to see it, I can buy a copy (the Indian edition costs only about 8 USD, versus 175 USD for the ISO edition) and post the relevant portions (as I did for IAST above). [The Indian (IS/ISO 15919:2001) and ISO (ISO 15919:2001) versions are identical, letter for letter, except for the title number!]
The changes have now been made. For details, refer to the repository commit links above. The displays should now show the revised accents. For display examples, see
Also, a comment in #137 contains a revised IAST version of MW.
@Andhrabharati I guess it would be good to see the ISO comment on accent, if you can get it without too much trouble and cost.
In the meantime, closing this issue.
I first saw the change at #137 and posted my response there:
https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1251359197
> @Andhrabharati I guess it would be good to see the ISO comment on accent, if you can get it without too much trouble and cost.
> In the meantime, closing this issue.
Good that you wanted to see the ISO text as well!
I just purchased the document; here are snippets from it:
[Snippets from ISO 15919:2001; images not preserved]
'Rule 14' is a misprint here; it should have been Rule 13.
[Snippet from ISO 15919:2001; image not preserved]
While Rule 13 of Clause 8 recommends placing the grave mark on the last Roman vowel of the digraphs (ai and au), clause 8.2 recommends placing the underscore under both Roman vowels of the same digraphs. A surprising specification, indeed. I need to check whether any text has done so in practice. [LIES places the underscore only below the last vowel, just like the grave.]
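To make the difference concrete, here is a small Python sketch that places each mark on an IAST digraph as described: the grave after the last vowel (Rule 13), the underscore under both vowels (clause 8.2), and the underscore under the last vowel only (LIES). Taking the "underscore" to be U+0331 COMBINING MACRON BELOW is an assumption, and `on_last` and `on_each` are hypothetical names.

```python
GRAVE = "\u0300"         # COMBINING GRAVE ACCENT
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW (assumed for the "underscore")


def on_last(digraph: str, mark: str) -> str:
    """Place `mark` on the last vowel only (Rule 13 for the grave; LIES for the underscore)."""
    return digraph + mark


def on_each(digraph: str, mark: str) -> str:
    """Place `mark` on every vowel of the digraph (clause 8.2 for the underscore)."""
    return "".join(ch + mark for ch in digraph)


for dg in ("ai", "au"):
    print(on_last(dg, GRAVE), on_each(dg, MACRON_BELOW), on_last(dg, MACRON_BELOW))
```

Since a combining mark attaches to the preceding base letter, "on the last vowel" needs no special handling beyond appending the mark to the digraph.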
[Snippet from ISO 15919:2001; image not preserved]
'Rules 14 and 15' is a misprint here; it should have been 'Rules 13 and 14'.
[Snippet from ISO 15919:2001; image not preserved]
Interestingly, MW is cited as a reference work in this standard (in the Bibliography)!
[I wonder why they cited its Indian printing, when the London printing is also widely available.]
And, as the children's stories of olden days end, "we live happily ever after."
Could you find time to look at these snippets from the ISO document (and my comments under them), @funderburkjim?
@drdhaval2785, is there any specific reason for reopening this issue? [I am pretty sure @funderburkjim may not have seen my last posts, but as they are just informative, there is no harm in leaving them in a closed issue! I don't particularly need Jim to respond to them here.]
Jim is happy to be excluded from the accent party!
There was some discussion (begun at this comment https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1236183769) regarding the representation of accents in MW, and also in Boehtlingk's works (Cologne dictionary codes PWG, PW, PWKVN).
The most specific suggestion was that vowels marked with a svarita accent in mw should be represented in IAST output with a 'grave accent' diacritic instead of the current representation with a circumflex diacritic. E.g., â should be replaced by à.
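For IAST text that has already been produced with the circumflex convention, the replacement can be sketched in Python with canonical decomposition: decompose to NFD, swap U+0302 COMBINING CIRCUMFLEX ACCENT for U+0300 COMBINING GRAVE ACCENT, then recompose with NFC. This assumes the circumflex marks only svarita in the text being converted (any other use of ê, ô, etc. would also be changed), and `svarita_circumflex_to_grave` is a hypothetical name.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT (current svarita mark)
GRAVE = "\u0300"       # COMBINING GRAVE ACCENT (proposed svarita mark)


def svarita_circumflex_to_grave(text: str) -> str:
    """Replace every circumflex with a grave, whether precomposed (â) or combining."""
    # NFD splits precomposed letters such as U+00E2 into a + U+0302,
    # so a single string replacement catches every circumflex.
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(CIRCUMFLEX, GRAVE))


print(svarita_circumflex_to_grave("\u00e2") == "\u00e0")  # prints True
```

Working on the decomposed form means letters with no precomposed accented variant, such as ṛ, are handled by the same replacement.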
It was also noted that representation of accents in Devanagari output differs between MW and the PW dictionaries.
The question for this issue is whether to change the current representation of accents in Cologne displays for these dictionaries.