Accent representation in MW, PW, PWG, PWKVN

funderburkjim commented 1 year ago

There was some discussion (begun at this comment https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1236183769) regarding the representation of accents in MW, and also in Boehtlingk's works (Cologne dictionary codes PWG, PW, PWKVN).

The most specific suggestion was that vowels marked with a svarita accent in mw should be represented in IAST output with a 'grave accent' diacritic instead of the current representation with a circumflex diacritic. E.g., â should be replaced by à.

It was also noted that representation of accents in Devanagari output differs between MW and the PW dictionaries.

The question for this issue is whether to change the current representation of accents in Cologne displays for these dictionaries.

Andhrabharati commented 1 year ago

Just see what LIES has, with support from vast literary background (all the print texts use this Roman notation, incl. MW)

I just wanted to have this implemented in Cologne digitisations as well. And this would also automatically bring unison between MW and PWG renderings (when viewed in Devanagari) at CDSL that I have been talking about.

Having a different notation would only create unnecessary confusion among the readers/users.

Andhrabharati commented 1 year ago

Also the above LIES page (162) talks about the 3rd accent anudAtta clearly.

Thus one can clearly see that these three accents are rendered as

udAtta by acute accent, svarita by grave accent & anudAtta by underscore (if this accent is shown; else it is mostly ignored)

in all the Roman transliterated texts.

I would suggest that CDSL also adopts the same standard notations in its renderings.

Andhrabharati commented 1 year ago

Just for info: the Devanagari prints mostly ignore the udAtta sign (just as the anudAtta is ignored in Roman prints); it was Boehtlingk who introduced rendering it as a (combining diacritic) Devanagari 'u'.

Andhrabharati commented 1 year ago

I thought I should spend some time looking at the MW data, to understand why @funderburkjim is hesitating to make any changes in the accents portion.

And surprisingly, I have found that the data is badly jumbled up, with quite many mistakes in accent marking, and far more missing marks. It could be cleaned up only by going through the full text again thoroughly (at least for the HWs portion).

Finally on top of it, the notations for udātta as Roman acute mark and svarita as Roman grave mark in the MW print have become acute and circumflex (caret) respectively in the CDSL text; as compared to the standard practice in the literary texts that "in Latin script transcription, udātta is marked with an acute accent, independent svarita is marked with a grave accent, and other syllables are unaccented, and not marked."

At the end in the CDSL MW Devanagari rendering, the udAtta (Roman acute) is represented as ◌॑ (Yajurvedic style of udātta marking), and the svarita (Roman circumflex) as ᳠ (less frequently used Kashmiri style of Rigvedic independent svarita), mixing up things further; as compared to the standard practice in the literary texts of following the Rigvedic convention that "anudātta is written with a bar below the line (◌॒), svarita with a stroke above the line (◌॑) while udātta is unmarked".
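To keep the Devanagari signs in this comparison apart, a small Python sketch can list their Unicode code points and names. It relies only on the standard `unicodedata` module; the names `SIGNS` and `describe` are illustrative, and it assumes the Kashmiri svarita sign quoted above is U+1CE0.

```python
import unicodedata

# Labels describe how each sign is used in the discussion above (illustrative wording).
SIGNS = {
    "\u0951": "stroke above: svarita in the Rigvedic convention",
    "\u0952": "bar below: anudatta in the Rigvedic convention",
    "\u1ce0": "Kashmiri-style Rigvedic independent svarita (assumed code point)",
}

def describe(sign: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(sign):04X} {unicodedata.name(sign)}"

for sign, usage in SIGNS.items():
    print(f"{describe(sign)}  ({usage})")
```

Note that Unicode names U+0951 "DEVANAGARI STRESS SIGN UDATTA" even though the Rigvedic convention uses that stroke for svarita, which is one reason the renderings are easy to mix up.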

So, it definitely needs a strong determination to set things right, if that is contemplated at all.

P.S.

It took much effort in convincing Jim (exactly one year back!) and getting CDSL PWG accents rendering into the shape agreeing with the printed PWG dictionary as well as the vast supportive literary texts that it has cited.

Similarly, I can definitely suggest a simple & workable solution for CDSL MW, which if accepted clears things once and for all; else the entire user community (unless vigilant enough) will keep spreading the confusion more and more, as the CDSL MW is supposed to be much more widely used.

Would like to hear back from Jim on this. [Probably he could consult Peter Scharf and Thomas Malten for taking a final call on this.]

drdhaval2785 commented 1 year ago

I also have a strong disinclination towards making any changes to accent data. Principal reason being, the accents are rendered differently based on what branch of Veda it is. Therefore, there is going to be a mix of accent data, based on which Veda branch the quotation is taken from.

Andhrabharati commented 1 year ago

I strongly disagree about this; MW has just taken all the Vedic citations from PWG or pwk, and Boehtlingk has been very attentive and consistent about using the accents in all his works.

Even MW99 print is consistent in having them; only CDSL MW data has them wrongly typed/missed and then wrongly interpreted the notation marks. However CDSL PWG typed data is far better so far as these marks are concerned.

I did not expect this many typing errors in MW accents until today, when I decided to look closely at them.

Andhrabharati commented 1 year ago

Principal reason being, the accents are rendered differently based on what branch of Veda it is. Therefore, there is going to be a mix of accent data, based on which Veda branch the quotation is taken from.

@drdhaval2785 / @funderburkjim

I did some homework gathering information about this, and could conclude that this need not be a reason to worry that much!!

* Almost all the Vedic texts follow the RV style. (udātta/Roman acute/Dev. no mark; svarita/Roman grave/Dev. vertical bar above; anudātta/Roman no mark/Dev. horiz. bar below.) And, the majority of citations fall under this category.
* The SV (Sāmaveda) has a completely different style denoting the accents by numerals above the letters (udātta as 1; svarita as 2; anudātta as 3) [resp. in Roman or in Dev. scripts]; and it does not matter too much even if the RV style is applied to SV, as one can simply do a correlation in the mind, depending on the context.
* The Krishna (Black) YV is the main conflicting one, having the udātta as vertical bar and svarita as either three vertical bars or a dot below or a breve below. And fortunately, not many citations from this are given in the dictionaries that we are talking about here. [Also as in the case of SV, one can contextually correlate these KYV accents too, as both udātta & svarita are not interchanged but differently marked.]

With this study summarized as above, I leave this topic now for others to deliberate upon.

Andhrabharati commented 1 year ago

Sorry that I missed stating the last point from my side--

I just asked Jim to mark the svarita accent as Roman grave, instead of Roman circumflex. [The reason being, to the best of my knowledge, there is no printed text having it marked thus.]

And the BEST solution as far as Dev. accent markings [with whatever differences that exist in various Vedic branches] are concerned, is to avoid (showing) them altogether, as done by MW and Macdonell!!

gasyoun commented 1 year ago

It could be cleaned up only by going through the full text again thoroughly (at least for the HWs portion).

HW would be enough.

MW has just taken all the Vedic citations from PWG or pwk, and Boehtlingk has been very attentive and consistent about using the accents in all his works.

Agree

only CDSL MW data has them wrongly typed/missed and then wrongly interpreted the notation marks. However CDSL PWG typed data is far better so far as these marks are concerned.

Yes, you are starting to discuss with arguments. I like you @Andhrabharati

I did not expect this many typing errors in MW accents until today, when I decided to look closely at them.

Like hundreds of mistakes?

[The reason being, to the best of my knowledge, there is no printed text having it marked thus.]

@drdhaval2785 why are you against it? A mess is a mess, and if it's a digital mess that has nothing to do with the book, why not clean it up?

funderburkjim commented 1 year ago

LIES uses grave accent for svarita

Glad you found this. For me, this is very strong argument in favor of changing the IAST representation of svarita to grave accent.

Andhrabharati commented 1 year ago

I am sorry to say this, @funderburkjim --

LIES uses only ISO 15619 system when talking about Roman transliteration of Sanskrit, but not the IAST system.

You can clearly see this on pp.176-181, wrt the f and x in slp1 (where a circle instead of a dot is used under the roman letters).

Does this make you do a 'turn back'?

[IAST specified circumflex only for the svarita accent.]

Andhrabharati commented 1 year ago

The §1.4 Roman transliteration (pp. 16-18) of LIES may also be looked at.

Andhrabharati commented 1 year ago

I did not expect this many typing errors in MW accents until today, when I decided to look closely at them.

Like hundreds of mistakes?

I estimate the count to be more than 10 times that, @gasyoun !

funderburkjim commented 1 year ago

I think you mean ISO 15919 (Wikipedia: https://en.wikipedia.org/wiki/ISO_15919).

There is a section there addressing differences with IAST; see also the Wikipedia article on IAST.

AFAIK, neither standard discusses accent representation.

Thus, the fact that LIES prefers the 15919 standard does not have an impact on the current concern over representation of accents.

The current CDSL use of circumflex for svarita must be a choice made by me. My current understanding is that this choice was wrong, and needs to be changed (to grave accent). Similarly, the (rarer) anudAtta representation should use the 'macron below' diacritic.

funderburkjim commented 1 year ago

combining or preformed?

In the dictionaries, all text X shown as <s>X</s> is transcoded according to the user 'output' choice. In the digitizations, X is slp1, with accents represented by a trailing ASCII character:

* forward slash for udAtta: /
* circumflex for svarita: ^
* back slash for anudAtta: \

The transcoding to IAST or Devanagari or HK is based on xml transcoding files, with file names like 'slp1_deva.xml' and 'slp1_roman.xml' (for IAST). For basic display and friends, these files are in the csl-websanlexicon repository. For simple search (and related) displays, copies of the transcoder files are in the csl-apidev repository.

There is also a transcoding version 'slp1_deva1.xml' used for Boehtlingk dictionaries, according to getwordClass.php, and similarly for Basic, etc. in the csl-websanlexicon repository.

Thus, technically, changing the display of accents for IAST is accomplished by changes to file slp1_roman.xml.

Here are current transcodings for a-udAtta and f-udAtta (slp1 f = vocalic r).

<e> <s>SKT</s> <in>a/</in> <out>\u00e1</out> </e>
<e> <s>SKT</s> <in>f/</in> <out>\u1e5b\u0301</out> </e>

Note that for 'a', this converts 'a/' to preformed unicode 00e1 (LATIN SMALL LETTER A WITH ACUTE). But for vocalic 'r', it uses unicode 0301 (COMBINING ACUTE ACCENT) to apply acute accent to ṛ (Latin Small Letter R With Dot Below).
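The relation between the two forms can be checked with Python's standard `unicodedata` module. This is only a minimal sketch; the variable names are illustrative and are not taken from the transcoder files.

```python
import unicodedata

preformed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE
combining = "a\u0301"  # 'a' followed by COMBINING ACUTE ACCENT

# The two spellings are canonically equivalent: NFC composes, NFD decomposes.
a_composed = unicodedata.normalize("NFC", combining)
a_decomposed = unicodedata.normalize("NFD", preformed)

# r + COMBINING DOT BELOW + COMBINING ACUTE ACCENT: NFC can compose the
# dot-below form (U+1E5B), but there is no single code point that also
# carries the acute, so the accent stays a separate combining character.
r_acute = unicodedata.normalize("NFC", "r\u0323\u0301")

print([f"U+{ord(c):04X}" for c in a_composed])
print([f"U+{ord(c):04X}" for c in r_acute])
```

So the mixed output of the current transcoder file is what Unicode normalization would produce anyway; using combining characters throughout mainly makes the file entries uniform.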

For current purposes, I suggest we use the COMBINING characters for transcoding of all accents to IAST diacritics.

* udAtta 0301 COMBINING ACUTE ACCENT
* svarita 0300 COMBINING GRAVE ACCENT
* anudAtta 0331 COMBINING MACRON BELOW
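As a rough illustration of this suggestion, the sketch below maps the three trailing slp1 accent markers to the combining characters listed above. It is not the transcoder's actual logic (that lives in slp1_roman.xml); the names `ACCENT_TO_COMBINING` and `accents_to_combining` are hypothetical, and it assumes the letters have already been converted to IAST while the accent markers are still ASCII.

```python
import unicodedata

# Hypothetical mapping from slp1 accent markers to combining diacritics.
ACCENT_TO_COMBINING = {
    "/": "\u0301",   # udAtta   -> COMBINING ACUTE ACCENT
    "^": "\u0300",   # svarita  -> COMBINING GRAVE ACCENT
    "\\": "\u0331",  # anudAtta -> COMBINING MACRON BELOW
}
_TABLE = str.maketrans(ACCENT_TO_COMBINING)

def accents_to_combining(text: str) -> str:
    """Replace each trailing accent marker with its combining character."""
    return text.translate(_TABLE)

# Illustrative inputs only, not taken from the digitization.
for sample in ["a/", "a^", "a\\", "\u1e5b/", "ai^"]:
    converted = accents_to_combining(sample)
    print(sample, "->", unicodedata.normalize("NFC", converted))
```

Because the slp1 marker trails its vowel and a Unicode combining character follows its base letter, a one-to-one replacement is enough here; for the digraphs ai and au the mark then lands on the last vowel.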

funderburkjim commented 1 year ago

CDSL PWG typed data is far better so far as these marks are concerned.

Could we use this observation as a programmatic assistant for improving the accuracy of CDSL MW accent markup?

@Andhrabharati If you want to pursue this, let's discuss further in a separate issue.
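One possible shape for such a programmatic check, offered only as a sketch: strip the accent markers to get a common key, then flag MW headwords whose accented form differs from every PWG form with the same key. The function names and the sample headword lists are made up for illustration; real input would have to come from the digitizations.

```python
import re

# The three slp1 accent markers: / (udAtta), ^ (svarita), \ (anudAtta).
ACCENT_MARKERS = re.compile(r"[/^\\]")

def strip_accents(headword: str) -> str:
    """Remove slp1 accent markers so headwords can be matched across dictionaries."""
    return ACCENT_MARKERS.sub("", headword)

def accent_mismatches(mw_headwords, pwg_headwords):
    """Return (mw_form, pwg_forms) pairs where the unaccented keys match but accents differ."""
    pwg_by_key = {}
    for hw in pwg_headwords:
        pwg_by_key.setdefault(strip_accents(hw), set()).add(hw)
    mismatches = []
    for hw in mw_headwords:
        candidates = pwg_by_key.get(strip_accents(hw))
        if candidates and hw not in candidates:
            mismatches.append((hw, sorted(candidates)))
    return mismatches

# Made-up sample data: one misplaced mark, one missing mark, one agreement.
mw_sample = ["a/gni", "deva", "indra^"]
pwg_sample = ["agni/", "deva/", "indra^"]
for mw_form, pwg_forms in accent_mismatches(mw_sample, pwg_sample):
    print(mw_form, "vs", pwg_forms)
```

A list like this would only propose candidates for review; whether MW or PWG is right in each case would still need a check against the print.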

Andhrabharati commented 1 year ago

I think you mean ISO 15919

Yes, 15619 is a BAD typo from my side.

For current purposes, I suggest we use the COMBINING characters for transcoding of all accents to IAST diacritics.
* udAtta  0301  COMBINING ACUTE ACCENT
* svarita  0300  COMBINING GRAVE ACCENT
* anudAtta 0331 Combining Macron Below

This is exactly what I wanted it to be! Hope it materialises soon.

Andhrabharati commented 1 year ago

CDSL PWG typed data is far better so far as these marks are concerned.

Could we use this observation as a programmatic assistant for improving the accuracy of CDSL MW accent markup?

@Andhrabharati If you want to pursue this, let's discuss further in a separate issue.

@funderburkjim I have no great expectations from the programmatic approach for this, but still it could be one step in resolving the (wrong) accents. [I sure would like to see how you'd be processing it.]

Andhrabharati commented 1 year ago

(Andhrabharati)

[IAST specified circumflex only for the svarita accent.]

(Jim)

AFAIK, neither standard discusses accent representation.

Here are the relevant portions from the IAST recommendations, in the Report of the Transliteration Committee (1894)--

gasyoun commented 1 year ago

The current CDSL use of circumflex for svarita must be a choice made by me. My current understanding is that this choice was wrong, and needs to be changed (to grave accent). Similarly, the (rarer) anudAtta representation should use the 'macron below' diacritic.

Hurray.

funderburkjim commented 1 year ago

Interesting that the 1895 document shows svarita-circumflex, anudAtta grave. I found a transcoder file slp1_romanpms.xml that comments attribute to pms (Peter Scharf). It is quite possible that Peter was aware of the 1895 accent comment, which pertains to IAST, and made his choice for 'roman' accordingly. And further, this may be the source of my choice.

Wonder if there is a comment about accents for ISO 15919 ? If so, does the comment agree with the LIES choice?

The svarita-grave, anudAtta macron-below choice might then be described as IAST with a 15919 accent extension.

Andhrabharati commented 1 year ago

The svarita-grave, anudAtta macron-below choice might then be described as IAST with a 15919 accent extension.

This is the best possible 'intermediate' solution for the issue, using the IAST for normal letters and the ISO for the accents. I would be the happiest person to see this happen.

Wonder if there is a comment about accents for ISO 15919 ? If so, does the comment agree with the LIES choice?

There are references available scattered at many places on the web. And yes, the LIES clearly mentions its choice to be from the ISO standard.

The official ISO standard is not available publicly; one has to buy it. If Jim is particular to have a look at it, I can buy one (the Indian standard of it that costs just about 8 USD, against the ISO standard that costs 175 USD) and post the relevant portions (as done for the IAST above). [Both Indian (IS/ISO 15919:2001) and ISO (ISO 15919:2001) versions are the same texts letter-to-letter, except the title number!!]

funderburkjim commented 1 year ago

The changes are now made. For details, refer to the repository commit links above. The displays should now show revised accents. For display examples, see

akzita in mw (udAtta)
akzitavya in mw (svarita)
tva in PW (anudAtta)

Also, in a #137 comment, there is a revised IAST version of mw.

funderburkjim commented 1 year ago

@Andhrabharati I guess it would be good to see the ISO comment on accent, if you can get it without too much trouble and cost.

In the meantime, closing this issue.

Andhrabharati commented 1 year ago

I had seen the change first at #137 and posted my response--

https://github.com/sanskrit-lexicon/MWS/issues/137#issuecomment-1251359197

Andhrabharati commented 1 year ago

@Andhrabharati I guess it would be good to see the ISO comment on accent, if you can get it without too much trouble and cost.

In the meantime, closing this issue.

Good that you wanted to see the ISO text also once!

Just purchased the document and here are the snippets from the same--

------------------------
The Rule 14 is a print error here; should've been Rule 13.
------------------------

While the Rule 13 of Clause 8 recommends the grave mark to be placed at the last Roman vowel in the digraphs (ai & au), the clause 8.2 recommends the underscore mark to be placed under both the Roman vowels in the same digraphs. Surprising specification, indeed. Need to check if there is any text that has done so in practice. [The LIES has the underscore only below the last vowel, just like the grave placement.]

------------------------
The "Rules 14 and 15" is a print error here; should've been "Rules 13 and 14".

------------------------
And interestingly, MW has been a reference work in this (at the Bibliography)! [Wonder why they referred to its Indian print, while the London print also is widely available.]

------------------------

Andhrabharati commented 1 year ago

And as the olden days' children stories end, "we live happily ever after."

Andhrabharati commented 1 year ago

Could you find time to see these snippets from the ISO document (and my comments under them), @funderburkjim ?

Andhrabharati commented 3 months ago

@drdhaval2785

Any specific reason for your reopening this issue? [I am pretty sure that @funderburkjim might not've seen my ending posts; but as they are just informative, nothing to worry in having them in a 'closed post'!! I am not particular to see Jim responding to these here.]

funderburkjim commented 3 months ago

Jim is happy to be excluded from the accent party!

sanskrit-lexicon / MWS

Accent representation in MW, PW, PWG, PWKVN #140
