The accent saga continues

funderburkjim commented 1 week ago

Like the fabled phoenix rising from the ashes, the representation of accents in Sanskrit seems never complete in this cdsl dictionary project.

A previous extended discussion occurred in 2021, here.

The latest resurrection appears here.

My attempts to put out the latest fires appear below.

funderburkjim commented 1 week ago

Devanagari

In the 2021 work (specifically here), a transcoding file, slp1_deva1.xml, was developed for Devanagari displays to mimic Böhtlingk's system of accent representation.

getwordClass.php applies this transcoding from SLP1 to Devanagari (at about line 45 ff.):

    if (in_array($dict, array('PWG','PW','PWKVN')) && ($filter == 'deva') && ($getParms->accent == 'yes')) {
      // Causes the udatta accent to be displayed as a superscript Devanagari 'u',
      // as occurs in the print of these dictionaries. So slp1_deva1.xml is
      // used as the transcoder file.
      $filter = 'deva1';
    }

The simplest way to have the MW displays mimic the PW dictionaries is to add 'MW' to this list of dictionaries.

So I am now modifying this program accordingly.

A similar change was made in the csl-apidev repository.

New example of svarita accent display in Devanagari in MW:

New example of udAtta accent display in Devanagari in MW:

funderburkjim commented 1 week ago

@Andhrabharati Happy so far? @gasyoun what about you? @drdhaval2785 Any comment from you, or are you still in hibernation?

Andhrabharati commented 1 week ago

The simplest way to have the MW displays mimic the PW dictionaries is to add 'MW' to this list of dictionaries.

Your adoption of the simplest solution made MW also render the udAtta accent with a superscript 'u', as in the PW family; this is probably a bit too much!!

I would suggest leaving udAtta 'unrendered' in Devanagari in MW, which is the "standard" practice.

But I see something different at my end now:

Very strange, this is!!

funderburkjim commented 1 week ago

Very Strange ..

I see something different:

Andhrabharati commented 1 week ago

You looked with a default setting of "Ignore Accents".

Just try to see what the "Show Accents" option displays!!

Andhrabharati commented 1 week ago

I see that the "simple search", which is what you and Marcis use, shows what you posted above; but the List and Advanced Search displays behave in the strange manner that I just mentioned!!

drdhaval2785 commented 1 week ago

I am in hibernation still, but it is coming to an end. Intend to forage again.

funderburkjim commented 1 week ago

Bugs fixed in the Advanced Search and List displays.

Accent display customized for MW: udAtta is not displayed, and svarita is displayed with U+0951.
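A minimal Python sketch of the MW rule just described (udAtta unrendered, svarita shown with U+0951). It is only an illustration: the real change lives in the CDSL transcoder files. It assumes a hypothetical intermediate form in which the Devanagari text still carries the SLP1 accent markers ('/' udAtta, '^' svarita).

```python
# Hypothetical illustration of the MW Devanagari accent rule; not CDSL code.
# Assumes Devanagari text that still carries SLP1 accent markers.

MW_DEVA_ACCENTS = {
    "/": "",        # udAtta: left unrendered (the "standard" practice for MW)
    "^": "\u0951",  # svarita: vertical stroke above (U+0951)
}


def render_mw_accents(text: str) -> str:
    """Replace SLP1 accent markers with MW-style Devanagari accent marks.

    Any other character (including an anudAtta '\\') passes through
    unchanged, because the rule above does not say how MW renders it.
    """
    return "".join(MW_DEVA_ACCENTS.get(ch, ch) for ch in text)
```

For example, `render_mw_accents("क्व^")` gives क्व followed by U+0951, and `render_mw_accents("अ/ग्नि")` gives plain अग्नि. A lookup table like this is also why dropping udAtta makes MW/PW comparisons harder: once the marker maps to the empty string, it cannot be recovered from the output.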

@Andhrabharati Unless I've made an error, this seems to match your suggestion regarding 'standard practice'. Note that the 'unrendering' of udAtta hinders Devanagari accent comparisons between MW and PW.

Once we agree on the Devanagari accents, I'll take up the IAST suggestions above.

Andhrabharati commented 1 week ago

Yes @funderburkjim, this piece of work now appears good enough (to move forward with other points).

BTW, I never noticed these earlier, but saw them just now:

What do these "3 buttons" indicate (stand for)? I see no action when they are pressed or hovered over.

Andhrabharati commented 1 week ago

representation of accents in Sanskrit seems never complete in this cdsl dictionary project.

Now, I would venture to say that this has reached its "proper" goal-post (as far as the major dictionaries that use accents are concerned; there may be some minor works that still have to be looked at, but that is not a TOO SERIOUS issue to worry about).

funderburkjim commented 1 week ago

What do these "3 buttons" indicate (stand for)?

They are relevant when highlighting is on in the Advanced Search display. They navigate among the highlighted text instances ('>' goes to the next, '<' to the previous; I think '=' goes to the nearest).
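To make the next/previous/nearest semantics concrete, here is a hypothetical Python sketch. It is not the Advanced Search JavaScript. It assumes the highlights are kept as a sorted list of character offsets. The wrap-around at either end and the tie-breaking rule are assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence

# Hypothetical sketch: `hits` holds the sorted offsets of highlighted matches,
# `pos` is the current cursor offset. Each function returns an index into
# `hits`, or None when nothing is highlighted.


def next_hit(hits: Sequence[int], pos: int) -> Optional[int]:
    """'>': first highlight strictly after pos, wrapping to the first one."""
    if not hits:
        return None
    return bisect_right(hits, pos) % len(hits)


def prev_hit(hits: Sequence[int], pos: int) -> Optional[int]:
    """'<': last highlight strictly before pos, wrapping to the last one."""
    if not hits:
        return None
    return (bisect_left(hits, pos) - 1) % len(hits)


def nearest_hit(hits: Sequence[int], pos: int) -> Optional[int]:
    """'=': highlight closest to pos (ties go to the earlier one)."""
    if not hits:
        return None
    i = bisect_left(hits, pos)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(hits)]
    return min(candidates, key=lambda j: abs(hits[j] - pos))
```

With highlights at offsets [10, 50, 90] and the cursor at 60, '>' selects index 2, '<' selects index 1 and '=' selects index 1.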

funderburkjim commented 1 week ago

IAST ḻ

Changed slp1_roman.xml and roman_slp1.xml for displays. They now use ḻ (U+1E3B LATIN SMALL LETTER L WITH LINE BELOW) in place of the former ł (U+0142 LATIN SMALL LETTER L WITH STROKE).
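A small Python sketch of this one-letter change in both directions. It assumes, as in standard SLP1, that 'L' encodes the Vedic retroflex lateral ळ. Only that letter is mapped, and the helper names are hypothetical. One detail matters for roman input: ḻ has a canonical decomposition (l + U+0331 COMBINING MACRON BELOW), so input typed in decomposed form must be NFC-normalized before the lookup.

```python
import unicodedata

# Hypothetical sketch of the ḻ change; a real transcoder maps the whole alphabet.
# Assumes SLP1 'L' is the Vedic retroflex lateral (Devanagari ळ).
SLP1_TO_ROMAN = str.maketrans({"L": "\u1e3b"})  # output: ḻ (formerly ł, U+0142)
ROMAN_TO_SLP1 = str.maketrans({"\u1e3b": "L"})  # input: ḻ maps back to 'L'


def roman_to_slp1(text: str) -> str:
    """Map ḻ back to SLP1 'L', accepting both precomposed and l + U+0331."""
    return unicodedata.normalize("NFC", text).translate(ROMAN_TO_SLP1)


def describe(ch: str) -> str:
    """Show the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"
```

For example, `"aLa".translate(SLP1_TO_ROMAN)` gives "aḻa", and `roman_to_slp1` maps it back to "aLa" whether the ḻ arrives precomposed or decomposed.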

The change applies to both Roman input and Roman output.

funderburkjim commented 1 week ago

Concern about combining IAST accents

I don't foresee it being too difficult to change 'slp1_roman.xml' to display IAST output accents for Sanskrit text using the Unicode combining accents.

However, if that is done, will it then be necessary to review ALL the dictionaries (mw.txt, pw.txt, etc.) and convert all Sanskrit text with preformed accents into text with combining accents? This seems like a big task, and a complicated one: accents occur in many non-Sanskrit text fragments (think of the 'etymology' sections of MW, or the French dictionaries), and for many of these non-Sanskrit texts the conversion to combining accents is probably inappropriate.

Perhaps such modifications to the xxx.txt files are NOT required. If so, let's go ahead with the changes to the transcoding files.
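One reason the xxx.txt files might not need rewriting: in Unicode, a preformed accented letter and its base-plus-combining-accent spelling are canonically equivalent. Comparisons can therefore normalize both sides instead. Below is a minimal Python sketch of that idea using the standard `unicodedata` module. It is one possible approach, not what CDSL currently does.

```python
import unicodedata

# Sketch, not CDSL code: normalize at comparison time instead of
# rewriting the dictionary .txt files.


def to_combining(text: str) -> str:
    """NFD: 'í' becomes 'i' + U+0301. Not selective: 'ā' also becomes
    'a' + U+0304, and French 'é' becomes 'e' + U+0301."""
    return unicodedata.normalize("NFD", text)


def to_preformed(text: str) -> str:
    """NFC: compose wherever Unicode has a precomposed letter. Some Sanskrit
    forms have none (e.g. ā with acute), so a combining mark remains."""
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    """True if a and b differ only in preformed vs combining encoding."""
    return to_preformed(a) == to_preformed(b)
```

Two consequences follow from the comments. Blanket NFD over a whole file would also touch the non-Sanskrit fragments. And accented long vowels such as ā with acute already need a combining mark, whichever form the rest of the text uses.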

Andhrabharati commented 6 days ago

My initial post, which is the starting point for this issue, said (quite clearly) what I wanted to say:

Now, about Roman accents

It appears that two versions of the transcoders are "floating around" in the CDSL "domain": one with IAST accents (a combining circumflex for svarita), and another with ISO 15919 accents (a combining grave for svarita). This would definitely be a source of great confusion among users.

Jim MUST take corrective action on this, and adopt what he has written at https://github.com/sanskrit-lexicon/MWS/issues/140#issuecomment-1250381878.

But the subsequent posts may have created the impression that I was talking about 'preformed accented letters' vs. 'letters with combining accents'.

Let me try to be more clear and resolve the matter now.

The slp1_roman.xml (transcoder) files in the COLOGNE and csl-websanlexicon repos, which are the gateways to the end user, have the "proper" Roman accents, as follows:

    <!--
     * udAtta   /  0301 COMBINING ACUTE ACCENT
     * svarita  ^  0300 COMBINING GRAVE ACCENT
     * anudAtta \  0331 Combining Macron Below
    -->
    <e n='116'> <s>SKT</s> <in>/</in> <out>\u0301</out> </e>
    <e n='117'> <s>SKT</s> <in>^</in> <out>\u0300</out> </e>
    <e n='118'> <s>SKT</s> <in>\</in> <out>\u0331</out> </e>

These are what Jim had named the 'ISO accent extension' of IAST some time back:

Then the svarita-grave / anudAtta-macron-below scheme might be described as IAST with the ISO 15919 accent extension.

But the ones that he offered me for my local work (on the MW and PW set), namely the one (of the current topic) at "mw transcoding versions" (MWS issue 90) and the one at "Fresh Look, starting with <is> tag" (PWK issue 95), have the 'original IAST accent rules':

* udAtta / as \u0301
* svarita ^ as \u0302
* anudAtta \ as \u0300
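The two accent tables quoted above can be set side by side in a short Python sketch. It assumes a hypothetical intermediate form in which the letters are already romanized and each SLP1 accent marker directly follows the vowel it accents. The sketch shows that only the svarita and anudAtta rows differ. Note also that U+0300 (grave) means svarita in one scheme and anudAtta in the other, which is where the confusion comes from.

```python
import unicodedata

# The two accent tables quoted in this thread, keyed by SLP1 accent marker.
# ISO 15919 extension (slp1_roman.xml in COLOGNE / csl-websanlexicon):
ISO15919_EXT = {"/": "\u0301", "^": "\u0300", "\\": "\u0331"}
# 'Original IAST accent rules' (the local MW/PW transcoders):
ORIGINAL_IAST = {"/": "\u0301", "^": "\u0302", "\\": "\u0300"}


def apply_accents(romanized: str, table: dict) -> str:
    """Replace accent markers with combining marks, then compose (NFC).

    Hypothetical helper: assumes the input is already romanized and that
    each marker immediately follows the vowel it accents.
    """
    marked = "".join(table.get(ch, ch) for ch in romanized)
    return unicodedata.normalize("NFC", marked)
```

For example, "kva^" becomes "kvà" under the ISO 15919 extension but "kvâ" under the original IAST rules, while "agni/" becomes "agní" under both.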

(These are the rules I had posted above.) This makes my locally generated file differ from what I see in the CDSL web display, as I mentioned in my earlier post.

This also revealed [at 731560] an error in the MW transcoder file that Jim gave me earlier (for my use): it renders "◌̂" (U+0302, combining circumflex) instead of "◌̀" (U+0300, combining grave) for the grave accent.

Probably I am THE only person who "looks" at the local files, in which case there is no need to worry about this point.

I shall change the transcoder files at my end & put an end to this issue.

funderburkjim commented 6 days ago

I shall change the transcoder files at my end & put an end to this issue.

That being the case, I'll consider this issue closed.

Where will the accent phoenix arise next? :)

Andhrabharati commented 6 days ago

Where will the accent phoenix arise next? :)

If you are eager, I (or someone else) would probably have to look once at the other minor works (of CDSL) as well, to put this "phoenix" to sleep forever.

sanskrit-lexicon / MWS