sanskrit-lexicon / CORRECTIONS

Correction history for Cologne Sanskrit Lexicon

MW display of <shortlong/> markup #353

Open funderburkjim opened 7 years ago

funderburkjim commented 7 years ago

This issue was raised recently by Geymonat, a frequent contributor of corrections to MW.

Case 23694: 05/04/2017 dict=MW, L= 110540, hw=nizam, user=geymonat
old = caus. -zAmayati LB ind.p. -zAmya 
new = caus. -zAmayati/zamayati [or: -zA/amayati, but I think it is less clear] LB ind.p. -zAmya/-zamya [or: -zA/amya, but I think it is less clear]
comment = The problem is that in the printed edition, when the causative and the absolutive of the causative are quoted, we find the double sign of short and long over the a, something that cannot be reproduced in HK unless the entries are repeated as indicated in the correction. Because both forms are regular, I suppose it is important to make the suggested correction; otherwise, when the forms with short a are found, one cannot connect them to the causative meaning "to observe, perceive, hear, learn".

Now there actually is markup (thanks to Peter's foresight) in MW that represents these short-long vowels:

<s>-SA<shortlong/>mayati</s>    [SLP1]

But the display of MW (disp.php) currently ignores this markup, so in this case the word is rendered as <s>-SAmayati</s>.

As Geymonat points out, it is useful information that the short-vowel form <s>-Samayati</s> is also acceptable in this causal inflection.

Thus, it would be a real improvement to revise disp.php for MW so that it renders both the short and the long forms.

Perhaps the easiest approach would be, in effect, to render:

<s>-Sa(A)mayati</s> 

This replacement of the preceding vowel by S(L) (the short vowel followed by the long vowel in parentheses) would work in all cases and would make the short-long forms visible in the displays. Putting the long vowel form in parentheses seems clearer than separating the two vowel forms with a forward slash ('S/L').
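A minimal Python sketch of this S(L) rewrite, for illustration only (disp.php itself is PHP, and `render_shortlong` and the constant names are hypothetical). It assumes the tag sits directly after the vowel it marks, as in `-SA<shortlong/>mayati` above, and accepts either the short or the long spelling of that vowel; including the vocalic f/F and x/X pairs is also an assumption.

```python
import re

# SLP1 short vowels and their long partners (assumed: f/F = ṛ/ṝ, x/X = ḷ/ḹ).
SHORT_TO_LONG = {"a": "A", "i": "I", "u": "U", "f": "F", "x": "X"}
LONG_TO_SHORT = {long: short for short, long in SHORT_TO_LONG.items()}

# A vowel (short or long spelling) immediately followed by the tag.
SHORTLONG_RE = re.compile(r"([aAiIuUfFxX])<shortlong/>")


def _short_paren_long(match):
    vowel = match.group(1)
    short = LONG_TO_SHORT.get(vowel, vowel)
    return f"{short}({SHORT_TO_LONG[short]})"


def render_shortlong(slp1):
    """Rewrite 'V<shortlong/>' as 'S(L)': short vowel, long vowel in parentheses."""
    return SHORTLONG_RE.sub(_short_paren_long, slp1)


print(render_shortlong("-SA<shortlong/>mayati"))  # -Sa(A)mayati
```

Because only the vowel is touched, the same rewrite applies unchanged to any word fragment, which is the advantage over generating two whole forms.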

By contrast, I suspect that generating two forms of the whole word-fragment might sometimes give odd results, because of all the varieties of Sanskrit word fragments in MW.

gasyoun commented 7 years ago

display of MW (disp.php) currently ignores this markup

If we add our web font, we can show it as in the book without much coding. My Charter font contains the needed signs, and I can add more if needed.

[Image: Charter font sample of a, i, u with macron and breve]

This replacement of the preceding vowel by S(L) (short vowel(longvowel)) would work in all cases, and would make the short-long forms visible in the displays. Putting the long vowel form in parentheses seems a clearer representation than separating the two vowel forms with a forward slash 'S/L'.

S(L) is better than 'S/L', I agree.

I suspect that generating two forms of the whole word-fragment might sometimes give odd results

Indeed, if done that way, the results would need manual verification. Perhaps Geymonat would be willing to verify them?

drdhaval2785 commented 7 years ago

One possible solution for shortlong that is equally legible in Devanagari:

([consonants])a -> $1A
([consonants])i -> $1I
([consonants]*)u -> $1U

or something like this.

In SLP1, the consonants are kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh|.
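A rough Python sketch of the consonant-repeating idea, assuming the tag follows the vowel as in the MW markup above. Unlike the `$1A` patterns, which emit only the long syllable, this version emits both, in the आवलि(ली) style discussed in this thread. The input `AvalI<shortlong/>` is a hypothetical SLP1 spelling of that example, `render_shortlong_deva` is a made-up name, and the final SLP1-to-Devanagari step is not shown.

```python
import re

# SLP1 consonants, as listed in the comment above.
CONSONANT_CLASS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh|"
SHORT_TO_LONG = {"a": "A", "i": "I", "u": "U", "f": "F", "x": "X"}
LONG_TO_SHORT = {long: short for short, long in SHORT_TO_LONG.items()}

# Group 1: preceding consonant cluster (possibly empty); group 2: tagged vowel.
CLUSTER_RE = re.compile(
    "([" + CONSONANT_CLASS + "]*)([aAiIuUfFxX])<shortlong/>"
)


def _repeat_cluster(match):
    cluster, vowel = match.group(1), match.group(2)
    short = LONG_TO_SHORT.get(vowel, vowel)
    # Short syllable, then the same cluster with the long vowel in parentheses.
    return f"{cluster}{short}({cluster}{SHORT_TO_LONG[short]})"


def render_shortlong_deva(slp1):
    """SLP1 with <shortlong/> -> SLP1 ready for Devanagari, e.g. 'li(lI)'."""
    return CLUSTER_RE.sub(_repeat_cluster, slp1)


print(render_shortlong_deva("AvalI<shortlong/>"))      # Avali(lI)
print(render_shortlong_deva("-SA<shortlong/>mayati"))  # -Sa(SA)mayati
```

The greedy `*` repeats the whole preceding consonant cluster, so a conjunct such as `kz` comes out as `kza(kzA)`.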

drdhaval2785 commented 7 years ago

In a nutshell,

आवलि(ली) is better than आवलि(ई).

funderburkjim commented 7 years ago

आवलि(ली) is better than आवलि(ई)

We have to take into account the multiple output forms for text coded as <s>X</s> in xxx.xml. We want an appropriate display for each output: Devanagari, itrans, hk, slp1, and roman (=IAST).

If we didn't treat IAST output separately, then the IAST output form would be

<s>Sa<shortlong/>mayati</s> -> śa(ā)mayati, which is quite readable, though admittedly less elegant than the form @gasyoun shows above.

I think only Devanagari output would require taking the preceding consonant (or consonant cluster) into account.
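For the non-Devanagari outputs only the vowel matters. A minimal Python sketch of the IAST case, assuming SLP1 input: rewrite the tagged vowel as S(L), then transliterate letter by letter (SLP1 uses one letter per sound). The function name is hypothetical, and the table omits a few rarer SLP1 symbols, which simply pass through unchanged.

```python
import re

# One SLP1 letter -> its IAST spelling (common letters only).
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au", "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}
SHORT_TO_LONG = {"a": "A", "i": "I", "u": "U", "f": "F", "x": "X"}
LONG_TO_SHORT = {long: short for short, long in SHORT_TO_LONG.items()}
SHORTLONG_RE = re.compile(r"([aAiIuUfFxX])<shortlong/>")


def slp1_to_iast(text):
    # Step 1: 'V<shortlong/>' -> 'S(L)' in SLP1, e.g. 'A<shortlong/>' -> 'a(A)'.
    def expand(match):
        short = LONG_TO_SHORT.get(match.group(1), match.group(1))
        return f"{short}({SHORT_TO_LONG[short]})"

    text = SHORTLONG_RE.sub(expand, text)
    # Step 2: map letter by letter; hyphens and parentheses pass through.
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


print(slp1_to_iast("Sa<shortlong/>mayati"))  # śa(ā)mayati
```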

funderburkjim commented 7 years ago

Question on Charter example.

The way I read the lovely a+macron+breve example shown in the Charter Font comment above is that there is a specific preformed character in this Charter font. Is this so?

And if so, how does one get that preformed character into a text string?

The way I know of to generate a+macron+breve in Unicode is to use two Unicode characters:
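As an illustrative assumption (not necessarily the pair of characters meant here), a Python check of one standard encoding: the base letter with U+0304 COMBINING MACRON and U+0306 COMBINING BREVE. NFC normalization folds the letter and the macron into precomposed ā, which leaves exactly two code points.

```python
import unicodedata

# Base letter + U+0304 COMBINING MACRON + U+0306 COMBINING BREVE.
decomposed = "a\u0304\u0306"

# NFC merges a + macron into precomposed ā (U+0101); Unicode has no
# precomposed letter for ā + breve, so the breve stays a combining mark.
composed = unicodedata.normalize("NFC", decomposed)

for ch in composed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0101 LATIN SMALL LETTER A WITH MACRON
# U+0306 COMBINING BREVE
```

The order of the two combining marks decides which one a renderer stacks closer to the letter.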

gasyoun commented 7 years ago

a specific preformed character in this Charter font. Is this so?

Yes, in the Unicode Private Use Area. There is no official Unicode way to do what MW did.
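To illustrate the difference, a small Python check using the general category `Co`, which Unicode assigns to private-use code points. U+E000 is only a placeholder (the slot the Charter glyph actually occupies is not stated), and `is_private_use` is a hypothetical helper.

```python
import unicodedata


def is_private_use(ch):
    # Unicode gives private-use code points (e.g. U+E000..U+F8FF) the category "Co".
    return unicodedata.category(ch) == "Co"


pua_glyph = "\ue000"        # placeholder slot a custom font might use
standard = "\u0101\u0306"   # ā followed by U+0306 COMBINING BREVE

print(is_private_use(pua_glyph))               # True
print([is_private_use(c) for c in standard])   # [False, False]
```

A private-use code point carries no standard meaning, so it typically displays as intended only with the matching font, whereas the two-character sequence is standard Unicode.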

आवलि(ली) is better than आवलि(ई).

Sure, but MW's base is IAST, so I would like it to stay that way. Still, I understand the idea.