Rare Accent Coding Issues (ढ्य१ः॒᳠)

gasyoun commented 9 years ago

In the PWG display at http://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/web/webtc/indexcaller.php, the accent on aMhati does not match the original: according to the scan at http://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/web/webtc/servepdf.php?page=1-0005, the accent should be above १. I wonder whether this is possible at all; any clue?

Andhrabharati commented 3 years ago

@funderburkjim,

Looking closely at the examples you gave, it appears that what you took to be the Windows default font is actually Google's Noto Serif Devanagari.

The glyphs do not match Nirmala or any other Windows font!!

Sorry to come back to a closed item.

I could not sleep with this point lingering in my mind.

The screenshot posted by @drdhaval2785 also shows only Noto glyphs, Noto being the default font on Android phones.

So it was the plain "default" label used by Jim that caused the confusion.

None of us was intentionally wrong, but I noticed the difference between Nirmala and Noto very clearly and promptly reported it.

funderburkjim commented 3 years ago

Microsoft reference page for the sanskr font.

There might be license issues if we chose to use this font on the web site. But that is probably irrelevant, since we are most interested in the siddhanta and adhishila fonts.

Andhrabharati commented 3 years ago

As you're not distributing the font, but just using it on a Windows machine to display something, it is all right.

And if someone copies the text, he won't get the font with it; his local font will simply take its place.

So I don't foresee any issue using the font on the Cologne server, if running on Windows.

Anyway, it just happened that I came to know about its existence and I liked it and shared the info with the team here. Nothing more.

gasyoun commented 3 years ago

Anyway, it just happened that I came to know about its existence and I liked it and shared the info with the team here. Nothing more.

Yeah, let's forget about it for now.

Andhrabharati commented 3 years ago

I did (by saying 'Nothing more'), @gasyoun!

And Jim is waiting for your opinion, as asked here: https://github.com/sanskrit-lexicon/PWG/issues/5#issuecomment-897779312

Only then will he take a further step, I guess.

funderburkjim commented 3 years ago

example3b

example3b.html provides a ready reference for:

- the specific ligatures mentioned by Dhaval
- the problematic 'ya' ligatures mentioned by Andhrabharati
- a representative collection of accented words.

Three fonts (siddhanta, adhishila, and sanskr) are compared.

Note also that in the accent section, the accents display properly in siddhanta (and sanskr) when they occur before an anusvara, visarga, or candra-bindu. This requires a transcoder trick that reorders the code points; to emphasize this, I've included the code-point sequence wherever the transcoder does the reordering.
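The reordering trick can be pictured as a single substitution that swaps an accent mark with an immediately following candra-bindu, anusvara, or visarga. This is a minimal Python sketch, not the actual slp1_deva1.xml rules: it assumes the accent marks are U+A8EB (꣫), U+0951 (॑) and U+0952 (॒), and the helper name `reorder_for_siddhanta` is hypothetical.

```python
import re

# Assumed accent marks: U+A8EB COMBINING DEVANAGARI LETTER U,
# U+0951 DEVANAGARI STRESS SIGN UDATTA, U+0952 DEVANAGARI STRESS SIGN ANUDATTA.
ACCENTS = "\ua8eb\u0951\u0952"
# Candra-bindu (U+0901), anusvara (U+0902), visarga (U+0903).
NASAL_VISARGA = "\u0901\u0902\u0903"

# An accent mark immediately followed by one of the signs above.
_ACCENT_BEFORE_SIGN = re.compile(f"([{ACCENTS}])([{NASAL_VISARGA}])")


def reorder_for_siddhanta(text: str) -> str:
    """Hypothetical helper: turn 'accent + sign' into 'sign + accent'."""
    return _ACCENT_BEFORE_SIGN.sub(r"\2\1", text)


if __name__ == "__main__":
    natural = "\u0905\ua8eb\u0902\u0939\u0924\u093f"    # accent before anusvara
    print(reorder_for_siddhanta(natural) == "\u0905\u0902\ua8eb\u0939\u0924\u093f")
```

Per the discussion, adhishila renders the unswapped order directly, so a swap like this would only be applied for fonts such as siddhanta.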

Also note, in the last accent example, how adhishila displays svarita-anusvara: the anusvara is very hard to detect.

I'm still undecided about whether to use the adhishila font for pwg/pw accents. Currently, the 'simple search' display is still using siddhanta with reordered code-point sequences.

On the one hand, the transcoding trick provides reasonable accent displays in siddhanta, and siddhanta displays svarita-anusvara better than adhishila.

On the other hand, the transcoding trick is conceptually ugly, and it is nice that adhishila does not require it.

Another consideration: if you copy/paste accented Devanagari text from the Cologne display and view it in a font other than adhishila, then, with current fonts, the reordered Unicode required by siddhanta is better. For instance, here I've copy/pasted the siddhanta and adhishila versions from the example3b display: अं꣫हति | अ꣫ंहति. In my browser they are rendered in Nirmala, and the second version has that ugly empty circle.
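Listing the code points of the two pasted strings makes the difference visible. A minimal Python sketch using the standard `unicodedata` module; the helper name `code_points` is hypothetical, and the escapes below are assumed to match the two strings pasted above.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Assumed code points of the two pasted strings.
siddhanta_order = "\u0905\u0902\ua8eb\u0939\u0924\u093f"  # anusvara, then accent
adhishila_order = "\u0905\ua8eb\u0902\u0939\u0924\u093f"  # accent, then anusvara

for label, s in (("siddhanta", siddhanta_order), ("adhishila", adhishila_order)):
    print(label)
    for line in code_points(s):
        print("  " + line)
```

Both strings contain the same six code points; only the order of the anusvara (U+0902) and the accent mark differs, which is presumably why a font like Nirmala handles one order and shows the empty circle for the other.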

Andhrabharati commented 3 years ago

I would have suggested applying the 'transcoding trick' to Adhishila as well, and then making an unbiased comparison; that is how I made my files above. (And one can see that the aMhati text in Adhishila with reordered transcoding is just as good as in Siddhanta and sanskr, whether on the web or in copy/pasted text.)

Anyway, I would just like to mention that I am not biased towards Adhishila; I simply want to see the letters rendered without confusing the (end) user.

I would also like to say that I initially raised the font-change proposal because of some -ya conjuncts in Siddhanta, which, to the best of my knowledge, cannot be fixed by any such 'tricks'. And this is a global issue for Siddhanta across all Cologne dictionaries, not limited to PWG/PWK accents.

Finally, as I mentioned earlier, the sanskr font appears to be the all-round winner now, at least until a correction is made inside the Siddhanta font.

BTW, I see that someone already posted the issue on its creator's blog a few months back, and there has been no apparent reaction yet. I doubt whether he is still interested in working on the fonts (I guess he did the work about 10 years ago).

Andhrabharati commented 3 years ago

And I see that there is no technical limitation to making the corrections in Siddhanta, as @gasyoun had mentioned above. (I've tried doing it myself, for my local use.)

Andhrabharati commented 3 years ago

Very good news!!

I found a simple and "best" solution that should satisfy everyone; I just wonder why I didn't think in this direction before.

Andhrabharati commented 3 years ago

Would you like to know what it is?

drdhaval2785 commented 3 years ago

Sure. Please share

Andhrabharati commented 3 years ago

Just have a look at this:

-ya conjuncts and accents.htm.pdf

The solution is just to use Siddhanta1 instead of Siddhanta!! [And, of course, to resort to the ugly 'transcoding trick' of Jim!!!]

One MUST bow to Mihail Bayaryn for his foresight in anticipating such eventualities, having created, once and for all, different variants of his font (with so many alternate glyphs inside). It is up to the end user to be careful to select a "proper" font variant for his needs.

And I should give due credit to @funderburkjim for his willingness to resolve the issues (in his unhurried manner) [and to myself for the perseverance to see that the task is not left undone].

drdhaval2785 commented 3 years ago

With this Siddhanta1 font, the major downside of the Siddhanta font, i.e. the ya conjuncts, is taken care of. So we should go with Siddhanta1.

Andhrabharati commented 3 years ago

And Jim should make it THE font on all pages (wherever Devanagari occurs) across the Cologne server, rather than naming the pages individually.

Andhrabharati commented 3 years ago

Just a final remark (now on Siddhanta1), and then I move on to another task. [Hoping Jim does not need my reminder(s)/follow-up on this font-changing task.]

See qdvya in the penultimate line (in my pdf).

It would have been more appropriate to use the bottom-level -ya variant (as in the ktya and krya glyphs in the image I posted earlier above) rather than the top-level variant used here. But it's all right; it is not odd enough to worry much about.

funderburkjim commented 3 years ago

@Andhrabharati: kudos on the siddhanta1 find! Your pdf is a good supporting document for using siddhanta1.

For reference, here is where siddhanta1.ttf was found:

main web page: https://sites.google.com/site/bayaryn/
The link 'Additional files for download' contains several font files in the siddhanta-variations folder:
- siddhanta1.ttf
- siddhanta2.ttf
- siddhanta-cakravat.ttf
- siddhanta-cakravat1.ttf
- siddhanta-cakravat2.ttf
- siddhanta-calcutta.ttf
- siddhanta-calcutta1.ttf
- siddhanta-calcutta2.ttf
- siddhanta-nepali.ttf
- siddhanta-vyakarana.ttf

Will use siddhanta1.ttf for all dictionary Devanagari displays (functionally replacing siddhanta.ttf). The transcoding 'trick' is currently only used with PWG, PW; also, Devanagari accent display for other dictionaries is not changed (i.e., the Bohtlingk interpretation of udAtta and svarita is used only in PWG, PW).

Will mention siddhanta1 installation progress as it occurs.

funderburkjim commented 3 years ago

pdf vs. browser

Just a random observation/question -- irrelevant to the siddhanta1 installation.

In comparing the html (example3b.html) to the pdf (-ya conjuncts and accents.htm.pdf), two things stand out:

- The Devanagari script is uniformly clearer and easier to read in the pdf at normal magnification, although the two are quite similar at high magnification.
- The display of the last item (fzi^M) in Adhishila is clear in the pdf, but unclear in the html, at any magnification.

So the pdf seems to render fonts better than the browser (even though the Edge browser is rendering the pdf). This is mysterious.

funderburkjim commented 3 years ago

installation re simple-search

These files were modified in csl-apidev:

- fonts/siddhanta1.ttf: the network font file
- css/basic.css: specifies that font-family siddhanta-deva now uses font file siddhanta1.ttf
  - This is used in the list display of results (csl-apidev/listview.php)
- simple-search/v1.1/list-0.2s_rw.php: css to use siddhanta1.ttf for the 'results' list of 'simple' search
  - v1.1 is the current 'default' simple search
- simple-search/v1.1a/list-0.2s_rw.php: similar, for an alternate dev version
- utilities/transcoder/slp1_deva1.xml: has the Devanagari transcoding rules (with tricks) used for the pwg and pw dictionaries
  - other dictionaries use slp1_deva.xml

funderburkjim commented 3 years ago

Note: both 'removing cached files' and Ctrl-F5 may be needed to get siddhanta1 in the displays. You can try words such as 'Iqya' and 'kuqya' to see the new siddhanta1 'ya' ligature.

Will similarly modify the basic and advanced search displays, etc., tomorrow.

funderburkjim commented 3 years ago

Modified the displays (basic, advanced search, list, mobile1) to use the siddhanta1 font for Devanagari. Also, the pwg and pw displays of accents use slp1_deva1.xml (with tricks) for transcoding.

Files involved are in the csl-websanlexicon directory:

- inventory.txt
- makotemplates/web/fonts/siddhanta1.ttf
- makotemplates/web/utilities/transcoder/slp1_deva1.xml
- makotemplates/web/webtc/font.css
- makotemplates/web/webtc/getwordClass.php

Ran redo_cologne_all.sh script in csl-websanlexicon/v02/ so changes apply to displays for all dictionaries.

funderburkjim commented 3 years ago

I think we can call this issue completed. Whew!

One thing that caught my eye is that Devanagari with accents in PWG, PK needs wider line-spacing. This involves some kind of css change. Maybe a comprehensive css review can be done sometime.

Andhrabharati commented 3 years ago

Happy ending indeed, this is!!

Andhrabharati commented 3 years ago

And yes, you might consider making a separate CSS for these accented displays of PWG and PW in this round itself, instead of leaving it for later.

Andhrabharati commented 3 years ago

Rather, it has to be applied everywhere, I guess.

See this from MW:

The text is too crowded, and the accents protrude into the abbr. and ls. dots of the lines above.

Andhrabharati commented 3 years ago

BTW, I noticed that small caps now appear in the ls markup; they were not there in January this year, when I was posting several comments on MW.

Is this a deliberate change/correction, or did it just happen?

Andhrabharati commented 3 years ago

Looks like this is not so happy an ending, @funderburkjim.

funderburkjim commented 3 years ago

Let's not discuss the css in this issue. If you want to review the css in various displays, let's do that in a separate issue.

Andhrabharati commented 3 years ago

I considered it as closely related/linked to the accents & font-changing being considered here and posted my observations.

I leave it to your discretion whether, where, when, and how to attend to this point.

gasyoun commented 3 years ago

Adhishila is clear in pdf, but unclear in html, at any magnification

Right, because they use different rendering engines.

I considered it as closely related/linked to the accents

It is, but it is too big and so deserves a new issue. There are dozens of CSS issues piled up already.

Andhrabharati commented 2 years ago

Since there is no SLP1 representation for BR's Vedic 'u', I wonder why it was typed with a wrong character, leading to the mess seen here.

They could have used a special symbol to indicate such characters. There is one such symbol in the SKD text, denoting a Vedic sign that is present in very few fonts (Siddhanta has it in its private use area); the same symbol is also used to denote the small-cap words in the original 'raw' file of PWG. It still remains in the SKD text, unconverted to Unicode.

@funderburkjim

As I now have access to the original digitisation files of @thomasincambodia and team, I thought I should glance through them once to see how they had typed these accent characters.

And here is the very first entry having all three accent symbols:

<H1>000{aMza}1{aMªza}^1¦ (s. †{gan2a} {#vRSAdi#}) •m. ¯{¤SIDDH.K.249¤}, {%b%}, ult. ³1) {%Theil%} ¯{¤AK.…2,…9,…90.…H.…1434.¤} -- ²a) {%Theil,…Abschnitt%} ¯{¤H.…an.…2,…542.¤} ({#ekadeze…vastunaH#}): {#SoDazoM…'zaH#} ({#candrasya#}) {#kalA#} {%der…16te…Theil…(des…Mondes)…heisst%} †{kala10} ¯{¤H.…106.¤} {#tUryaMza#} {%der…4te…Theil%} ¯{¤AK.…3,…4,…92.¤} {#aMzo…'STamo…'hniH#} {%der…8te…Abschnitt,…die…[Page01.0004]…8te…Stunde…des%} (15stündigen) {%Tages%} ¯{¤AK.…2,…7,…31.¤} {#SaSThamaMzaM…pradadyAtpaitRkAddhanAt#} {%den…6ten…Theil…gebe…er…vom…väterlichen…Vermögen%} ¯{¤M.…9,…164.¤} {#mamaivAMzaH#} {%ein…Theil…von…mir%} ¯{¤BHAG.…15,…7.¤} -- ²b) {%ein…Theil…des…Kaufpreises,…Haftgeld%}: {#A…na¸stujaMªª…ra¸yiM…bha¸rAMzaM¸…na…praªªtijAna¸te#} ¯{¤R2V.…3,…45,…4.¤} -- ²c) {%Antheil%}: {#udinnvaªªsya…ricyate¸…aMzo¸…dhanaM¸…na…ji¸gyuSaHªª#} ¯{¤R2V.…7,…32,…12.¤} {#adhAªªyi…dhI¸tirasaªªsRgra¸maMzAHªª#} ¯{¤10,…31,…3.…AV.…11,…1,…5.¤} {#taddAsaireva…dAtavyaM…svatoM…'zataH#} ¯{¤M.…8,…408.¤} {%Erbschaftsantheil%}: {#anaMzau…klIvapatitau#} ¯{¤M.…9,…201.¤} {#niraMzaka#} {%ohne…Erbtheil…bleibend%} ¯{¤JA10G4N4.…2,…140.¤} {#patnyaH…kAryAH…samAMzikAH#} ¯{¤2,…115.¤} -- ²d) {%Partei%}: {#a¸smAka¸maMza¸mudaªªvA¸…bhareªª…bhare#} ¯{¤R2V.…1,…102,…4.…112,…1.¤} -- ²e) {%Nenner…eines…Bruchs%} ¯{¤COLEBR.¤} {%Alg.%} ¯{¤13.¤} -- ³2) {%Theilung,…Erbschaftstheilung%} ¯{¤H.…an.…2,…542.¤} {#(vibhAjane):…sakRdaMzo…nipatati#} ¯{¤M.…9,…47.…=…SA10V.…2,…26.¤} -- ³3) Name eines †{A10ditja} ({%Theilnehmer,…Vertheiler%} ¯{¤ROTH…in…ZdmG.VI,…75.):¤} {#tvamaMzoªª…vi¸datheªª…deva…bhAja¸yuH#} †{(Agni)} ¯{¤R2V.…2,…1,…4.…27,…1.…5,…42,…5.…VS.…10,…5.…AV.…6,…2,…5.…11,…17,…2.¤} Name des 6ten †{A10ditja}, ¯{¤MBH.…1,…2523.¤} des 11ten, HARIV. 176. VP. des 5ten, ¯{¤MIT.…142,…3.¤} des 9ten, ¯{¤HARIV.…12456.¤} -- Es kommt auch die Schreibart {#aMsa#} vor.

It clearly shows that the original typed matter properly distinguished between the three characters: ' ª ' for ' ꣫ ', ' ªª ' for ' ॑ ', and ' ¸ ' for ' ॒ '.
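The marker convention above can be converted mechanically, provided 'ªª' is matched before 'ª'. A minimal Python sketch, assuming the markers occur exactly as in the entry above; the names `ACCENT_MARKERS` and `convert_accent_markers` are hypothetical, and this is not the conversion actually used for the Cologne files.

```python
import re

# Marker convention of the original digitisation, as read from the entry above.
ACCENT_MARKERS = {
    "ªª": "\u0951",  # ॑ DEVANAGARI STRESS SIGN UDATTA
    "ª": "\ua8eb",   # ꣫ COMBINING DEVANAGARI LETTER U
    "¸": "\u0952",   # ॒ DEVANAGARI STRESS SIGN ANUDATTA
}

# Regex alternatives are tried left to right, so list longer markers first
# to keep 'ªª' from being read as two separate 'ª' markers.
_MARKER_RE = re.compile(
    "|".join(re.escape(m) for m in sorted(ACCENT_MARKERS, key=len, reverse=True))
)


def convert_accent_markers(text: str) -> str:
    """Replace each typed marker with the Unicode accent it stands for."""
    return _MARKER_RE.sub(lambda m: ACCENT_MARKERS[m.group(0)], text)


print(convert_accent_markers("jaMªª ra¸yiM aMªza"))
```

The output is still SLP1 with the accent code points attached; transcoding to Devanagari (and any reordering for a particular font) would be a separate step.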

Kudos to the typing team for having typed in this "cumbersome" (but diligent) manner, and to Thomas for having trained them in such a working style.

It now turns out that it was in the later handling of the texts, namely "porting" into other formats/encodings, that the mishap occurred.

And it took a couple of years for someone to find it ODD; then a "friendly" fight/argument session (unabated, can I say?) and the ever-willingness of @funderburkjim to make any possible change (iff convinced) brought the matter to a justifiable ending!!

[Sorry again, for posting in a closed thread; but this is the ONLY place to share this info.]
