w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Versals in Tamil, Telugu, Malayalam, etc. #56

Open r12a opened 5 years ago

r12a commented 5 years ago

The CSS WG has a question about the alignment of initial-letter for South Asian scripts without a hanging baseline: https://github.com/w3c/csswg-drafts/issues/864

I pointed them to the Indic Layout Requirements at https://w3c.github.io/ilreq/#h_scripts_without_hanging_baseline

Is there any information, beyond what is in the requirements doc, that should be mentioned in this context?

santhoshtr commented 4 years ago

I think the general rule that recommends 3/4 of the total height is not accurate.

While publishers follow no strict rules, some samples from Malayalam may help.

image

image

image

image

Here the hanging baseline is at the x-height, and the total drop cap height is x-height + descender.

I have also seen padding around the whole drop cap, as below. In that case there is no alignment at all. Is this in the scope of the spec? (Sorry, I have not read all of the background information.)

image

tiroj commented 4 years ago

The correct size and alignment of drop initials for Latin and related scripts are easily calculated because fonts contain specific data points for cap height. For non-European scripts that lack any data points except the glyph outlines and their positioning offsets, I don't see a way to reliably automate drop initials, and especially not clusters that might involve above or below subscripts, possibly stacked two deep in some cases. To cleanly handle arbitrary clusters in this situation, you'd need additional data points and/or something that measures 'ink' (like some aspects of math handlers that do layout for complex equations).
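To make the two comments above concrete, here is a minimal Python sketch. It assumes that an initial sinking N lines must span from a top reference on the first line down to a bottom reference on the Nth line, and that the vertical ink bounds of the shaped cluster are already known (measuring them from the shaped glyphs is the hard part tiroj describes and is not shown). The class and function names and all metric values are hypothetical, and "top at the x-height, bottom at the descender" is one reading of santhoshtr's Malayalam samples, not a rule from any spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentModel:
    """Where an initial's top and bottom line up with the body text (values in px).

    Hypothetical parameters for illustration, not taken from any spec or font.
    """
    line_height: float  # distance between consecutive baselines of the body text
    top_ref: float      # height above line 1's baseline that the initial's top aligns to
    bottom_ref: float   # depth below line N's baseline that the initial's bottom aligns to


def target_extent(lines: int, model: AlignmentModel) -> float:
    """Vertical distance the initial must fill when it sinks `lines` lines."""
    if lines < 1:
        raise ValueError("an initial must span at least one line")
    return (lines - 1) * model.line_height + model.top_ref + model.bottom_ref


def initial_font_size(lines: int, model: AlignmentModel,
                      ink_y_min: float, ink_y_max: float, units_per_em: int) -> float:
    """Font size (px) at which the cluster's ink fills the target extent.

    ink_y_min / ink_y_max are the vertical ink bounds of the shaped cluster in
    font units (ink_y_min is negative when the ink descends below the baseline).
    """
    ink_height_em = (ink_y_max - ink_y_min) / units_per_em
    if ink_height_em <= 0:
        raise ValueError("cluster has no vertical ink extent")
    return target_extent(lines, model) / ink_height_em


# Latin-style model: top at cap height (10 px), bottom on the alphabetic baseline.
latin = AlignmentModel(line_height=20, top_ref=10, bottom_ref=0)
# Hypothetical reading of the Malayalam samples: top at the x-height / hanging
# baseline (8 px), bottom at the descender (4 px).
malayalam = AlignmentModel(line_height=20, top_ref=8, bottom_ref=4)

print(initial_font_size(3, latin, ink_y_min=0, ink_y_max=625, units_per_em=1000))       # 80.0
print(initial_font_size(3, malayalam, ink_y_min=-250, ink_y_max=375, units_per_em=1000))  # 83.2
```

In this sketch the ink bounds, not the font-wide ascender/descender, drive the size: a cluster with stacked below-base marks has a larger ink height and so gets a smaller font size for the same number of lines, which is why per-cluster ink measurement matters.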

r12a commented 4 years ago

Does anyone have any examples of large initial letters in Tamil text?

murasu commented 4 years ago

image

image

I'll look for more.

murasu commented 4 years ago

Some more examples ... image

image

Both are from the book 'தில்லை என்னும் திருத்தலம்' by Chandrika Subramanian. Kannadhasan Publications, Chennai, India. 2015.

r12a commented 4 years ago

> In addition to this, I have seen a padding around the whole drop cap as below. There, there is no alignment at all. Is this in the scope of spec? (Sorry, I have not read the background information completely)

Yes, it is. There's a border-box value for initial-letters-align.

miloush commented 4 years ago

These examples from India Today are fitted closely to the glyph:

image

image

As for padded ones, here are some well-known early Tamil prints:

image

image

image

Note that the last one has only the first glyph of a syllable as the initial.

murasu commented 4 years ago

The last three early Tamil prints use the old orthography: the vowel sign AA and RA, RI, RII share the same glyph; vowel signs E and EE share the same glyph; and there is no pulli on dead consonants.

miloush commented 4 years ago

Indeed, does that change anything for the drop initials?

murasu commented 4 years ago

Drop initials made of vowel signs alone may have been a practice in the past. The last image is from the book 'Tambiran Vanakkam', which is the first book printed in an Indian script. I am not sure whether they mistakenly saw the vowel signs as stand-alone letters, as I believe this book was not printed by native speakers. The typeface used in this book was based on the writing in palm-leaf manuscripts, which certainly did not use drop caps. It will be interesting to see whether later prints continued the practice of drop initials with vowel signs alone.

miloush commented 4 years ago

I would argue that the reason we mostly see the whole syllabic cluster as the drop initial nowadays is software constraints. Similarly, in epigraphic inscriptions you can find line breaks in the middle of a syllable, which you cannot currently reproduce in Unicode plain text.

I think it comes down to the purpose of the standard or guidance. If a particular orthography was practiced in the past, is that something we want to allow or prevent?

(Either way, whether vowel marks are valid drop caps on their own is not really the point of this issue; these were the easiest drop caps with padding I could find.)

vivekpani commented 4 years ago

I think the syllable cluster is not a limitation of software but a necessity for the Indic languages (including Tamil). In fact, not all operating systems implement syllable clusters, and a lot is left to OpenType's glyph clustering, which is a "mistake".

Indian languages are phonetic (in speaking as well as in writing), and hence a cluster is not broken: the letters lose their meaning when clusters are broken. I think a lot of this has now been "accepted" by hapless users because most software does not implement clusters well. In writing, by contrast, there would be "no" examples of a line break, hyphenation, or any other kind of styling in the middle of a cluster.

(Sent by email in reply to Jan Kučera's comment of Sat, Dec 28, 2019, 8:38 PM.)

-- ବିବେକାନନ୍ଦ ପାଣୀ । विवेकानन्द पाणी । Vivekananda Pani, Research, Reverie Language Technologies

miloush commented 4 years ago

I gathered some more modern examples of when only part of the syllable is a drop cap: image image image image ...you get the idea with பாரதி

Other letters: image image

And vowel marks only; note the two different sizes of the E marks: image image
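The examples in this thread take different units as the initial: a whole syllable, part of a syllable (பாரதி), or a vowel sign alone. Here is a minimal Python sketch contrasting the first code point with the first cluster in stored (phonetic) order. It assumes a simplified cluster rule, a base character plus the combining marks that follow it; real Unicode grapheme segmentation (UAX #29) has more rules. `first_cluster` and `code_points` are hypothetical helpers, not a library API.

```python
import unicodedata

# Combining-mark categories: nonspacing (Mn), spacing combining (Mc), enclosing (Me).
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def first_cluster(text: str) -> str:
    """Return the first character of `text` plus any combining marks that follow it.

    Simplified approximation of a Unicode extended grapheme cluster (UAX #29):
    it ignores ZWJ/ZWNJ, conjunct rules and other edge cases.
    """
    if not text:
        return ""
    end = 1
    while end < len(text) and unicodedata.category(text[end]) in MARK_CATEGORIES:
        end += 1
    return text[:end]


def code_points(s: str) -> list[str]:
    """Format each code point of `s` as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]


BHARATI = "\u0BAA\u0BBE\u0BB0\u0BA4\u0BBF"  # பாரதி
KODI = "\u0B95\u0BCA\u0B9F\u0BBF"           # கொடி, vowel sign O stored precomposed (U+0BCA)

for word in [BHARATI, KODI]:
    cluster = first_cluster(word)
    print(word)
    print("  first code point:", word[0], code_points(word[0]))
    print("  first cluster:   ", cluster, code_points(cluster))
    # NFD splits the two-part vowel sign O (U+0BCA) into E (U+0BC6) + AA (U+0BBE).
    print("  cluster (NFD):   ", code_points(unicodedata.normalize("NFD", cluster)))
```

For கொடி the decomposition shows that the E part of the vowel sign (U+0BC6) is stored after the consonant but drawn to its left, so an initial made of that vowel mark alone corresponds to neither the first code point nor the first cluster in stored order.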

lianghai commented 4 years ago

The so-called concept of a “cluster” only exists when an analysis needs to reconcile the order of graphic segments in Indic text with the order of phonetic segments. Yeah, Unicode is certainly based on this reconciliation for certain reasons, and OpenType Layout deals with Unicode.

However, a typographical treatment like drop initials has nothing to do with meaning: it’s not a linguistic analysis or statement, just superficial styling at the most graphic level.

Note that you don’t write in phonetic order but generally in a much more graphic order. Unicode–OTL’s way of doing cluster-by-cluster encoding and shaping is not a reason to impose a scholarly understanding of scripts and texts on average users.