w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.

Drafting “justification” #57

Open mostafah opened 8 years ago

mostafah commented 8 years ago

We have a wiki page to hold the draft for this section. The discussion can happen here. At the moment the draft is only an outline. I’ll be adding text under each title.

r12a commented 8 years ago

Do we have any information about the viability or special rules affecting use of inter-word spacing in nastaliq Persian text? (I'm asking because i think there may be such for Urdu nastaliq, given that i have been told that they don't use spaces between words as often as in naskh-based text).

mostafah commented 8 years ago

I don’t have much reliable knowledge on the matter myself. We can look into the matter for more information, but I’m thinking about how much we want to talk about calligraphy and how much we want to focus on digital typography. That’s something we can discuss in teleconference.

ntounsi commented 8 years ago

Hi some comments

1) Justification section

2) Alternative Shapes section

mostafah commented 7 years ago

@ntounsi Thanks for the feedback. I fixed the typos and will consider the image suggestions when I’m done with the text and start on the images.

behnam commented 7 years ago

Zarnegar 5.2 Catalog (http://sinasoft.com/Downloads/zarnegar5.2/catalog/Zar52Cat.pdf) has some good examples of variant glyphs, which can be used to improve justification.

From Page 8: image

From Page 9: image

From Page 10, word-based style selection feature: image

Also, on page 13, it covers the application feature to specify justification rules per letter: image

And, on pages 14 and 15, it describes more details about the justification algorithm and its parameters: image

image

And finally, page 17 and 18, have examples of justification methods for Persian poetry: image

image

behnam commented 7 years ago

Regarding word elongation/stretching, this MS thesis covers some rules of Nastaliq and a mathematical model for applying it on a font outline.

Mohsen, Shahab. "The Problem of Stretching in Persian Calligraphy and a New Type 3 PostScript Nastaliq Font." (2010). https://uwspace.uwaterloo.ca/bitstream/handle/10012/4974/Thesis.pdf

From page 36: image

khaledhosny commented 7 years ago

There is also https://www.tug.org/TUGboat/tb27-2/tb87benatia.pdf and http://quod.lib.umich.edu/j/jep/3336451.0013.105/--justify-just-or-just-justify?rgn=main;view=fulltext (and at least a couple of other papers by the same authors that I can’t immediately find), which focus mostly on Naskh.

There is also the algorithm used by (at least) IE and LibreOffice https://www.microsoft.com/middleeast/msdn/JustifyingText-CSS.aspx

mostafah commented 7 years ago

Thanks for the resources.

@behnam Are those rules exclusively for Nastaliq or can they be considered for Naskh as well?

ntounsi commented 7 years ago

FYI I just posted an issue on how to NOT justify a piece of text inside a justified paragraph. https://github.com/w3c/csswg-drafts/issues/853

shahab32mohsen commented 7 years ago

@Mostafah, these rules are only for Nastaliq. If you have questions about Naskh, my former supervisor at the University of Waterloo, Professor Dan Berry, can probably help. His email is dberry@uwaterlio.ca.

zoghal commented 7 years ago

Hi, this document talks about the elongation of letters in Nastaliq (here)

ntounsi commented 7 years ago

I posted here some pictures about justification.

Some points are:

Note : not all points are illustrated.

behnam commented 7 years ago

Three main topics were discussed this morning about justification and I try to summarize the discussion here:

1) Having justification/elongation/stretching method of a run (span) of text inside a paragraph different from the method applied to the whole paragraph. Two examples from Najib's images show Qur'an text having different elongation/stretching (none, in one case) from the main paragraph. We need more evidence of this, as in both cases the font/style of the quoted text is different from the font used in the paragraph and it's not clear which effects are intentional and which ones are just whatever the font/type supported.

2) Elongation/stretching of the last line of a justified paragraph. Najib posted a counter-example (demonstrating existing behavior of a desktop publishing software) of the problem. We need more good examples of how it's done professionally. (See my next comment.)

3) Justification of poetry, especially the case where a word can be split in the middle, either at a joining point or at a non-joining point. One good question about this case is how to mark up the text in a way that keeps it accessible and searchable semantically while still typesetting it correctly visually.

behnam commented 7 years ago

Regarding item (2) from https://github.com/w3c/alreq/issues/57#issuecomment-273401163, here's a good example from a book published recently in London by H&S Media, a publisher specialized in Persian books.

https://books.google.com/books?id=ujLeDAAAQBAJ (p. 14)

image

As you can see on this page (and similarly in all 200 pages of the book), single-line paragraphs are not justified at all and have no elongation, while multi-line paragraphs are justified, with the last line having the same (average) elongation/stretching as the rest of the paragraph.
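
A minimal sketch of the last-line behavior described above, under stated assumptions: line widths are plain numbers in arbitrary units, "average elongation" is read as the mean ratio of added width to natural width over the fully justified lines, and the function name extra_width_per_line is hypothetical. It is an illustration of the observation, not the method any publisher or application is known to use.

```python
def extra_width_per_line(natural_widths, measure):
    """Return the extra width to add to each line of one paragraph.

    natural_widths: unjustified width of each line (assumed > 0 and <= measure).
    measure: the target line width.
    """
    # Single-line paragraphs are left unjustified, with no elongation.
    if len(natural_widths) <= 1:
        return [0.0 for _ in natural_widths]

    body = natural_widths[:-1]
    # Every line except the last is stretched to fill the measure.
    extras = [measure - w for w in body]

    # Assumption: "average elongation" is the mean stretch ratio of those lines.
    ratios = [e / w for e, w in zip(extras, body)]
    mean_ratio = sum(ratios) / len(ratios)

    # The last line gets the same average stretch, but never beyond the measure.
    last = natural_widths[-1]
    last_extra = min(last * mean_ratio, measure - last)
    return extras + [last_extra]
```

With lines of natural width 80, 80 and 40 and a measure of 100, the first two lines each receive 20 extra units (a ratio of 0.25), so the last line receives 10 and stays short of the full measure.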

ntounsi commented 7 years ago

@behnam, We also talked about having some properties for Arabic in CSS, in order to control justification methods.

khaledhosny commented 7 years ago

I have a few books from Bulaq press from its various eras; they should give some examples of high-quality typesetting in the metal type era. I will try to post some scans shortly.

behnam commented 7 years ago

That's right, @ntounsi, we usually need to talk about what's already supported in CSS. I didn't put it in my list because we know Arabic justification support is pretty broken in CSS and that's not going to be a source for us. But, if there's any feature in CSS that we haven't mentioned, we should definitely add it to the list.

behnam commented 7 years ago

Another old desktop publishing app that had an early implementation of Arabic Justification is "al-Nashir al-Sahafi", by Diwan Software Ltd.: http://diwan.com/index.php/products/desktop-publishing/40-al-nashir-al-sahafi-for-mac and http://diwan.com/index.php/products/desktop-publishing/38-al-nashir-al-sahafi-yaqout-for-windows

The manual for Mac has some info on page 57: http://diwan.com/download/Sahafi_Mac_Manual.pdf

image

And the manual for Windows, on page 54: http://diwan.com/download/Yaqout_Manual.pdf

image

@ironymark may be able to share with us more about the features supported for justification.

khaledhosny commented 7 years ago

Continuing the discussion on the mailing list about the justification of Quran when embedded in regular text, here are some examples of such usage.

Here the Quran uses a completely different font, which seems to be just copied from the Madina mushaf. The text is justified, but uses a different justification strategy than the surrounding text, but I think that is mostly a side effect of how the Quran text is typeset, not a deliberate choice.

img_20170121_233832-small

Here the Quran text is typeset using the same method as surrounding text, and the same justification method as well: img_20170121_233144-small img_20170121_232835-small

ntounsi commented 7 years ago

"I post here, but I wonder if it should not also be in the repository of samples of typographic, just created by Richard."

In the absence of a good implementation of Kashida, Arabic text should be justified with spaces, inter-word or intra-word if any. Tatweel should be used only with some simple Arabic fonts.

Rationale: (a) Some (legacy?) Arabic fonts are quite horizontal at the baseline, and their letters always join horizontally. Also called simplified fonts (with a limited number of shapes, e.g. the same glyph for the initial and medial forms, and the same shape for the final and isolated forms), they allow typewriter-style writing. In general, default or sans-serif fonts are of this kind. (image: simplefont)

(b) Other, more recent fonts are more in the spirit of cursive Arabic writing, with slightly more curved shapes, and allow for more aesthetic writing: ligatures, groups of letters, contextual letter shapes with different keystrokes, etc. Amiri: (image: amiri) Arabic Typesetting: (image: arabictypesetting)

The justification with Tatweel (U+0640, ـ , small flat line) goes better with the first type of font (a). (image: tatweel) Besides, would Tatweel not be a simplified implementation of Kashida?

Arabic Kashida (not a character) is a curved line used to elongate between letters or at the end of some letters. It is extensible enough to adapt to the individual letter and to the context.

I think Kashida is the method best suited to the second kind of writing, since it makes it possible to maintain the aesthetic of curved lines. (image: kashida)

So, in the absence of a good implementation of Kashida, Arabic text should be justified with spaces, inter-word or intra-word if any, instead of Tatweel.
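
To make the space-based fallback concrete, here is a rough sketch of inter-word justification. The assumptions are mine: word widths are plain numbers in arbitrary units, the extra width is split evenly over the gaps, and the function name justified_gaps is hypothetical. Intra-word spacing and any real text shaping are left out.

```python
def justified_gaps(word_widths, space_width, measure):
    """Return the width of each inter-word gap after justification.

    word_widths: natural width of each word on the line.
    space_width: natural width of one inter-word space.
    measure: the target line width.
    """
    gaps = len(word_widths) - 1
    if gaps <= 0:
        # A single word has no inter-word gap to stretch.
        return []

    natural = sum(word_widths) + gaps * space_width
    if natural >= measure:
        # The line already fills the measure; leave the spaces alone.
        return [space_width] * gaps

    # Split the missing width evenly over all gaps.
    extra = (measure - natural) / gaps
    return [space_width + extra] * gaps
```

For three words of width 30, 20 and 40, a space of 5 and a measure of 110, the natural line width is 100, so each of the two gaps grows from 5 to 10.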

Any thoughts?

r12a commented 7 years ago

"I post here, but I wonder if it should not also be in the repository of samples of typographic, just created by Richard"

You're posting in exactly the right place. That repo is only for pictures, not connected arguments. ;-)

"The justification with Tatweel (U+0640, ـ , small flat line) goes better with the first type of font (a)."

Bear in mind that in a web browser the width of the text is usually fluid. This is particularly problematic for justification of any text where the content author has added tatweels, since they will only work to justify the text if the window they are created in and the window they are viewed in are the same width exactly.

It's possible that a justification tool could add tatweel characters while justifying, in the way a justification tool for English would add spacing. This would work, though i'm told that such results don't please the eye.
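
As a rough illustration of such a tool, the sketch below pads a line with tatweel characters until it reaches a target length. It rests on loud simplifications: width is counted in characters rather than measured from a font, the set of right-joining letters is a small hand-picked subset rather than the full Unicode joining data, combining marks are ignored, and all function names are hypothetical. Because the padding depends on the target width, it would have to be recomputed whenever the window width changes.

```python
TATWEEL = "\u0640"

# Assumption: a small subset of letters that join only to the preceding letter
# (alef forms, dal, thal, reh, zain, waw forms, teh marbuta, jeh).
RIGHT_JOINING = set(
    "\u0627\u0623\u0625\u0622\u062F\u0630\u0631\u0632\u0648\u0624\u0629\u0698"
)
NON_JOINING = {"\u0621"}  # hamza


def is_arabic_letter(ch):
    # Rough letter ranges; deliberately excludes tatweel (U+0640) itself.
    return (
        "\u0621" <= ch <= "\u063A"
        or "\u0641" <= ch <= "\u064A"
        or "\u0671" <= ch <= "\u06D3"
    )


def joins_forward(ch):
    return is_arabic_letter(ch) and ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_backward(ch):
    return is_arabic_letter(ch) and ch not in NON_JOINING


def tatweel_slots(line):
    """Indices before which a tatweel can sit between two joined letters."""
    return [
        i + 1
        for i in range(len(line) - 1)
        if joins_forward(line[i]) and joins_backward(line[i + 1])
    ]


def pad_with_tatweel(line, target_len):
    """Spread tatweels round-robin over the joining points of the line."""
    slots = tatweel_slots(line)
    missing = target_len - len(line)
    if missing <= 0 or not slots:
        return line
    counts = {s: 0 for s in slots}
    for k in range(missing):
        counts[slots[k % len(slots)]] += 1
    out = []
    for i, ch in enumerate(line):
        out.append(TATWEEL * counts.get(i, 0))
        out.append(ch)
    return "".join(out)
```

Spreading the tatweels evenly is the simplest possible policy; it takes no account of which joining points look best, which may be one reason such results do not please the eye.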

khaledhosny commented 7 years ago

IE has supported automatic kashida justification for quite some time already. The algorithm (see https://github.com/w3c/alreq/issues/57#issuecomment-270252896) is OKish, but seems to be mostly suited to simple Naskh styles.

mostafah commented 7 years ago

A question came up during our meeting this week: some software uses tatweel characters for justification, but shrinks or extends the tatweel character horizontally when adding it. The question is whether we should categorize these implementations as tatweel or as kashida. There is no need to answer the question right now. We just have to think about it when updating the draft on justification. We might need to see more examples and make a decision later.

khaledhosny commented 7 years ago

I don’t think making a distinction between the tatweel encoded character and the kashida as a justification device is that helpful. The encoded character can probably be traced back to early computer systems where every glyph was an encoded character, which in turn can be traced to the metal typesetting practice of using kashida sorts with different widths for justification (among other methods like alternate glyphs and varying word spacing). So IMHO they are the same thing, just done differently for different media (calligraphy vs metal type vs early computing systems vs modern ones), and I don’t think we should put much emphasis on the differences; they are just implementation details.

behnam commented 7 years ago

I had ACTION-54 with a similar goal. I think we can close that and continue the work here. (https://www.w3.org/International/groups/arabic-layout/track/actions/54)

behnam commented 7 years ago

Another example of maintaining the same elongation on all lines (including the last line) of a justified paragraph, from the Extension to Ninth-Grade Science book:

http://www.chap.sch.ir/sites/default/files/lbooks/95-96/556/C134-1.pdf

screen shot 2017-04-12 at 3 22 34 pm

ebraminio commented 7 years ago

I guess this would be useful for you guys also http://github.com/ebraminio/kashida

It is a simple, working example of a kashida implementation backed by harfbuzz and DirectWrite, and it is easy to compile: all it needs is a click on init.bat on a Windows machine with Visual Studio 2017 Community, with ragel on the Windows PATH and git installed (no cmake installation is needed, as VS2017 already has one built in). Its only dependency is harfbuzz, which the batch file fetches itself.

The part that I guess would be useful is https://raw.githubusercontent.com/ebraminio/kashida/master/output.txt, which is the result of this script: https://github.com/ebraminio/kashida/blob/master/process.js. It shows the places that the DirectWrite kashida implementation considers for expansion, on the updated BijanKhan text corpus that Behnam just introduced on the P-C ML, with duplicate words removed. In these results, 752 marks the presence of the kashida glyph, and the whole sequence is reversed, as the implementation originally did to fit the usual harfbuzz output (so on the first line, the kashida falls between ی and ن).

I don't think it adds much to the table, but perhaps it is useful for testing. Thanks