w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Dandas are wrapped alone to the beginning of a line #88

Open r12a opened 4 years ago

r12a commented 4 years ago

When the Devanagari phrase separator । U+0964 DEVANAGARI DANDA (called purna viram in Hindi) or ॥ U+0965 DEVANAGARI DOUBLE DANDA (deergh viram in Hindi) is used, some browsers select it with the preceding word on double-click, while other browsers select it separately.

The properties of purna viram and deergh viram should be the same as the properties of FullStop or other punctuation marks, and a new line should not begin with purna viram and deergh viram.

r12a commented 4 years ago

The first comment in this issue contains text that will automatically appear in the Devanagari gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

lianghai commented 4 years ago

“FullStop”, or is there a better categorization that can categorize danda and double danda into the same category of question mark (?) and exclamation mark (!) so a preceding space doesn’t create a line break opportunity either?
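For anyone who wants to compare these characters directly, here is a minimal sketch using only Python's standard `unicodedata` module. It prints the General_Category of the two dandas next to the full stop, question mark and exclamation mark. Note the limitation: `unicodedata` does not expose the Line_Break property, which is what actually drives line breaking, so this only shows the coarse punctuation category. The helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    # Danda, double danda, then the Latin marks mentioned in this thread.
    for ch in ("\u0964", "\u0965", ".", "?", "!"):
        code, name, category = describe(ch)
        print(f"{code}  {name}: {category}")
```

All five share the same General_Category, so any difference in wrapping behaviour has to come from the line-breaking class or from the implementation, not from this property.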

miloush commented 4 years ago

@lianghai I have seen danda printed at the beginning of a line, although it might be just bad typesetting. Do you have a reference to support forbidding a line break there?

lianghai commented 4 years ago

@miloush I believe metal type printed books show a general preference for avoiding dandas at the beginning of a line. I don’t have good materials on hand right now. I was mostly talking about the situation of <…, letter, space, danda, space, letter, …> though, where a danda is surrounded by a pair of space characters when it’s intended to have balanced wide spacing on both sides of the sentence-terminating mark.

miloush commented 4 years ago

@lianghai as in here? :)

image

tiroj commented 4 years ago

There are two different (and conflicting) practices used in inputting and displaying danda characters. There are reasonable arguments to be made in favour of either practice, but the fact that both exist and are used seems an issue unlikely to go away soon.

Some users always input a space character before the danda, and like their fonts to space the danda accordingly, i.e. to be narrowly and evenly spaced on both sides.

Some users don't input a space character before the danda, and also like their fonts to space the danda accordingly, i.e. with more space on the left side.

The visual result desired by both sets of users is the same: a danda with a roughly equal amount of space on the left and right. But they are used to achieving it in different ways, largely dependent on the typesetting systems and fonts with which they are most familiar. And yes, this means that text ends up differently encoded depending on what font is used.

In our fonts, we've tended to use the second option, with the space built in on the left side of the danda because this is how our clients in India encode their text, without a space character before the danda. These clients are newspaper publishers, and this encoding practice is something they've inherited from previous typesetting technologies. One of the benefits, from their perspective, is that this practice prevents the danda from getting separated from the preceding word at line breaks.

Liang Hai has already convinced me that line-breaking behaviour for the danda should be independent of whether a space character is inserted before it. But in practice, this isn't something one can rely on yet.
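Until that can be relied on, one possible content-side workaround for text encoded with a space before the danda is to swap that space for U+00A0 NO-BREAK SPACE, so the danda cannot be wrapped away from the preceding word. The sketch below is only an illustration, not something proposed in this thread: the function name `glue_dandas` is hypothetical, it handles only U+0020 spaces, and it assumes a no-break space is acceptable for the font and justification in use (it may change the spacing).

```python
import re

# One or more ordinary spaces immediately followed by a danda (U+0964)
# or double danda (U+0965). The lookahead leaves the danda itself untouched.
_SPACE_BEFORE_DANDA = re.compile(r" +(?=[\u0964\u0965])")


def glue_dandas(text: str) -> str:
    """Replace spaces before a danda with a single NO-BREAK SPACE.

    Text with no space before the danda is returned unchanged.
    """
    return _SPACE_BEFORE_DANDA.sub("\u00A0", text)


if __name__ == "__main__":
    spaced = "राम । श्याम"
    print(repr(glue_dandas(spaced)))
```

The space after the danda is left as an ordinary space, so a line break is still allowed after the sentence-terminating mark.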

xfq commented 8 months ago

We can add a link to the Devanagari Layout Requirements (we don't have a Gujarati Layout Requirements yet): https://www.w3.org/International/ilreq/devanagari/#h_line_breaking

r12a commented 8 months ago

In the Character Usage app I found 13 languages that use । and 8 that use ॥. These languages use 7 scripts that would fit under the IIP purview. My orthography notes only indicate relationships to spaces for 2 languages:

a. Hindi. I looked at a few style guides for Hindi (eg. for authors in Microsoft, etc.) and my conclusion was: A number of Hindi style guides consulted require that the danda follow the last letter in the sentence, with no intervening space.

b. Odia. Here a space is expected before । because otherwise it can be confused with a vowel.

Lepcha has its own version of these punctuation marks.

(Several Southeast Asian scripts also have their own punctuation that looks like the dandas, as well as other punctuation that may have a special relationship with spaces, e.g. the Thai repetition marker.)