w3c / svgwg

SVG Working Group specifications

Text with multiple x="" values should be shaped as if it was not broken #631

Open litherum opened 5 years ago

litherum commented 5 years ago

Text with multiple x="" values causes the characters to be broken up, within a word, so that parts of the word are in one place and parts are in another place. However, some scripts (like Arabic) require shaping across the entire word in order to be legible.

It's probably impossible to use things like kashidas to make text flow from one arbitrary place to another; however, we can at least require that the word be shaped correctly, as if it were placed contiguously.

This is conceptually similar to the hyphenation rules in CSS level 3. Those rules state:

When shaping scripts such as Arabic are allowed to break within words due to hyphenation, the characters must still be shaped as if the word were not broken (see §5.6 Shaping Across Intra-word Breaks).

They also provide this descriptive example:

For example, if the Uyghur word “داميدى” were hyphenated, it would appear as [image: uyghur-hyphenate-joined], not as [image: uyghur-hyphenate-unjoined].

The longer text in CSS is:

When shaping scripts such as Arabic wrap at unforced soft wrap opportunities within words ... the characters must still be shaped (their joining forms chosen) as if the word were still whole.
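A toy Python sketch of the difference this issue describes. It assumes a hand-written joining-type table covering only the letters in the Uyghur example (real shapers derive joining types from Unicode's ArabicShaping.txt data and also handle transparent marks, which this ignores); `joining_forms` and `shape_in_pieces` are hypothetical names. Shaping the whole word and shaping it as two separately positioned pieces choose different joining forms for the letters next to the break:

```python
# ASSUMPTION: tiny hand-written joining-type table, only for this illustration.
RIGHT_JOINING = {"\u0627", "\u062F", "\u0631", "\u0648"}  # ALEF, DAL, REH, WAW
DUAL_JOINING = {"\u0628", "\u0644", "\u0645", "\u064A", "\u0649"}  # BEH, LAM, MEEM, YEH, ALEF MAKSURA


def joins_forward(ch):
    """Can this letter connect to the letter that follows it in logical order?"""
    return ch in DUAL_JOINING


def joins_backward(ch):
    """Can this letter connect to the letter that precedes it in logical order?"""
    return ch in DUAL_JOINING or ch in RIGHT_JOINING


def joining_forms(text):
    """Return 'isol', 'init', 'medi' or 'fina' for each character, in logical order."""
    forms = []
    for i, ch in enumerate(text):
        prev_link = i > 0 and joins_forward(text[i - 1]) and joins_backward(ch)
        next_link = i + 1 < len(text) and joins_forward(ch) and joins_backward(text[i + 1])
        if prev_link and next_link:
            forms.append("medi")
        elif prev_link:
            forms.append("fina")
        elif next_link:
            forms.append("init")
        else:
            forms.append("isol")
    return forms


def shape_in_pieces(text, starts):
    """Naive behavior: shape each piece on its own. `starts` are piece start indices."""
    bounds = sorted(set(starts) | {0}) + [len(text)]
    forms = []
    for a, b in zip(bounds, bounds[1:]):
        forms.extend(joining_forms(text[a:b]))
    return forms


word = "\u062F\u0627\u0645\u064A\u062F\u0649"  # the Uyghur word from the CSS example
print(joining_forms(word))       # ['isol', 'isol', 'init', 'medi', 'fina', 'isol']
print(shape_in_pieces(word, [4]))  # ['isol', 'isol', 'init', 'fina', 'isol', 'isol']
```

Only the two letters on either side of the break (indices 3 and 4) differ; shaping the word as if it were unbroken keeps their joined forms.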

litherum commented 5 years ago

Cc @fantasai

fantasai commented 5 years ago

I think I would agree. CC i18n? :) e.g. i18n-alreq (I can't edit this repo's issues)

r12a commented 5 years ago

I'm finding it hard to imagine people doing this at all with a cursive script, unless it's word by word. But if they did, I'm thinking it may be more similar to vertical text as described at https://w3c.github.io/alreq/#h_vertical_upright (though that's apparently very rare too).

That approach would be the opposite of the hyphenation approach (which isn't actually used in Arabic text, but which we believe may be used for Uighur), in that it isolates the characters.

I added labels for i18n-alreq and i18n-mlreq to bring this to the attention of the Arabic and Mongolian folks. Let's see what they say.

AmeliaBR commented 5 years ago

SVG already specifically requires that ligatures be preserved even if the component characters are assigned different positions. (The position for the second/subsequent character is ignored.) An author needs to explicitly disable ligatures if they don't want that to happen.

So it is only to be expected that contextual alternates are also preserved, regardless of glyph positioning. The browsers that don't do this are broken.

We cannot assume that an author that gives exact positions to letters is intending to render them in isolated form. In many cases, the positioning attributes are about "locking in" text shaping from the original design software, not creating abstract designs.

For cursive scripts, the results of this locked-in position may sometimes be sub-optimal, with imperfect overlap of the glyphs. But for the overall design, preserving the exact position of the letters in the graphic, as the author designed it, may be more important than letting the user agent's text-shaping engine do what it thinks is best.

r12a commented 5 years ago

@behnam @shervinafshar @sahafshar do you have any thoughts on this?

behnam commented 5 years ago

Thanks for flagging, @r12a.

We have tried to answer this question on the Joining section of the Requirements doc. (https://w3c.github.io/alreq/#h_joining)

IMHO, a good way to look at any joining-related question is to remember that the shapes of letters in Perso-Arabic script carry semantics, and therefore should not be changed without explicit, possibly language-aware, indications.

If I try to translate the question here to other writing systems, for Latin it would become something like: would you replace the "a" glyph with "A" because the requested height of the glyph was larger than what you got for "a"?

Yes, ideally there would exist a typesetting framework that, given the text and some x attributes, would know how to position the letters, in their intended shapes, at the desired coordinates. This almost exists for Latin script, but clearly doesn't exist for Perso-Arabic script yet. So changing the text to match the current (limited) technical ability of typesetting doesn't really help anyone.

That said, it would still be possible, if needed, to use the x attribute to position any explicitly shaped character, following use of joining control characters, as described by Unicode.

Hope this is helpful.
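Behnam's point about explicitly shaped characters can be sketched with joining control characters. This is a minimal illustration, assuming a dual-joining letter (right-joining letters such as ALEF or DAL have no initial or medial form); `explicit_form` is a hypothetical helper, not part of any library. Under Unicode's joining rules, a ZERO WIDTH JOINER placed before a letter in logical order makes it join toward the preceding side, and one placed after makes it join toward the following side, so the letter keeps that form even when it is positioned on its own:

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def explicit_form(letter: str, form: str) -> str:
    """Wrap a (dual-joining) Arabic letter in ZWJs so a shaper selects `form`
    even when the letter is rendered as a separately positioned piece."""
    if form == "isol":
        return letter
    if form == "init":
        return letter + ZWJ          # joins only toward the following letter
    if form == "medi":
        return ZWJ + letter + ZWJ    # joins on both sides
    if form == "fina":
        return ZWJ + letter          # joins only toward the preceding letter
    raise ValueError(f"unknown joining form: {form}")


meem = "\u0645"  # ARABIC LETTER MEEM, a dual-joining letter
print([hex(ord(c)) for c in explicit_form(meem, "medi")])  # ['0x200d', '0x645', '0x200d']
```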

css-meeting-bot commented 5 years ago

The SVG Working Group just discussed Text with multiple x="" values should be shaped as if it was not broken, and agreed to the following:

The full IRC log of that discussion
<krit> topic: Text with multiple x="" values should be shaped as if it was not broken
<krit> GitHub: https://github.com/w3c/svgwg/issues/631
<krit> myles: If you have a text element in SVG and it has an attribute x="..." with a list of values... in Latin text every letter gets placed at a different position. So within each element the letters can spread out across the document. This makes sense in Western Latin script. In Arabic or Hindi there are shaping rules that are contextual.
<krit> myles: If you split up the text and position each letter... how should the shaping interact
<krit> myles: should they get shaped as separate runs or should letters be shaped in the same run...
<krit> Tavmjong: Similar to styling: if you fill a letter with red then the shaping should not be broken
<krit> Tavmjong: Pango works like that. You can shape a span by itself but you have context before and after. Pango knows how to shape the main thing.
<krit> Tavmjong: haven't tried in Inkscape hard enough.
<krit> Tavmjong: I do agree that shaping should not be broken
<krit> Tavmjong: we explicitly state in SVG 1.1 that ligatures are broken
<krit> Tavmjong: optional ligatures are broken and that makes sense.
<krit> Tavmjong: for instance with spacing by letter-spacing
<krit> myles: someone from the i18n group mentioned that shaping has meaning.
<krit> myles: shaping has a meaning to a reader. So we should not change the shaping.
<krit> chris_: That is true but also: if you put text on path the baseline is shaping and you can fake it.
<krit> chris_: but you should probably go to isolated glyphs
<krit> Tavmjong: from the implementation side... no one actually implemented breaking shaping
<krit> Tavmjong: Inkscape breaks shaping
<krit> Tavmjong: but Pango does have an option to send an entire paragraph so that it understands the context. So it is just a matter of using this functionality in Inkscape.
<krit> myles: So each character gets rendered in isolation in Inkscape?
<krit> Tavmjong: right now.
<krit> myles: WebKit does the exact same thing and is wrong. It should be fixed.
<krit> Tavmjong: agree
<krit> krit: What about Blink
<krit> myles: don't know.
<krit> Tavmjong: would be great to have some tests.
<krit> myles: I can submit some tests
<krit> myles: I think the resolution would be: characters in a single element are shaped in a unit
<krit> Tavmjong: I think we need to be stronger
<krit> Tavmjong: characters in any text element that need shaping.
<krit> Tavmjong: like a span with multiple colors
<krit> myles: we can make it stronger.
<krit> myles: All characters inside a text element should be shaped as if they are one unit
<krit> RESOLUTION: All characters inside a text element should be shaped as if they are one unit
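The run splitting myles describes at the start of this log can be sketched as follows: each character that receives its own value from the x list starts a new piece. This is a simplified sketch that assumes one x value per character index and ignores y, dx, dy, rotate and text-anchor; `split_at_absolute_x` is a hypothetical name. Shaping each returned piece as an independent run is what the resolution rules out: the pieces may be positioned independently, but the characters are shaped together.

```python
from typing import List, Optional, Tuple


def split_at_absolute_x(text: str, xs: List[float]) -> List[Tuple[Optional[float], str]]:
    """Split text into (x, piece) pairs; each explicit x value starts a new piece.

    Values beyond the end of the text are ignored; characters beyond the end
    of the x list stay in the last piece."""
    n = min(len(xs), len(text))
    # Character 0 always starts a piece; so does every character with its own x value.
    starts = sorted(set(range(n)) | {0})
    ends = starts[1:] + [len(text)]
    return [(xs[s] if s < len(xs) else None, text[s:e]) for s, e in zip(starts, ends)]


print(split_at_absolute_x("Hello", [10, 20, 30]))
# [(10, 'H'), (20, 'e'), (30, 'llo')]
```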
css-meeting-bot commented 4 years ago

The SVG Working Group just discussed Text i18n issues.

The full IRC log of that discussion
<heycam> Topic: Text i18n issues
<heycam> AmeliaBR: if you're using SVG attributes to lay out individual characters, many browsers create the effect that these are the isolated character forms, which are not useful if tweaking individual characters in a word
<emilio> ScribeNick: emilio
<r12a> q+
<emilio> AmeliaBR: resolution was that the desired behavior is that if you have a text element the shaping of the letter forms should behave as if it was a paragraph of text even if you tweak the character positions with svg
<emilio> ... so for actually rendering it you need to actually do some tweaks
<emilio> ... most text rendering you do it by a zwj on the other side of the shaper
<emilio> myles: that's not quite how it works but what you describe is ok
<emilio> heycam: what Gecko does, and what I wrote in the spec, is that each <text> is treated like a <div> and each <tspan> like a <span>, so the text is shaped as if it were a <div>
<emilio> ... then you lay them out, and all the positioning stuff is applied to individual grapheme clusters
<emilio> ... we only apply the positioning if it is the first DOM character of a grapheme cluster
<emilio> addison: so you're positioning the grapheme clusters individually right?
<emilio> myles: after shaping
<emilio> heycam: yes
<emilio> AmeliaBR: so you can separate the clusters into the screen after shaping?
<emilio> myles: yes
<emilio> AmeliaBR: so for impls that don't do that and pretend that the different chunks are independent
<emilio> q?
<emilio> ack r12a
<emilio> r12a: today I asked 3 Iranian people and 2 Arabic speakers what to do here
<emilio> ... [whiteboard time]
<emilio> ... I asked whether they would ever take a joined word and align each of its letters at a different vertical position
<emilio> ... and their answer was yes
<emilio> ... but that they would show up in isolated form, not joined
<emilio> ... that's the opposite of what behnam (?) said
<emilio> ... because that's what you'd do in crosswords and games
<emilio> ... one of the Iranian people referred me to a Persian professor in Iran
<emilio> myles: are you confident enough on this data or should we wait more
<emilio> r12a: the people I spoke with seemed pretty confident
<emilio> addison: does this also position multi-character segments?
<emilio> AmeliaBR: yes, you can do that
<emilio> nmccully: when they're joined in Arabic you stretch when you go in the inline direction
<emilio> ... but it seems you're confident that if you move them vertically you shape them separately?
<emilio> ... anyhow there is a place for automatic place shaping
<emilio> r12a: this is like letter-spacing if you only do it at the horizontal boundary so that may be appropriate
<emilio> AmeliaBR: it puts the same spacing between every glyph pair but not for stuff in between
<emilio> r12a: so I didn't ask for this use case...
<emilio> heycam: my feeling is that this feature is poorly thought-out
<emilio> ... and it was introduced into SVG because you could do the same in PostScript
<emilio> ... where it is simple because that works on glyphs
<emilio> ... so when I thought about it and wrote the spec I tried to do the simple thing that made some amount of sense
<emilio> ... so the question is: is this feature as specified sufficient for the use cases?
<emilio> r12a: it'd depend on the authors in the end
<emilio> AmeliaBR: sounds like what we need is some sort of property to toggle "use isolated forms vs. connected forms"
<emilio> r12a: even with the joining forms if you tear them apart enough without a proper kashida you're going to get gaps which is undesirable
<emilio> heycam: so there's no existing css feature for that
<emilio> ... even required ligature
<emilio> AmeliaBR: you can't turn off required ligatures
<emilio> addison: you could use zws or something
<emilio> r12a: and you may need to do it even in actual cases like the ?? and the 5, where you use a joining form even at the end of the sentence
<emilio> myles: so we have two use cases, one is for kerning between letters for which we don't want the isolated form, another is for crossword like stuff for which we do
<emilio> ... I don't think people are doing this right now
<emilio> ... so we could advocate for new content to use different <text> elements
<emilio> addison: then they'd draw as isolated regardless of vertical spacing
<emilio> AmeliaBR: a11y would read them as separate letters...
<emilio> myles: that's what you want for the crossword stuff
<emilio> r12a: given the current state I don't think we have a clear answer
<emilio> ... for the case of vertical differences what I'm hearing we should use isolated forms
<emilio> ... except behnam, who says the opposite
<emilio> myles: So if I have <text> with 5 chars and x="1 2 3"
<emilio> ... the proposal would be that for the character that have specific x position (the first three)
<emilio> ... you'd have isolated forms
<emilio> ... the rest would join
<emilio> heycam: my concern implementation wise is that now I need to resolve the positions before shaping
<emilio> ... and tell the shaper whether to use isolated form which would not be very easy
<emilio> ... I wonder if there's context other than svg where you want to get the isolated forms
<emilio> ... so you can add a css property or something
<emilio> ... and the author could opt-into it
<emilio> AmeliaBR: that'd be my preference
<emilio> ... but I don't know if there's enough use cases to get implementors on board
<emilio> r12a: I don't see the use case for such a property for horizontal text
<emilio> ... for vertical text... it depends
<emilio> ... for Arabic layout requirements there's behnam's example about the first letter, but there are people I've talked with who have never seen that (not clear if due to technology limitations)
<emilio> AmeliaBR: so are we likely to get any resolution? If not, what more feedback do we need, and from whom?
<emilio> r12a: yes
<emilio> addison: this is something we'd want to continue to study
<emilio> ... there's more people we need to ask
<emilio> krit: so unspecified for svg2 until we have a solution for it?
<emilio> AmeliaBR: I think we should try to make the edits
<emilio> myles: for what?
<emilio> r12a: this works fine for Latin so maybe drop a note that it may misbehave in Arabic?
<emilio> AmeliaBR: it may also be a problem with cursive Latin fonts
<emilio> ... where it's not clear if the context remains there
<emilio> heycam: you can control it with opentype features
<emilio> AmeliaBR: for non-required ligatures you can turn it off
<emilio> heycam: I think we should add a "beware" note to the spec for arabic
<emilio> r12a: more than one
<emilio> (more than one script)
<emilio> myles: they should talk to us if they have use cases
<emilio> ... another option is to say that this only works in some languages
<emilio> heycam: so letter-spacing does cause isolated forms to be used?
<emilio> AmeliaBR: I don't think it affects glyph selection
<emilio> r12a: csswg also throws their hands up about how that should work
<AmeliaBR> CSS advice on letter-spacing and cursive scripts: https://drafts.csswg.org/css-text-3/#cursive-tracking
<emilio> ... you usually don't do same spacing for all glyphs
<emilio> ... use-cases are justifications
<emilio> [side conversations about how to justify text in svg]
<emilio> github: https://github.com/w3c/svgwg/issues/631
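Two points from this discussion can be sketched in Python. `position_clusters` follows heycam's description of Gecko: the whole <text> is shaped first, and x values are then applied to grapheme clusters, honoring only the value on a cluster's first character (the same rule SVG applies to ligatures). `isolate_positioned` emulates myles's proposal by inserting ZERO WIDTH NON-JOINER after every explicitly positioned character before shaping, in the spirit of addison's suggestion to use a zero-width control character. Both are hypothetical helpers, not code from Gecko or any shaper; both assume left-to-right pen advance and ignore y, dx, dy, rotate and text-anchor. The inserted ZWNJs also shift character indices, so x values would still need to be mapped back to the original characters.

```python
from typing import List, Sequence

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: stops cursive joining between its neighbors


def position_clusters(clusters: Sequence[Sequence[int]],
                      advances: Sequence[float],
                      xs: Sequence[float]) -> List[float]:
    """Return the x origin of each already-shaped cluster.

    clusters: character indices covered by each cluster, in layout order
    advances: advance width of each cluster
    xs:       explicit x values indexed by character; may be shorter than the text
    """
    positions: List[float] = []
    pen = 0.0  # ASSUMPTION: left-to-right advance starting at x = 0
    for cluster, advance in zip(clusters, advances):
        first = min(cluster)  # first character of the cluster in DOM order
        if first < len(xs):
            pen = xs[first]  # x values on later characters of the cluster are ignored
        positions.append(pen)
        pen += advance
    return positions


def isolate_positioned(text: str, num_x_values: int) -> str:
    """Insert ZWNJ after each explicitly positioned character so a shaper
    renders those characters isolated while the remaining ones keep joining."""
    n = min(num_x_values, len(text))
    pieces: List[str] = []
    for i, ch in enumerate(text):
        pieces.append(ch)
        if i < n and i + 1 < len(text):
            pieces.append(ZWNJ)
    return "".join(pieces)


# A ligature covers characters 1 and 2, so the x value on character 2 (99) is ignored.
print(position_clusters([[0], [1, 2], [3]], [5, 8, 4], [0, 10, 99, 30]))  # [0, 10, 30]
# myles's example: 5 characters with x="1 2 3"; the first three are cut off from joining.
print(isolate_positioned("abcde", 3) == "a\u200cb\u200cc\u200cde")  # True
```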