w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-overflow] Line-clamp and approaches to ellipsis insertion #10844

Open frivoal opened 2 months ago

frivoal commented 2 months ago

This relates to https://github.com/w3c/csswg-drafts/issues/7708 about the value of the differences between the continue: discard and continue: collapse approaches to line-clamp, but looks into a narrower question.

One of the ways line-clamp is deliberately different in the spec from text-overflow as well as from the legacy -webkit-line-clamp implementations is the way the ellipsis is inserted: where does it go, what happens to the underlying text. The differences are particularly important in the case of bidi text, though not limited to it. Alternative designs like the continue: collapse proposal drafted by @andreubotella in response to the feedback from @bfgeek or @emilio have not retained that difference, and from conversations, it is not clear to me that this is deliberate, so I want to explore that aspect explicitly.

text-overflow has taken a simple approach to ellipsis insertion: hide all the characters that overflow the line, plus a few more next to them at the physical end of the line to make room for the ellipsis, then paint it there. (There are nuances between the Chrome/WebKit and the Firefox approaches, but I don't believe those nuances are relevant to the point I want to make here, so I'll skip over those details.) Legacy implementations of -webkit-line-clamp reuse that logic.
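To make that concrete, here is a minimal sketch of the physical approach. It is an illustration only, not how any browser implements it: it assumes every character is one unit wide, takes the characters already in left-to-right visual order (i.e. after bidi reordering), and the function name `truncate_physical` is made up for this example.

```python
def truncate_physical(visual_chars, box_width, ellipsis="…", end="right"):
    """Clip an overflowing line at its physical end and paint an ellipsis there.

    visual_chars: characters in left-to-right visual order (assumption:
                  bidi reordering already happened, each char is 1 unit wide).
    end:          which physical side is the line end ("right" for an LTR
                  block, "left" for an RTL block).
    """
    if len(visual_chars) <= box_width:
        # Nothing overflows, so nothing is hidden and no ellipsis is painted.
        return "".join(visual_chars)

    # Hide what overflows, plus enough extra characters to fit the ellipsis.
    keep = max(0, box_width - len(ellipsis))
    if end == "right":
        return "".join(visual_chars[:keep]) + ellipsis
    return ellipsis + "".join(visual_chars[len(visual_chars) - keep:])


print(truncate_physical(list("abcdefghij"), 6))              # abcde…
print(truncate_physical(list("abcdefghij"), 6, end="left"))  # …fghij
```

Note that the function never looks at logical order: it only knows which side of the box the characters fall off.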

In my view, this design is appropriate for text-overflow. text-overflow is concerned with a single line overflowing a scroll container in the inline direction. It leads to perfectly reasonable behavior when you actually scroll the box (in browsers that support that): more text is progressively revealed, everything makes sense.

Taking an example of RTL text with some LTR text in it:

If the box is too small and text-overflow: ellipsis applies, you get this:

As you scroll, more content gets revealed:

All good.

However, the situation for line-clamp and block ellipsis is not the same, and I don't think this approach works well. There is no content that overflows the line, so "hide all the characters that overflow the line, plus a few more next to them at the physical end of the line" doesn't make sense. You could still hide enough characters from the physical end of the line to make room for the ellipsis, but that ends up being counterintuitive to the reader. Think about the text that isn't visible: what is it? Where is it? In the scrolling text-overflow case, it's the text at the (left) end of the line, and it's to be found by scrolling (towards the left). We can reason in spatial terms. But in the line-clamp case, the hidden text is the text of the next line(s). That text is the continuation in logical order of the text you can see.

Let's look at an example:

Let's clamp it to 3 lines, using the approach used by the legacy -webkit-line-clamp implementations.

Note that due to how the truncation happens, both the logical continuation of the text (“please!”, and the next line) and the logical start of the LTR fragment which is at the physical end of the line (“Ca”) are missing. This is confusing. Try for instance to think about what you'd show to the reader if you wanted to provide some affordance to see the hidden text, like a tooltip or something. Regardless of how you achieve that (JS-based, using continue:fragments, whatever), what would you even put in that tooltip? The best you can do is probably something like this, but that's rather strange.

Instead, the approach specified for line-clamp removes content from the logical end of the last line before the clamp to make room for the ellipsis.
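The difference between the two approaches can be sketched with a toy model. This is not the Unicode Bidirectional Algorithm: it assumes a line already split into directional runs, a single level of nesting, one unit of width per character, and it follows the usual convention of writing RTL text in uppercase. All names here are invented for the illustration.

```python
from typing import List, Tuple

Run = Tuple[str, str]  # (text in logical order, "ltr" or "rtl")


def visual_order(runs: List[Run], base: str = "rtl") -> str:
    """Toy reordering: return the line as a left-to-right visual string."""
    ordered = runs if base == "ltr" else list(reversed(runs))
    return "".join(t if d == "ltr" else t[::-1] for t, d in ordered)


def elide_physical(runs: List[Run], n: int, base: str = "rtl", ellipsis: str = "…") -> str:
    """Legacy style: drop n characters at the physical end of the line."""
    vis = visual_order(runs, base)
    return ellipsis + vis[n:] if base == "rtl" else vis[:len(vis) - n] + ellipsis


def elide_logical(runs: List[Run], n: int, base: str = "rtl", ellipsis: str = "…") -> str:
    """Specified style: drop n characters at the logical end, then reorder."""
    remaining = sum(len(t) for t, _ in runs) - n
    kept: List[Run] = []
    for text, direction in runs:
        take = text[:max(0, min(len(text), remaining))]
        remaining -= len(take)
        if take:
            kept.append((take, direction))
    vis = visual_order(kept, base)
    # The ellipsis goes at the end side of the block (left for an RTL block).
    return ellipsis + vis if base == "rtl" else vis + ellipsis


# RTL block whose last line ends (logically) with an LTR phrase.
line = [("ABC ", "rtl"), ("hello world", "ltr")]
print(elide_physical(line, 3))  # …lo world CBA   (logical start of the phrase lost)
print(elide_logical(line, 3))   # …hello wo CBA   (only the logical end lost)
```

In the physical variant the reader loses "hel", which sits in the logical middle of the line, in addition to whatever follows on later lines. In the logical variant the only text missing is the continuation.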

Here, in the reader's mind, there's no confusion about what part of the text is getting hidden: the next part, conceptually pushed to the next line if it were to exist. And if you were to show it in a tooltip, nothing would be weird (I've put an ellipsis in the tooltip, but that's a design choice that could be questioned, not inherent to the mechanism).

One might argue about the placement of the ellipsis on the last line. Here, as per spec, its position is determined by the direction of the surrounding block, which I'd argue is the right thing to do. But even if we instead decided to place the ellipsis according to the direction of the elided text, we'd still get sensible results, as we're still eliding in logical order.

Because of that, I continue to believe that for line-clamp (and its compatibility variant of -webkit-line-clamp) we should stick to the specified approach of eliding content in logical order on the line that gets the ellipsis (regardless of whether we follow an underlying continue: discard or continue: collapse approach).


As a follow-up, working in logical order makes it a lot easier to also hook into line-breaking rules to decide the granularity of content that gets elided, which is how it is currently specified, and how I've done it in the examples above. Author feedback like this blog highlighted how eliding at arbitrary points could lead to awkward breaks in the middle of words, possibly leading to misunderstandings. Hooking into line breaking rules avoids this as it elides content word by word (for languages that use word-based line breaking), making it no more confusing than a regular line break. And it lets us opt into eliding up to hyphenation points, for instance. This is also robust internationally: for instance for CJK, it would honor the rules about line-break prohibition as a direct consequence of using the line breaking code, without having to reinvent it.
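As a rough sketch of the granularity question, compare eliding character by character with eliding at break opportunities. The assumptions are loud ones: spaces stand in for the real line-breaking rules (a real engine would use its line breaker, not `split`), every character is one unit wide, and both function names are hypothetical.

```python
def elide_per_char(line: str, available: int, ellipsis: str = "…") -> str:
    """Remove just enough characters from the logical end to fit the ellipsis."""
    return line[:max(0, available - len(ellipsis))] + ellipsis


def elide_at_breaks(line: str, available: int, ellipsis: str = "…") -> str:
    """Remove whole words from the logical end until the ellipsis fits.

    Assumption: a space is the only break opportunity.
    """
    words = line.split(" ")
    while words and len(" ".join(words)) + len(ellipsis) > available:
        words.pop()
    return " ".join(words) + ellipsis


last_line = "the quick brown fox"  # exactly fills a 19-unit line
print(elide_per_char(last_line, 19))   # the quick brown fo…
print(elide_at_breaks(last_line, 19))  # the quick brown…
```

The second form never shows a partial word, which is the property that makes the truncation read like an ordinary line break.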

In simple cases, you could also re-build this on top of the text-overflow: ellipsis approach, but it would run into difficulties quickly as soon as bidi gets involved. For instance, consider a mixed Arabic/Japanese case:

With a clamp of 1 line, the legacy approach does this (with an illustrative tooltip added):

Not only does this suffer from the same problem as earlier, where both the start and end of the Japanese fragment get elided, but in addition this separates the "じ" from the "ゃ" (and the "、"), which Japanese line breaking wouldn't do (as that distorts pronunciation). Solving that isn't a simple invocation of the usual line breaking code, as this is happening in the logical middle of the line and removing the logical start of that fragment, rather than at the logical end of the line and removing the logical end, as line breaking normally does.

Instead, following the specified approach and eliding content from the logical end, using the usual line breaking rules, gives the more expected result:

or this one, if we place the ellipsis based on the direction of the elided content rather than of the block:

Even if we chose not to have elision based on wrapping opportunities for now, or to have opt-ins or opt-outs for letter-by-letter elision, logical order elision makes it easier to open these possibilities.

So, as in the earlier part, I continue to believe that for line-clamp (and its compatibility variant of -webkit-line-clamp) we should stick to the line-breaking approach currently specified when it comes to how much content gets elided (regardless of whether we follow an underlying continue: discard or continue: collapse approach).

aphillips commented 2 months ago

In the 2024-09-12 I18N telecon, I was actioned to reply to this issue.

I think we assumed logical order elision. We think the right approach would be to place an RLM or LRM after the ellipsis when eliding text. The mark inserted would have the same directionality as the next (omitted) character. I am adding this issue to our agenda for Tuesday, 2024-09-17's I18N+CSS call.
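As a rough illustration of the idea (not an agreed design), the choice of mark could look like the sketch below. It uses Python's standard `unicodedata.bidirectional` to read a character's bidi class. One assumption goes beyond what is stated above: if the next omitted character is neutral or weak (a space, a digit), the sketch skips ahead to the first strong character. The function names are made up for this example.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def mark_for_elided(elided: str) -> str:
    """Pick the mark matching the first strong character of the omitted text.

    Assumption: neutral and weak characters are skipped; an empty string is
    returned when the omitted text has no strong character at all.
    """
    for ch in elided:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return LRM
        if bidi_class in ("R", "AL"):
            return RLM
    return ""


def ellipsis_with_mark(elided: str, ellipsis: str = "…") -> str:
    """Return the ellipsis followed by the directional mark."""
    return ellipsis + mark_for_elided(elided)


print(repr(ellipsis_with_mark(" world")))      # ellipsis followed by U+200E
print(repr(ellipsis_with_mark("\u0645\u0631")))  # ellipsis followed by U+200F
```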

andreubotella commented 1 month ago

Since in implementations the ellipsis insertion would happen as part of line breaking, could this be expressed in terms of which base embedding level the ellipsis isolate would have?

css-meeting-bot commented 1 month ago

The CSS Working Group just discussed [css-overflow] Line-clamp and approaches to ellipsis insertion, and agreed to the following:
RESOLVED: Remove characters *at* break opportunities *in* logical order, for line-break

The full IRC log of that discussion
<TabAtkins> florian: As you might remember, line-clamp has been specced for a few years based on fragmentation
<TabAtkins> florian: the longhand that triggers that is called continue:discard
<TabAtkins> florian: andreu has been working on an alternative not based on fragmentation, which we're calling continue:collapse
<TabAtkins> florian: aside from the primary diff between these two, one thing was deliberately designed into continue:discard and not discussed much, and apparently assumed away in continue:collapse, I wanted to talk about it
<TabAtkins> florian: on the line you add the ellipsis, how
<TabAtkins> florian: the pre-existing ellipsis mechanism is from text-overflow:ellipsis
<TabAtkins> florian: I argue that's not appropriate here
<TabAtkins> florian: that's for a single line, which is overflowing the line box. you chop it off at the end of the box, and then some, just enough to add the ellipsis char
<TabAtkins> florian: this is a physical operation, it makes sense in that context
<TabAtkins> florian: but in line-clamp we're not overflowing
<TabAtkins> florian: the logical way to remove content in this context is to act like linebreaking. you basically have a line that's a little too short, so you put a little less content in it
<andreubotella> q+
<TabAtkins> florian: this means you chop off from the *logical* end of the line, not the physical end
<TabAtkins> florian: if you don't do this, you'll chop the logical *middle* of the line, missing *two* parts of the text
<TabAtkins> florian: the following lines, and the logical middle of your line
<TabAtkins> florian: if you're trying to read the text, it makes no sense
<TabAtkins> florian: it's also weird if you split in the middle of words
<TabAtkins> florian: dropping at the same boundaries as line wrapping significantly reduces the amount of misunderstandings you can get from chopping text
<TabAtkins> florian: kind of secondary, but still
<TabAtkins> florian: the issue is a bunch of pictures of text asking how weird it is to chop physically rather than logically
<TabAtkins> florian: so what I want is just to establish whether, regardless of discard or collapse, we should indeed preserve this "just do line breaking and break logically" behavior
<TabAtkins> florian: also, now that we know how much stuff to remove for the ellipsis, where do we put it? physical end or logical end of the line?
<TabAtkins> florian: but that's only relevant if we agree on the first
<TabAtkins> addison: i18n discussed this at length recently
<TabAtkins> addison: we're fairly strongly supportive of logical string truncation
<TabAtkins> addison: it means the kept part of the string makes sense
<TabAtkins> addison: and if you made the box wider, the removed text would be at the end of the line and would make sense to come in
<TabAtkins> addison: So we think logical truncation makes the most sense. and is the easiest here, because you find a break point and then proceed, rather than doing a bunch of ranges
<astearns> ack andreubotella
<TabAtkins> andreubotella: currently it's implemented the same as text-overflow:ellipsis in Chrome, after bidi reordering
<TabAtkins> andreubotella: I don't oppose doing what Florian said, my initial impl was just following existing chrome ellipsis behavior
<TabAtkins> andreubotella: but having heard the bidi reasoning, that sounds fine
<emilio> q+
<astearns> ack dbaron
<TabAtkins> andreubotella: might offer some impl challenges, but don't think they're insurmountable considering the benefit
<TabAtkins> dbaron: my intuition about the logical truncation is that it's sort of like you're doing layout into a shorter line (shrunk by the ellipsis)
<TabAtkins> dbaron: that's one way to think about it
<TabAtkins> dbaron: one subtlety is how you place the ellipsis, whether it's good to be closer to the text or further after doing this shorter-line layout
<TabAtkins> dbaron: so flush with the remaining text, versus flush with the line-end
<addison> q+
<TabAtkins> dbaron: I think I reasonably strongly agree with the consensus on logical truncation, but don't have a strong opinion on placement of the ellipsis
<TabAtkins> dbaron: it's interesting in both cases from a layout perspective
<TabAtkins> florian: addison had an interesting comment about that
<astearns> ack emilio
<TabAtkins> emilio: so we need to agree on ellipsis insertion not messing with following lines layout
<andreubotella> q+
<TabAtkins> florian: in continue:collapse that's true, but in discard it wouldn't be
<TabAtkins> andreubotella: depends on how it's implemented
<TabAtkins> andreubotella: in chromium I don't think there's a way to preserve the breakpoint after a linebreak
<TabAtkins> andreubotella: so the way I was planning to go was to do it all at linebreak time, then the displaced content is pushed to the next line
<TabAtkins> andreubotella: but that's not the same as the impl I think emilio is thinking of
<TabAtkins> emilio: it would be silly because you'd still have to do the layout of everything past the break point
<TabAtkins> andreubotella: my impl wouldn't be doing the extra work
<TabAtkins> emilio: so you'd insert it during the first layout pass?
<TabAtkins> andreubotella: yes, tho with line-clamp:auto you'd still get two layouts
<TabAtkins> emilio: I don't think we want line-clamp to cause multipass layout
<TabAtkins> andreubotella: you need it for auto, that's "as many lines as fit". not purely a paint-time info
<TabAtkins> andreubotella: unless it is in gecko?
<TabAtkins> emilio: you clamp the inner block and outer block, it's one pass
<astearns> ack addison
<andreubotella> q-
<TabAtkins> (I don't understand any of this, hopefully it makes sense)
<TabAtkins> addison: we talked about positioning in the WG convo
<TabAtkins> addison: our thinking is that the ellipsis generally wants to behave as part of the text
<TabAtkins> addison: the trick is you probably want to provide the equivalent of a strong marker that matches the next inked character's direction
<Bert> q+ to ask how you know what the language/direction of the ellipsis text is.
<andreubotella> q+
<TabAtkins> addison: you need to know if the run is gonna break or not. ellipsis is a neutral character, you want it to jump to the left or right side of the text, according to where the bidi algo would have put the ellipsis if it was actually in the text
<TabAtkins> florian: give it directionality of the thing it replaces
<TabAtkins> fantasai: no, directionality of paragraph
<TabAtkins> florian: using what it replaces intrigues me
<TabAtkins> addison: you don't want it isolated, you want to let bidi just do the job of deciding left or right side
<TabAtkins> addison: the characters just before it only provide half the context
<TabAtkins> fantasai: you also have to extract it from any embeddings
<TabAtkins> fantasai: two approaches you can take
<TabAtkins> fantasai: one is it acts like the preceding char, and attaches to that.
<TabAtkins> fantasai: don't think we want that
<TabAtkins> fantasai: other is that it belongs to the paragraph
<TabAtkins> fantasai: we're not ellipsizing a particular phrase, we're ellipsizing the whole paragraph, so having it take the paragraph's qualities makes sense
<TabAtkins> fantasai: it also helps show it's a continuation of the paragraph, not the single phrase
<TabAtkins> addison: that's potentially fair, I did some playing with this
<TabAtkins> addison: it kinda behaves normally
<TabAtkins> addison: I think it's worth going offline and writing it down better, describing why it's a good proxy
<TabAtkins> addison: but it is possible that what you're saying is the right answer
<astearns> ack fantasai
<TabAtkins> fantasai: the fundamental questions are logical or physical clipping (I think logical)
<TabAtkins> fantasai: and whether the ellipsis is attached to the immediately preceding character or an attribute of the paragraph itself, and from that the rest falls out
<TabAtkins> florian: I support opening a followup for this. My starting position matched fantasai, but addison's remarks made me think there were subtleties
<TabAtkins> fantasai: since we're breaking from linebreaking properties, not per-character properties
<TabAtkins> fantasai: it seems like a more natural break, not as strongly attached to the previous character as you would if you broke in the middle of a word
<astearns> ack Bert
<Zakim> Bert, you wanted to ask how you know what the language/direction of the ellipsis text is.
<TabAtkins> Bert: might be different issue, from what I understand it's not always just an ellipsis character, but can be something else
<TabAtkins> florian: the spec has an answer for that, whether that answer is what we want is another question
<TabAtkins> Bert: okay. if it's an ellipsis you can say it's neutral, but if it's something else...
<TabAtkins> fantasai: I also think that case, if it's like "continues on page 25", if you break in the middle of hot pink small text, but paragraph is generally normal sized black text, you want the "continues on" to follow the paragraph's styling.
<TabAtkins> fantasai: and directionality
<TabAtkins> fantasai: the ellipsis probably needs to be isolated, but it should take the paragraph's directionality
<astearns> ack andreubotella
<TabAtkins> andreubotella: I don't have an opinion on this discussion, but if we end up going with "matching the next/prev character", anything other than the paragraph base direction
<TabAtkins> andreubotella: I think in the spec it should be specified in terms of the bidi level of the ellipsis character or replacement string
<TabAtkins> andreubotella: Because if you're breaking as part of line-breaking, or even afterwards, you wouldn't be placed back further than the bidi algorithm's progress. you'd at least have the levels of the shown characters.
<TabAtkins> andreubotella: this follows, should be the same as adding a LTR Mark character
<TabAtkins> andreubotella: so I think it should be specced to match impls in that degree
<TabAtkins> emilio: agree
<TabAtkins> emilio: I think we should at least agree to not do the physical truncation the naive way
<TabAtkins> emilio: there are multiple ways to do this that makes sense
<TabAtkins> emilio: imagine you don't have the ability to redo the line layout at that point
<TabAtkins> emilio: we should still truncate at a point that makes sense, might mean truncating more text than what you originally needed.
<TabAtkins> emilio: but that's still way better than putting the ellipsis at the physical end
<TabAtkins> emilio: so I think we should at least agree on that, not naive physical clipping where you just remove characters until it fits
<TabAtkins> florian: I agree, but I'm not sure what you're proposing is "enough"
<TabAtkins> emilio: trying to start from a point nobody disagrees
<TabAtkins> emilio: I also feel strongly we shouldn't affect layout of following lines, to require no relayout
<astearns> ack fantasai
<TabAtkins> q+
<TabAtkins> fantasai: one possible resolution is you remove up to a valid linebreaking point, not less than that
<TabAtkins> florian: in logical space? or is that a separate resolution?
<TabAtkins> fantasai: separate. Just saying we remove chunk in linebreaking chunks
<xiaochengh> q+
<TabAtkins> addison: aren't those related? can you compute a linebreaking opportunity physically?
<TabAtkins> fantasai: you can't remove a single letter from English, for example
<fantasai> TabAtkins: Emilio, wondering about implications of what you were saying
<fantasai> TabAtkins: seems you do line layout, figure out lines
<fantasai> TabAtkins: on final line, truncate somehow and put ellipsis in
<fantasai> TabAtkins: and missing words don't flow into the next line
<fantasai> TabAtkins: they are just hiding
<fantasai> emilio: right, don't relayout past breaking point
<florian> q+
<astearns> ack TabAtkins
<TabAtkins> andreubotella: Not sure if we should decide today, it has implications on impl
<TabAtkins> emilio: yeah, just wanted to resolve on basic constraints first, like we don't trigger multipass layout
<TabAtkins> emilio: you'd expect clamping to make the layout cheaper, not more expensive
<TabAtkins> andreubotella: I'm not sure if in blink you can drop lines after layout
<TabAtkins> emilio: I think you can, you just don't paint them
<TabAtkins> florian: regardless, I think we should ellipsize in logical order
<Bert> q+ to ask if 'line-clamp: "\0A …"' works to put the ellipsis on a separate line? And if so, can I right-align that line?
<astearns> ack xiaochengh
<TabAtkins> xiaochengh: I agree we should do logical clamping
<TabAtkins> xiaochengh: but seems complicated, no one size fits all. might have different fonts or colors on ellipsis
<TabAtkins> xiaochengh: so I think we need a pseudo-element to style the ellipsis
<TabAtkins> florian: eventually yes, I don't think it's needed as a level 1 feature
<astearns> ack florian
<TabAtkins> xiaochengh: with a pseudo-element we could postpone some of the issues, like where to put the ellipsis (flush with text, or flush with line end)
<TabAtkins> florian: maybe with pseudo we could override some of the decisions, but we'd still need a default answer
<astearns> ack Bert
<Zakim> Bert, you wanted to ask if 'line-clamp: "\0A …"' works to put the ellipsis on a separate line? And if so, can I right-align that line?
<TabAtkins> xiaochengh: yeah but at least we wouldn't have to bikeshed too much since it wasn't as fatal to get it wrong
<TabAtkins> astearns: no, defaults are still very important
<TabAtkins> Bert: ah, xiaocheng's suggestion just gave me an answer to what happens with the styling of the ellipsis
<TabAtkins> astearns: can we resolve to have line-clamp remove characters at break opportunities in logical order?
<TabAtkins> florian: Knowing there will be follow-up discussions.
<TabAtkins> astearns: concerns?
<TabAtkins> emilio: we can add a "barring impl craziness"
<TabAtkins> emilio: I think everyone agrees this is the right mode
<TabAtkins> florian: yeah, if it ends up impossible we'll have to revisit
<TabAtkins> astearns: so looks like agreement that it's necessary, and if impl blocks it, we'll still need some *third* thing
<TabAtkins> astearns: objections?
<TabAtkins> RESOLVED: Remove characters *at* break opportunities *in* logical order, for line-break
<TabAtkins> florian: so I think we need to still figure out alignment of the ellipsis, and then andreu and emilio argue about what happens with the remaining stuff on the line
<TabAtkins> emilio: okay