w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text] `&ncsp;` - Non-Collapsible Space #10821

Open dgp1130 opened 2 months ago

dgp1130 commented 2 months ago

HTML whitespace collapsing behavior makes it very difficult to directly manage individual spaces rendered in an HTML document. Whitespace collapsing has a few problems today:

  1. Content management systems cannot just blindly pass text written by a non-developer through to an HTML rendering context. If a user types `Hello,     World` into a CMS, it will be rendered as `Hello, World` in HTML. The CMS can't really do anything about this to get the document to render as the author intended without forcing all strings to use `white-space: pre;`.
  2. white-space: pre; applies to the whole element, but the user might want two spaces in a particular place within that element. For example, maybe I'm one of those people who insists on two spaces at the end of every sentence.
  3. `&nbsp;` sounds like an easy fix for "Just add a space here", but it forces non-breaking, fixed-width behavior the user may not want (when a line wraps, `&nbsp;` always takes up one space of width, even when that's not needed). See the line wrapping behavior of this demo as the viewport shrinks.
  4. `&#32;` is the unnamed (numeric) entity for a single space, which sounds like a good alternative given that I really want the behavior of a simple space. However, `&#32;` is also subject to whitespace collapsing, so it can't really solve this kind of problem. This feels especially weird since whitespace collapsing exists as an affordance for developers who want to format their HTML source code differently from the rendered output, but any developer who writes `&#32;` clearly wants that space to be rendered as-is. Collapsing those spaces goes directly against developer expectations.
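To make points 1 and 4 concrete, here is a rough TypeScript model of `white-space: normal` collapsing. It is a simplification, not the css-text algorithm: it assumes the HTML parser has already resolved `&#32;` to U+0020 and `&nbsp;` to U+00A0, it ignores segment-break rules and line-start/line-end trimming, and the helper names `decodeReferences` and `collapseWhiteSpace` are made up for illustration.

```typescript
const NBSP = "\u00A0";

// Toy decoder for the two references discussed above. A real HTML parser
// resolves these long before CSS ever sees the text.
function decodeReferences(html: string): string {
  return html.replace(/&#32;/g, " ").replace(/&nbsp;/g, NBSP);
}

// Simplified `white-space: normal` collapsing: tabs and line feeds become
// spaces, then each run of U+0020 spaces becomes a single space.
// U+00A0 is not in the collapsible set, so it is left alone.
function collapseWhiteSpace(text: string): string {
  return text.replace(/[\t\n]/g, " ").replace(/ {2,}/g, " ");
}

console.log(collapseWhiteSpace("Hello,     World"));                            // Hello, World
console.log(collapseWhiteSpace(decodeReferences("Hello,&#32;&#32;&#32;World"))); // Hello, World
```

Because `&#32;` is decoded before the collapsing step runs, typed spaces and `&#32;` are indistinguishable to it; U+00A0 survives only because it sits outside the collapsible set.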

This is a shorter, more focused breakdown of some of the problems with whitespace collapsing. I also wrote a whole blog post digging deeper into the complexity at play here, which motivated this particular issue and gives additional context and motivation.

I propose a new `&ncsp;` entity which is treated identically to a regular space but is not subject to whitespace collapsing. You could write `Hello,&ncsp;&ncsp;&ncsp;&ncsp;&ncsp;World` and actually get multiple adjacent spaces like `Hello,     World`. This would address the above challenges because:

  1. Content management systems can output `&ncsp;` where white space is significant.
  2. Developers could use `&ncsp;` without having to opt an entire element into the rules of `white-space: pre;`.
  3. `&ncsp;` could serve as a drop-in replacement for existing usages of `&nbsp;`, but with better line wrapping behavior.
  4. `&ncsp;` would work the way I wish `&#32;` worked and behave in a more predictable fashion.
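Point 1 can be sketched from the CMS side. This is hypothetical throughout: `&ncsp;` does not exist in HTML today, `toHtmlPreservingSpaces` and `escapeHtmlText` are illustrative names, and the sketch assumes only runs of two or more spaces need protecting (it ignores newlines and single spaces at line edges).

```typescript
// Escape characters that are significant in HTML text content.
function escapeHtmlText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical: emit the proposed `&ncsp;` for every space in a run of two
// or more, since only those runs would be collapsed. Escaping happens first
// so the `&` of `&ncsp;` is not itself escaped.
function toHtmlPreservingSpaces(text: string): string {
  const escaped = escapeHtmlText(text);
  return escaped.replace(/ {2,}/g, (run: string) => {
    let out = "";
    for (let i = 0; i < run.length; i++) out += "&ncsp;";
    return out;
  });
}

console.log(toHtmlPreservingSpaces("Hello,     World")); // Hello,&ncsp;&ncsp;&ncsp;&ncsp;&ncsp;World
```

Leaving single spaces as ordinary U+0020 keeps normal wrapping and collapsing wherever the author's text already renders as intended.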

I'm not 100% sure which standards body is the right one to own this particular issue; I filed it here because whitespace collapsing seems to be part of the css-text standard. In theory, `&ncsp;` is just an alias for a standard space, one which is ignored by the whitespace collapsing algorithm. However, as I understand the current layering in browsers, the HTML parser would resolve `&ncsp;` into a regular space character, and it would then be impossible for CSS to distinguish spaces which originated from `&ncsp;` entities. Therefore I suspect this would actually require a brand new Unicode character. If one were added for this purpose, the existing css-text standard likely wouldn't need to change at all. However, it feels weird to add one solely to solve an HTML issue like this, and I'm not sure that's feasible. What exactly would a non-collapsible space do in a non-HTML context? An alternative approach might be for the HTML parser to convert `&ncsp;` to a different, user-space Unicode character known by the white-space spec, which gets rendered as a standard space but is otherwise ignored for collapsing. I'm not sure that's a great architectural idea, but it's the one solution which comes to mind here.
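The layering problem described above can be sketched in a few lines. This is a hypothetical model: `&ncsp;` does not exist, U+E000 (a Private Use Area code point) merely stands in for the "user-space" character the paragraph suggests, and `collapse` is a simplified rule of thumb rather than the real css-text algorithm.

```typescript
// Hypothetical sentinel for a parsed `&ncsp;` (illustration only).
const NCSP_SENTINEL = "\uE000";

// Option A: the parser resolves `&ncsp;` to an ordinary U+0020 space.
function decodeAsPlainSpace(html: string): string {
  return html.replace(/&ncsp;/g, " ");
}

// Option B: the parser resolves `&ncsp;` to a sentinel that layout recognizes.
function decodeAsSentinel(html: string): string {
  return html.replace(/&ncsp;/g, NCSP_SENTINEL);
}

// Simplified collapsing: any run of U+0020, tab, or line feed becomes one space.
// The sentinel is not in that set, so it passes through untouched.
function collapse(text: string): string {
  return text.replace(/[ \t\n]+/g, " ");
}

// At paint time the sentinel is drawn as an ordinary space.
function render(text: string): string {
  return text.split(NCSP_SENTINEL).join(" ");
}

const source = "Hello,&ncsp;&ncsp;&ncsp;World";
console.log(render(collapse(decodeAsPlainSpace(source)))); // Hello, World
console.log(render(collapse(decodeAsSentinel(source))));   // Hello,   World
```

With option A the decoded text equals `"Hello,   World"` typed by hand, so no later stage can tell the spaces apart; option B only works because the sentinel is a distinct code point all the way to paint time.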

Feel free to move this issue to whichever standards body makes the most sense to evaluate it.

Loirooriol commented 1 month ago

forcing all strings to use white-space: pre;

Rather than pre, you probably want pre-wrap, and possibly combine it with white-space-trim.

the user might want two spaces in a particular place within that element

Do you mean that you want some sequences of spaces to collapse, but not others? Why not just preserve all spaces and use a single one instead of multiple wherever you want to see a single one?

maybe I'm one of those people who insists on two spaces at the end of every sentence

And you want to see both or one?

any developer who writes `&#32;` clearly wants that space to be rendered as-is

When I write `&#32;` I want it to behave like a normal U+0020 space.

I'm not 100% sure which standards body is the right one to own this particular issue

And I don't think they will find this very appealing...

Crissov commented 1 month ago

Yeah, see https://github.com/whatwg/html/issues/5121 and https://github.com/whatwg/html/issues/7071 for instance.

dgp1130 commented 1 month ago

Rather than pre, you probably want pre-wrap, and possibly combine it with white-space-trim. Do you mean that you want some sequences of spaces to collapse, but not others? Why not just preserve all spaces and use a single one instead of multiple wherever you want to see a single one?

Preserving all spaces in an element is a larger change than strictly necessary to preserve only a particular set of spaces within it, and it forces the developer to compromise other aspects of how they write their HTML, such as eliminating indentation or newlines, depending on which white-space value they use and the trade-offs that value entails. `&ncsp;` would not require developers to make that kind of compromise. They could use any white-space value and still keep any arbitrary space within the element.

Also as mentioned, CMS tools generally can't rely on any specific styles being applied to an element, so any CSS requirement feels like a non-starter for that use case.

maybe I'm one of those people who insists on two spaces at the end of every sentence

And you want to see both or one?

In this example, I would want to see both spaces. If the developer types `Good news everyone!  HTML supports non-collapsible spaces now.` (note the two spaces), those spaces will naturally be collapsed into a single space when rendered in HTML. `&ncsp;&ncsp;` would allow both spaces to be preserved, regardless of the rest of the content or styling for that element.

Yeah, see https://github.com/whatwg/html/issues/5121 and https://github.com/whatwg/html/pull/7071 for instance.

Thanks for pointing that out; I wasn't aware new HTML entities were so difficult to add (see https://github.com/whatwg/html/blob/main/FAQ.md#html-should-add-more-named-character-references). I agree this is unlikely to have the impact needed to justify its addition. An unnamed entity could technically serve the same purpose, though I imagine it would be significantly harder to convince the community to prefer an unnamed `&ncsp;` character over `&nbsp;` for those misuse cases.

It seems I was at least correct that this would require a new Unicode character, but it sounds like it would make more sense to file this in https://github.com/whatwg/html? I imagine we'd need at least some consensus there among HTML stakeholders that this is worth pursuing before there would be any hope of getting Unicode on board.

xiaochengh commented 1 month ago

Not the ideal solution, but you can simulate it by interleaving no-break spaces and zero-width spaces:

`foo&#x200B;&nbsp;&#x200B;&nbsp;&#x200B;bar`
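The workaround can be generated programmatically. A minimal sketch, assuming the pattern is one ZWSP followed by "NBSP + ZWSP" per desired space; `visibleSpaces` is a hypothetical helper name.

```typescript
const NBSP = "\u00A0"; // never collapsed, but glues to its neighbours (no break)
const ZWSP = "\u200B"; // zero width, but allows a line break after it

// Returns ZWSP, then NBSP + ZWSP repeated `count` times, so there is a
// line-break opportunity after every ZWSP (before each NBSP and after the last).
function visibleSpaces(count: number): string {
  let out = ZWSP;
  for (let i = 0; i < count; i++) {
    out += NBSP + ZWSP;
  }
  return out;
}

const text = "foo" + visibleSpaces(2) + "bar";
```

Neither U+00A0 nor U+200B is in the set of characters CSS collapses, which is why the spaces survive without changing the element's `white-space` value.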


dgp1130 commented 1 week ago

I don't think interleaving no-break and zero-width spaces quite works, because no-break spaces have a forced width, meaning they always take up space even when line wrapping would not require it. See point 3 of the original comment; that issue still exists with the interleaving approach.
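A toy width model illustrates the objection. It assumes a monospace font where every visible character, U+00A0 included, is one cell wide, U+200B is zero-width, and collapsible U+0020 spaces at the start or end of a line take no width, as described in the comment above; `lineWidth` is a hypothetical helper, not a browser API.

```typescript
const NBSP = "\u00A0";
const ZWSP = "\u200B";

// Toy model of how many cells a single wrapped line occupies.
// Zero-width spaces are dropped, then only ordinary U+0020 spaces at the
// line edges are trimmed. `trim()` is avoided on purpose because it would
// also strip U+00A0.
function lineWidth(line: string): number {
  const withoutZwsp = line.split(ZWSP).join("");
  return withoutZwsp.replace(/^ +| +$/g, "").length;
}

console.log(lineWidth("foo   "));                                // 3
console.log(lineWidth("foo" + ZWSP + NBSP + ZWSP + NBSP + ZWSP)); // 5
console.log(lineWidth(NBSP + ZWSP + "bar"));                     // 4
```

In this model the ordinary spaces vanish at a wrap, while the no-break spaces from the workaround keep their width at the end of one line or the start of the next.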