whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Spec needs to make it clear that paragraphs are isolated from surrounding paragraphs, for the purposes of bidirectional text formatting. #3905

Open Zhang-Junzhi opened 6 years ago

Zhang-Junzhi commented 6 years ago

Here's the definition of a paragraph in the spec:

A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.

The definition above doesn't mention bidirectional text formatting at all. I think the spec needs to make it clear that paragraphs are implicitly isolated from surrounding paragraphs for the purposes of bidirectional text formatting. This is semantically meaningful and changes the default semantics of HTML. For example, consider the following code:

<li id="foo" role="menuitem">...</li><li id="bar" role="menuitem">...</li>

Semantically speaking, without bidi isolation between paragraphs, bidirectional text formatting works across the two <li> elements.

However, if the spec stated that bidi isolation is implicitly applied between paragraphs, then the code above would be essentially the same as the following code (as if a <bdi> were implicitly added at each paragraph boundary):

<li id="foo" role="menuitem"><bdi>...</bdi></li><li id="bar" role="menuitem"><bdi>...</bdi></li>

I'm pretty sure the latter one is what the spec initially intended to mean.
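
To make the explicit form concrete, here is a minimal sketch with made-up item text. The Arabic name and the "3 posts" label are assumptions chosen for illustration and do not come from the original report. Note that <bdi> does slightly more than isolate: its dir attribute defaults to auto, so each item's direction is also taken from its own content.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Explicit bidi isolation with bdi (illustrative)</title>
</head>
<body>
  <!-- Hypothetical content: the item text below is invented for this sketch. -->
  <ul role="menu">
    <!-- The right-to-left name is isolated, so it does not affect the
         ordering of text outside the <bdi>. -->
    <li id="foo" role="menuitem"><bdi>إيان</bdi></li>
    <!-- The left-to-right label is isolated in the same way. -->
    <li id="bar" role="menuitem"><bdi>3 posts</bdi></li>
  </ul>
</body>
</html>
```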

Zhang-Junzhi commented 6 years ago

Just as a side note, from the CSS perspective: if implicit isolation is applied at the boundaries of paragraph contexts, then a designer styling inline <li>s with content-generated text should write unicode-bidi: isolate together with display: inline (assuming the designer is doing things correctly).

But since the spec doesn't mention that implicit bidi isolation is applied at the boundaries of paragraph contexts, display: inline alone, without unicode-bidi: isolate, is at least technically correct for HTML's semantics, even though it may produce undesirable visual results.
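
The styling described in the two paragraphs above could be sketched as follows. The class name `menu`, the item text, and the separator are hypothetical. A user agent's default style sheet may already supply the unicode-bidi declaration for <li>; stating it explicitly here just keeps the sketch independent of that default.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Inline list items with explicit bidi isolation (illustrative)</title>
  <style>
    /* Lay the list and its items out on a single line. */
    ul.menu,
    ul.menu > li {
      display: inline;
      margin: 0;
      padding: 0;
    }

    /* Treat each item as an isolated run for the bidi algorithm,
       so adjacent items cannot affect each other's ordering. */
    ul.menu > li {
      unicode-bidi: isolate;
    }

    /* Generated text placed at the start of every item after the first
       (hypothetical separator). */
    ul.menu > li + li::before {
      content: " | ";
    }
  </style>
</head>
<body>
  <ul class="menu" role="menu">
    <li id="foo" role="menuitem">إيان</li>
    <li id="bar" role="menuitem">3 posts</li>
  </ul>
</body>
</html>
```

Removing the unicode-bidi declaration, and assuming no user-agent default supplies it, would let the two inline items take part in the same bidi run, which is the case the comment calls technically correct but visually undesirable.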

annevk commented 6 years ago

Doesn't this follow from the bidi CSS properties?

Zhang-Junzhi commented 6 years ago

I couldn't find any explicit statement that a paragraph is semantically isolated for the purposes of bidirectional text formatting, but I am happy to be proven wrong.

annevk commented 6 years ago

I don't think it's needed since text rendering is left to CSS entirely, which handles this.

Zhang-Junzhi commented 6 years ago

I don't think it's purely a matter of rendering; different bidirectional formatting does change the document semantics. An isolation means an additional level of independent bidi content, so it has a semantic effect similar to that of quotation marks.

For example, the following two sentences mean different things:

I like "Red color". (Here "Red color" could be the title of an artwork.)

I like red color. (Here red color is simply a color.)

Zhang-Junzhi commented 6 years ago

The above color example is not very accurate, but it is an attempt to convey that an isolation means an additional level of independent bidi content, so it is semantically meaningful and not purely a rendering issue.

Zhang-Junzhi commented 6 years ago

I just came up with a better example:

His reply is short is short.

The above sentence can be interpreted in two ways:

"His reply is short" is short. (Meaning that "His reply is short", as a sentence, is short.)

His reply is "short is short". (Meaning that he said short is a fact that cannot be changed.)

fantasai commented 5 years ago

@Zhang-Junzhi This is covered in https://html.spec.whatwg.org/multipage/rendering.html#bidi-rendering

@annevk You might want to make the rendering behavior associated with that section required for any UA that has any type of visual presentation of the HTML, not just ones that follow the suggested default rendering. There's a variety of valid ways to interpret how a P or RT is rendered, but the bidi formatting requirements shouldn't be tailorable.

Zhang-Junzhi commented 5 years ago

> @Zhang-Junzhi This is covered in https://html.spec.whatwg.org/multipage/rendering.html#bidi-rendering
>
> @annevk You might want to make the rendering behavior associated with that section required for any UA that has any type of visual presentation of the HTML, not just ones that follow the suggested default rendering. There's a variety of valid ways to interpret how a P or RT is rendered, but the bidi formatting requirements shouldn't be tailorable.

@fantasai Thank you. This is a suggested style in the Rendering section, so it has nothing to do with semantics.

I agree that the bidi formatting requirements shouldn't be tailorable. A paragraph should be defined as semantically isolated from its neighbouring text, and I think the spec needs to mention this in the definition of a paragraph.

annevk commented 5 years ago

It seems that requirement is already addressed by https://html.spec.whatwg.org/#requirements-relating-to-the-bidirectional-algorithm, no?

Zhang-Junzhi commented 5 years ago

> It seems that requirement is already addressed by https://html.spec.whatwg.org/#requirements-relating-to-the-bidirectional-algorithm, no?

No, not really. I'd like to offer a PR for the HTML spec to address this when I get time.

annevk commented 5 years ago

Okay, that might be helpful in clarifying what you think is missing. Thanks!

Zhang-Junzhi commented 5 years ago

@annevk I have created a PR for resolving this issue. #4338

fantasai commented 5 years ago

@annevk That section looks pretty correct to me. Maybe it's worth cross-referencing it from https://html.spec.whatwg.org/multipage/rendering.html#bidi-rendering ?

@Zhang-Junzhi Your PR breaks a lot of stuff. I don't think it's correct or helpful. Go read UAX9 top to bottom, then css-writing-modes-3's bidi section. You might also find https://www.w3.org/TR/html-bidi/ helpful, it provides the background for some of the stuff in the CSS and HTML and Unicode specs.

Zhang-Junzhi commented 5 years ago

> @annevk That section looks pretty correct to me. Maybe it's worth cross-referencing it from https://html.spec.whatwg.org/multipage/rendering.html#bidi-rendering ?

This is what I was trying to say: the suggested style is correct, but the spec seems to be missing the corresponding semantics.

I guess the spec might mention somewhere else that non-phrasing content automatically creates implicit embeddedness in the bidi-semantic sense (like inserting an implicit <bdi>).

Zhang-Junzhi commented 5 years ago

> @Zhang-Junzhi Your PR breaks a lot of stuff. I don't think it's correct or helpful. Go read UAX9 top to bottom, then css-writing-modes-3's bidi section. You might also find https://www.w3.org/TR/html-bidi/ helpful, it provides the background for some of the stuff in the CSS and HTML and Unicode specs.

@fantasai I have read most of the bidi articles, and I don't think I have misunderstood them. I guess you meant that it's not a good idea to mix the concept of an element's "directionality" with the concept of embeddedness, which could easily make the word "directionality" itself misleading. Let me try somewhere else, then.

Zhang-Junzhi commented 5 years ago

As a reminder: a new patch has been committed to the PR, #4338.

annevk commented 5 years ago

@fantasai yeah, that seems like a very good idea.

domenic commented 5 years ago

Per the discussion here, it seems like the action item is to cross-link to https://html.spec.whatwg.org/multipage/dom.html#requirements-relating-to-the-bidirectional-algorithm from https://html.spec.whatwg.org/multipage/rendering.html#bidi-rendering, so leaving this open to track that.