whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

innerText: include parentheses around <rt> if there's no <rp> #1801

Closed zcorpan closed 7 years ago

zcorpan commented 8 years ago

https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute

The innerText getter has a special case for Text nodes that are children of rp elements; the text is included even though rp is 'display:none' by default.

Demo: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4488

This is nice but I think it is more common to omit rp and only use rt, and in that case it's not helping.

The rendering section has:

User agents that do not support correct ruby rendering are expected to render parentheses around the text of rt elements in the absence of rp elements.

https://html.spec.whatwg.org/multipage/rendering.html#phrasing-content-3

I think if we are going to special case ruby in innerText at all it would be good to make it "nice" also if rp is not being used, like in the rendering section.

Concretely, if a ruby element has no rp children, include "(" before rt children and ")" after.
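
A minimal sketch of the rule proposed above, for illustration only. It assumes a simplified tree model (the `Element` class and `ruby_text` function are hypothetical names, not part of any spec or browser API) and covers only the ruby-specific step, not the rest of the innerText algorithm. Text inside rp is kept, matching the existing special case described earlier.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element: a tag name plus child nodes."""
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


def ruby_text(node: Union[Element, str]) -> str:
    """Flatten a node to plain text.

    If a ruby element has no rp children, wrap the text of each rt child
    in "(" and ")". If rp children exist, their own text supplies the
    parentheses, so nothing is added.
    """
    if isinstance(node, str):
        return node

    add_parens = node.tag == "ruby" and not any(
        isinstance(child, Element) and child.tag == "rp"
        for child in node.children
    )

    parts = []
    for child in node.children:
        text = ruby_text(child)
        if add_parens and isinstance(child, Element) and child.tag == "rt":
            text = "(" + text + ")"
        parts.append(text)
    return "".join(parts)


# Ruby without rp: parentheses are generated.
without_rp = Element("ruby", ["漢字", Element("rt", ["かんじ"])])

# Ruby with rp: the author-supplied parentheses are used as-is.
with_rp = Element(
    "ruby",
    ["漢字", Element("rp", ["("]), Element("rt", ["かんじ"]), Element("rp", [")"])],
)
```

Under these assumptions both trees flatten to the same string, which is the point of the proposal: the plain-text result should not depend on whether the author wrote rp.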

cc @rniwa @rocallahan @jfkthame

Implementer interest:

rniwa commented 8 years ago

Yeah, falling back to parentheses when there is no rp makes sense to me.

kojiishi commented 8 years ago

cc @yosinch

zcorpan commented 7 years ago

PR for spec: https://github.com/whatwg/html/issues/2113
PR for wpt: https://github.com/w3c/web-platform-tests/pull/4259

zcorpan commented 7 years ago

@jfkthame is there interest to implement this in Gecko? @tkent-google is there interest to implement this in Chromium? @travisleithead is there interest to implement this in Edge?

jfkthame commented 7 years ago

@upsuper wdyt about this? Would you like to take it for gecko?

upsuper commented 7 years ago

There is an issue: Gecko implements the ruby model from the CSS Ruby spec, which is more complicated than the one in the current HTML spec. The CSS model supports consecutive <rt> elements as well as the <rtc> element, which means the algorithm you proposed in #2113 wouldn't work for Gecko.

[Slightly off-topic: this proposal again highlights a defect of HTML's ruby model. The model fails to express words like " → 振り仮名(ふりがな)" in a reasonable way that gives the desired behavior in both rendering and plain text. The HTML spec should really adopt the CSS Ruby model.]

There is also the question of whether the parentheses should be proportional or fullwidth (w3c/csswg-drafts#762). I think for CJK languages, majority of people would prefer either fullwidth parentheses or proportional parentheses with whitespace around. Maybe proportional parentheses are more desirable for other languages? Although it seems to me CJK languages (especially Japanese) are the main user of ruby.

Personally, I don't like to see the innerText algorithm become increasingly complicated. IIUC, it was specced this way for web compatibility, not really because of its distinct functionality (?). So I don't think it's worth adding anything to it unless it is for compatibility reasons. I may be wrong about this.

zcorpan commented 7 years ago

Thanks @upsuper.

So with rtc, an rp might be a sibling of the rtc but not a sibling of the rt. The algorithm could be changed to accommodate that, but first we should decide whether to do this at all.

I think fullwidth parentheses should be used if that is commonly used by CJK.

You are correct that innerText was added mainly for better Web compat.

I'm happy to drop the proposal if people think it is not worth it. My question then is, should we also drop the special handling of rp currently in the spec, which is implemented only in Gecko at the moment?

zcorpan commented 7 years ago

(The ruby model is issue #121.)

upsuper commented 7 years ago

So with rtc, an rp might be a sibling of the rtc but not a sibling of the rt. The algorithm could be changed to accommodate that, but first we should decide whether to do this at all.

rtc, rt, and rp can all be siblings of each other, so the rule for adding parentheses could be complicated. The CSS Ruby spec attempts to generate parentheses automatically, but that rule isn't perfect, and it probably doesn't fit well with the description language used for the innerText algorithm.

My question then is, should we also drop the special handling of rp currently in the spec, which is implemented only in Gecko at the moment?

I'm fine with doing this if no one else opposes.

tkent-google commented 7 years ago

I agree with @upsuper about the last paragraph of https://github.com/whatwg/html/issues/1801#issuecomment-263584759. Introducing new behavior that is not compatible with any existing implementation isn't welcome.

zcorpan commented 7 years ago

Thanks. I've withdrawn the proposal. I will make a new pull request to drop the special handling of rp.

zcorpan commented 7 years ago

https://github.com/whatwg/html/pull/2129 https://github.com/w3c/web-platform-tests/pull/4276

zcorpan commented 7 years ago

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1322096

rniwa commented 7 years ago

I think for CJK languages, majority of people would prefer either fullwidth parentheses or proportional parentheses with whitespace around. Maybe proportional parentheses are more desirable for other languages? Although it seems to me CJK languages (especially Japanese) are the main user of ruby.

That’s not quite true. People DO use proportional (half-width) parentheses in Japanese without spaces. I’ve rarely seen anyone inserting spaces around parentheses in Japanese for that matter.

You are correct that innerText was added mainly for better Web compat.

I'm happy to drop the proposal if people think it is not worth it. My question then is, should we also drop the special handling of rp currently in the spec, which is implemented only in Gecko at the moment?

Inserting parentheses is quite important for copy & paste (otherwise important content can be lost during copy). WebKit uses the same algorithm for both innerText and copy & paste, so this is quite important for us.

kojiishi commented 7 years ago

I think for CJK languages, majority of people would prefer either fullwidth parentheses or proportional parentheses with whitespace around. Maybe proportional parentheses are more desirable for other languages? Although it seems to me CJK languages (especially Japanese) are the main user of ruby.

That’s not quite true. People DO use proportional (half-width) parentheses in Japanese without spaces. I’ve rarely seen anyone inserting spaces around parentheses in Japanese for that matter.

I agree, but since we have to choose one, I think you'll find "typically" if you look at the referring bugs and the discussion at the I18N WG, and I agree with I18N that if we pick the typically used one, that'd be fullwidth.

A larger issue than width is the baseline. ASCII parentheses are usually designed to match the x-height, which is too low for CJK, while fullwidth parentheses are designed to match the em-height. A few fonts have em-height ASCII parentheses, but there are really few of them (I know of only 3), because doing so sacrifices English rendering.

In today's font environment, if we want parentheses that match CJK without extra spacing, we need to use fullwidth code points with the pwid OpenType feature.

Inserting parentheses is quite important for copy & paste...

I'll leave @tkent-google on whether we want to do this or not.

rniwa commented 7 years ago

In today's font environment, if we want parentheses that match CJK without extra spacing, we need to use fullwidth code points with the pwid OpenType feature.

The problem here is that the lack of rp would now result in full-width parentheses being inserted even in English and other Latin-script text, which is highly undesirable. Using half-width parentheses, on the other hand, would still work for CJK even if it weren't ideal. We might need to resolve the current language from the nearest ancestor and decide whether to use full width or not.
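
A rough sketch of that last idea, under stated assumptions: the `Node` class, `resolve_lang`, `fallback_parens`, and the `CJK_PRIMARY` set are all hypothetical and chosen for illustration, and nothing like this was specified. It walks up the ancestor chain to the nearest non-empty `lang` value and picks fullwidth parentheses (U+FF08, U+FF09) only when the primary language subtag is Japanese, Chinese, or Korean. Treating an empty `lang` as "not set" is a simplification of how HTML handles language inheritance.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element with an optional lang attribute."""
    tag: str
    lang: Optional[str] = None
    parent: Optional["Node"] = None


# Assumption: which primary language subtags should get fullwidth parentheses.
CJK_PRIMARY = {"ja", "zh", "ko"}


def resolve_lang(node: Optional[Node]) -> str:
    """Return the lang of the nearest ancestor-or-self that sets one, else ""."""
    current = node
    while current is not None:
        if current.lang:
            return current.lang
        current = current.parent
    return ""


def fallback_parens(node: Node) -> Tuple[str, str]:
    """Pick the parenthesis pair to generate around rt when rp is absent."""
    primary = resolve_lang(node).lower().split("-")[0]
    if primary in CJK_PRIMARY:
        return ("\uFF08", "\uFF09")  # fullwidth parentheses
    return ("(", ")")  # ASCII parentheses


# A ruby element inside a Japanese document, and one inside an English document.
ja_ruby = Node("ruby", parent=Node("p", parent=Node("html", lang="ja-JP")))
en_ruby = Node("ruby", parent=Node("html", lang="en"))
```

Defaulting to ASCII parentheses when no language is declared follows the concern raised here about Latin-script text; a different default would be equally easy to express.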

kojiishi commented 7 years ago

I know some people are talking about ruby being useful for Latin and other languages, but I have never seen a single page using it. Have you?

Either way, it looks like Gecko and Blink don't want this. Maybe we should try to reach consensus on it first. Well, it was probably me who added the noise, sorry about that.

kojiishi commented 7 years ago

Found the comment from @r12a.