w3c / jlreq

Text Layout Requirements for Japanese
https://w3c.github.io/jlreq/

In-page search fails on ruby-annotated text #255

Open r12a opened 3 years ago

r12a commented 3 years ago

This issue is applicable to all languages that use ruby markup.

Inline annotations (often referred to as ruby) are commonly used for Japanese, Chinese, Korean, and Mongolian texts to provide information about pronunciation and sometimes meaning. (See What is Ruby?) Users searching for a phrase within a web page should be able to find phrases that correspond to both base text and annotations.

The gap

If text is marked up for ruby using the interleaved markup approach, currently required by the HTML spec, a browser's in-page search no longer recognises the text. For example, if you search for 東京 (Tokyo) on a page that has this markup:

<ruby><rb>東<rt>とう<rb>京<rt>きょう</ruby>

the search will fail to locate the word.

Note that a tabular arrangement of markup, such as

<ruby><rb>東<rb>京<rt>とう<rt>きょう</ruby>

would work fine but, although it is parsed correctly, is currently not displayed correctly by Blink or WebKit; the HTML specification has therefore obsoleted the rb and rtc elements.
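The difference between the two arrangements can be illustrated with a minimal sketch. It assumes a deliberately naive search that matches against the text nodes concatenated in document order; this is a simplified model for illustration, not a description of how any browser implements in-page search. The helper names (`TextCollector`, `flatten`) are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text node in document order, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def flatten(markup):
    """Return the text of `markup` as one string in document order."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


INTERLEAVED = "<ruby><rb>東<rt>とう<rb>京<rt>きょう</ruby>"
TABULAR = "<ruby><rb>東<rb>京<rt>とう<rt>きょう</ruby>"

print(flatten(INTERLEAVED))            # 東とう京きょう
print("東京" in flatten(INTERLEAVED))   # False: an annotation sits between the base characters
print(flatten(TABULAR))                # 東京とうきょう
print("東京" in flatten(TABULAR))       # True: the base characters are adjacent
```

Under this assumption, the interleaved form splits the base text with annotation text, so a plain substring match for 東京 cannot succeed, while the tabular form keeps the base characters contiguous.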

The HTML specification currently blocks the workaround (which is to use tabular markup), but a solution is also needed for the interleaved markup.
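One conceivable approach for interleaved markup, sketched here purely as an illustration, is to search the base text and the annotation text as two separate streams. Nothing in this issue says that any browser or specification does this; `RubySplitter`, `search_streams`, and `find_in_ruby` are hypothetical names, and the sketch ignores rp, nested ruby, and text outside the ruby element.

```python
from html.parser import HTMLParser


class RubySplitter(HTMLParser):
    """Route text into a base stream or an annotation stream (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.current = None   # most recently opened rb/rt element, if still open
        self.base = []
        self.annotation = []

    def handle_starttag(self, tag, attrs):
        if tag in ("rb", "rt"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag in ("rb", "rt", "ruby"):
            self.current = None

    def handle_data(self, data):
        # Text inside rt is annotation; everything else is treated as base text.
        if self.current == "rt":
            self.annotation.append(data)
        else:
            self.base.append(data)


def search_streams(markup):
    """Return (base_text, annotation_text) for the given markup."""
    parser = RubySplitter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.base), "".join(parser.annotation)


def find_in_ruby(markup, needle):
    """True if `needle` occurs in the base text or in the annotation text."""
    base, annotation = search_streams(markup)
    return needle in base or needle in annotation


INTERLEAVED = "<ruby><rb>東<rt>とう<rb>京<rt>きょう</ruby>"

print(search_streams(INTERLEAVED))              # ('東京', 'とうきょう')
print(find_in_ruby(INTERLEAVED, "東京"))         # True
print(find_in_ruby(INTERLEAVED, "とうきょう"))    # True
```

Keeping the two streams apart lets a search match either the base phrase or its reading, which is the behaviour the opening paragraph of this issue asks for.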

Blink and WebKit browsers fail to match a search string when the text has ruby annotations. Gecko appears to match the search string in the ruby element, but doesn't display the highlight correctly (see https://github.com/w3c/jlreq/issues/255#issuecomment-1278066359). (If you copy the matched text, you also get the ruby text, except for the annotation at the end of the ruby element.)

Gecko, Blink, and WebKit browsers all recognise strings when the ruby text is arranged in tabular format. However, support for that arrangement was removed from the HTML spec because currently only Gecko and Amazon have browsers that display the markup correctly.

Priority

This is an obstacle to basic interaction with web pages, especially while the HTML spec blocks the use of tabular markup.

Tests

Interactive test: In-page search will find text that has ruby annotations in interleaved markup

Interactive test: In-page search will find text that has ruby annotations in tabular markup

Action taken

Browser bug reports: Gecko, Blink, WebKit

Work is in preparation in the W3C HTML Working Group to spec out the full ruby model in a way that can be incorporated into the WHATWG spec in future.

Outcomes

tbd

r12a commented 3 years ago

The first comment in this issue contains text that will automatically appear in several gap-analysis documents in the Inline notes & annotations section, as a topic with the same title as this issue. Any edits made to that comment will be immediately available in the Editor's draft of the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include: Chinese, Japanese, Korean, Mongolian

xfq commented 3 years ago

Browser bugs raised: Chromium, WebKit, Mozilla

xfq commented 3 years ago

Do we want to file this issue to other gap analysis documents as well?

xfq commented 3 years ago

Should we mention this in https://w3c.github.io/string-search/#searching ?

r12a commented 3 years ago

thanks, yes (done), yes probably

gnprice commented 1 year ago

Gecko appears to have fixed the issue.

There is a residual bug which @xfq pointed out on that issue thread: when the search result is highlighted, the highlight applies to some but not all of the ruby text.

I suspect that what is happening is that when the matching text crosses multiple rb nodes, the highlight includes the rt nodes that appear between them. In the example with <ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby> and a search for "東京", this would explain why "とう" is highlighted and "きょう" is not.
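The suspected behaviour can be modelled with a small sketch. It assumes the highlight is one contiguous range running from the first matched rb segment to the last, so that any rt segment lying between them is swept in. This models the hypothesis only and is not taken from Gecko's code; `SegmentCollector` and `highlighted_annotations` are hypothetical names.

```python
from html.parser import HTMLParser


class SegmentCollector(HTMLParser):
    """Record (tag, text) pairs in document order, where tag is 'rb', 'rt' or None."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in ("rb", "rt"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag in ("rb", "rt", "ruby"):
            self.current = None

    def handle_data(self, data):
        self.segments.append((self.current, data))


def highlighted_annotations(markup, needle):
    """Return the rt texts inside a contiguous range covering a base-text match."""
    if not needle:
        return []
    parser = SegmentCollector()
    parser.feed(markup)
    parser.close()
    segs = parser.segments
    # Pair every base character with the index of the segment it came from.
    base_chars = [(i, ch) for i, (tag, text) in enumerate(segs)
                  if tag != "rt" for ch in text]
    base_text = "".join(ch for _, ch in base_chars)
    pos = base_text.find(needle)
    if pos < 0:
        return []
    first = base_chars[pos][0]
    last = base_chars[pos + len(needle) - 1][0]
    # Everything from the first matched segment to the last is "highlighted".
    return [text for tag, text in segs[first:last + 1] if tag == "rt"]


MARKUP = "<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>"

print(highlighted_annotations(MARKUP, "東京"))  # ['とう']
```

In this model the trailing annotation きょう comes after the last matched rb segment, so it falls outside the range, which is consistent with the observed partial highlight.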

The residual issue seems lower-priority than the original one. Perhaps it should be tracked as its own issue.

himorin commented 1 year ago

@xfq remove gecko label? (might need another gap issue for in-page search on ruby annotation???)

r12a commented 1 year ago

I edited the main comment to reflect this information. Thanks.