w3c / epub-specs

Shared workspace for EPUB 3 specifications.

(i18n) What should the RS do if a language value is not well formed? #1508

Closed iherman closed 3 years ago

iherman commented 3 years ago

At the moment, the spec is silent on this...

mattgarrish commented 3 years ago

We don't really have much in the way of requirements for the language, though. Beyond potentially trying to guess at the page progression direction, it's only advisory metadata.

We could always suggest assuming "und". Anything else seems complex and unreliable, like checking content documents for language tags.
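As a rough illustration of the "assume `und`" suggestion, the sketch below checks a language value against a simplified form of the BCP 47 `langtag` grammar and falls back to `"und"` when it does not match. The function name `effective_language`, the whitespace trimming, and the simplified regular expression are assumptions made for this example. The pattern ignores grandfathered tags and does not reject duplicate variants or extension singletons, so it is not a complete well-formedness check.

```python
import re

# Simplified BCP 47 "langtag" shape (assumption: grandfathered tags and
# duplicate-subtag rules are deliberately left out of this sketch).
_LANGTAG = re.compile(
    r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})  # language, optional extlangs
        (?:-[a-z]{4})?                                # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                   # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*      # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*          # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                    # private-use suffix
    |
        x(?:-[a-z0-9]{1,8})+                          # private-use-only tag
    )
    """,
    re.IGNORECASE | re.VERBOSE | re.ASCII,
)


def effective_language(value):
    """Return the tag unchanged if it looks well formed, otherwise "und"."""
    # Assumption: surrounding whitespace is not significant.
    tag = (value or "").strip()
    return tag if _LANGTAG.fullmatch(tag) else "und"
```

For example, `effective_language("en-US")` returns the tag unchanged, while `effective_language("en_US")` returns `"und"` because the underscore does not fit the grammar.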

iherman commented 3 years ago

Specifying 'und' as a value for not well-formed entries may be a good approach. At this moment, there is not even a hint to RS-s that they should check the value.

dauwhe commented 3 years ago

Useful comment: https://github.com/w3c/bp-i18n-specdev/issues/36#issuecomment-531072094

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-02-18

List of resolutions:

- Resolution #2: Close issue 1509 with no action
- Resolution #3: close issue 1508, won't fix

### 2. Remaining i18n issues

_See GitHub issues [#1508](https://github.com/w3c/epub-specs/issues/1508), [#1509](https://github.com/w3c/epub-specs/issues/1509)._

**Wendy Reid:** 2 i18n issues remain after the review

**Dave Cramer:** the two issues are pretty intertwined
… i18n should require valid lang tags
… there is a formal grammar which describes the formal structure of language tags
… so, well-formedness

> *Leonard Rosenthol:* I believe that lang is ISO 3166

**Matt Garrish:** we enforce well formed, but nothing about validity

**Dave Cramer:** a valid tag is one which matches the actual languages
… so should we require lang tags to be valid?
… and what should RS do when faced with invalid lang tag?
… not want to make requirement more stringent
… hard to check validity of lang tags
… it's a list of strings that changes over time
… burden on epubcheck
… also, there could be existing epub with well formed but invalid lang tags - a change could cause those epubs to fail epubcheck
… should having a valid lang tag just be a best practice?
… so epubcheck would just flag it as an informative warning

> *Wendy Reid:* leonardr:

**Leonard Rosenthol:** [here's the info](https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang)
… BCP-47 is the technical spec, you can validate against that if you want
… but i think we should do what HTML does, no more, no less

**Matt Garrish:** some of this came up in epubcheck itself, when someone complained that there was no check for lang validity
… also, the epubcheck folks seemed to say that it would be a hard thing to implement
… and unless there is some critical function, a RS is going to ignore this
… and there are currently no critical functions that rely on this very general metadata
… nothing bad comes of this lang tag not being specified properly
… in pub manifest we said that well-formedness is good enough

**Wendy Reid:** also, we'd run into issues with testing this if we wanted to make this stricter

**Dave Cramer:** matt has more or less convinced me that this isn't broken

> *Brady Duga:* +1 to not broken

**Dave Cramer:** ... the costs of fixing it are higher than the purely theoretical benefits of conforming to broadly worded i18n guidelines

**Dave Cramer:** i propose that we close this without fixing

**Matt Garrish:** something like WCAG could have stricter rules about lang tags, but for us it's not a critical piece of metadata

**Ben Schroeter:** i like the idea of doing some sort of warning in epubcheck
… also, i don't want to tell RS what to do in general
… RS want to be as lax as possible when it comes to what they will ingest
… they don't want to keep content authors off their RS

> **Proposed resolution: Close issue 1509 with no action** *(Wendy Reid)*

> *Marisa DeMeglio:* +1
> *Ben Schroeter:* +1
> *Matthew Chan:* +1
> *Wendy Reid:* +1
> *Brady Duga:* +1
> *Matt Garrish:* +1
> *Toshiaki Koike:* +1
> *Masakazu Kitahara:* +1
> *Shinya Takami (高見真也):* +1

> ***Resolution #2: Close issue 1509 with no action***

**Dave Cramer:** for 1508 (i.e. question of what RS should do with poorly-formed lang tag)

**Dave Cramer:** there was a suggestion that the RS treat the lang as "und" (i.e. undefined)
… that to me is a satisfactory solution to this somewhat theoretical problem

> *Leonard Rosenthol:* +1

> **Proposed resolution: Close issue 1508, add text to RS specification instructing reading systems to treat a poorly-formed language tag as "und" (undefined)** *(Wendy Reid)*

**Brady Duga:** is this yet another untestable assertion?
… should we tell RS what to do with this at all?

**Matt Garrish:** i suggested the "und" thing because i thought we'd done this in pub manifest as well
… but i think we actually went back and decided to remain silent on it
… "we're not going to define what it means for the RS"

**Dave Cramer:** testing it would require reading the minds of the RS
… what we're actually using the lang tag for is trying to guess at page progression direction?
… how would we know if the RS is actually doing this?
… so maybe another "close, won't fix"?

**Ben Schroeter:** if we feel the RS wants some sort of guidance we could change the proposal to say "suggest"
… but i'm also happy to drop it

**Matt Garrish:** i think there's something worrisome about RS determining lang for the author
… i'd rather RS do nothing

**Dave Cramer:** would also add that RS don't seem to be looking for guidance

> **Proposed resolution: close issue 1508, won't fix** *(Wendy Reid)*

> *Ben Schroeter:* +1
> *Brady Duga:* +1
> *Wendy Reid:* +1
> *Matthew Chan:* +1
> *Matt Garrish:* +1
> *Toshiaki Koike:* +1
> *Masakazu Kitahara:* +1

> ***Resolution #3: close issue 1508, won't fix***
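The minutes distinguish a well-formed tag (one that fits the grammar) from a valid one (one whose subtags are actually registered), and note that validity means checking against a list that changes over time. The sketch below shows only that lookup step, and only for the primary language subtag. `SAMPLE_REGISTERED_LANGUAGES` is a hypothetical, deliberately tiny stand-in for the IANA Language Subtag Registry, and `primary_language_is_registered` is a name invented for this example. Full BCP 47 validity also covers the remaining subtags, which are not checked here.

```python
# Hypothetical sample only: a real check would load the current IANA
# Language Subtag Registry, which is updated over time.
SAMPLE_REGISTERED_LANGUAGES = frozenset({"ar", "en", "fr", "he", "ja", "zh"})


def primary_language_is_registered(tag, registry=SAMPLE_REGISTERED_LANGUAGES):
    """Look up the primary language subtag of an already well-formed tag."""
    # The primary subtag is everything before the first hyphen;
    # subtags are case-insensitive, so compare in lowercase.
    primary = tag.split("-", 1)[0].lower()
    return primary in registry
```

With this sample set, `primary_language_is_registered("en-US")` is true, while `"zz-ZZ"` has a well-formed shape but fails the lookup. Passing the registry as a parameter keeps the changing list separate from the check itself.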