Open GLRoylance opened 6 years ago
I suspect part of the vagueness is because the CSS pseudoclass is designed to work with many different document types, which may have their own syntaxes for specifying the element language.
CSS uses BCP 47 in the :lang() selector. If a document type uses a different syntax, the user agent needs to convert it to BCP 47 in order to test equality.
The allowed values for the lang attribute in HTML and the xml:lang attribute in XML are defined in those specifications. HTML specifies BCP 47, XML references rfc 3066. I'm not an expert on the differences between those two, but I'm pretty sure "en_US" isn't valid for either.
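To make the comparison step concrete, here is a minimal Python sketch of the prefix match that Selectors Level 3 describes for :lang(): the match is case-insensitive and succeeds when the element's language equals the selector argument or starts with it followed by "-". The function name lang_matches is made up for illustration, and the sketch assumes the element's language is already expressed in BCP 47 syntax. It does not cover the extended matching that Selectors Level 4 adds.

```python
def lang_matches(element_lang: str, selector_arg: str) -> bool:
    """Selectors Level 3 style :lang() test (illustrative helper, not a real API).

    Assumes element_lang is already in BCP 47 syntax (hyphen-separated subtags).
    """
    tag = element_lang.lower()
    arg = selector_arg.lower()
    if not tag or not arg:
        # An unknown language or an empty argument matches nothing.
        return False
    # Exact match, or the argument followed immediately by a hyphen.
    return tag == arg or tag.startswith(arg + "-")


if __name__ == "__main__":
    for value in ("en-US", "EN-gb", "en_US", "eng"):
        print(value, lang_matches(value, "en"))
```

Under this comparison an unconverted "en_US" is never matched by :lang(en), because the underscore is not a subtag separator.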
BCP 47 is the concatenation of RFC 5646 and RFC 4647.
RFC 5646 supersedes RFC 4646, which supersedes RFC 3066.
On Thu, Aug 16, 2018 at 1:35 PM, Amelia Bellamy-Royds wrote:
I suspect part of the vagueness is because the CSS pseudoclass is designed to work with many different document types, which may have their own syntaxes for specifying the element language.
CSS uses BCP 47 in the :lang() selector. If a document type uses a different syntax, the user agent needs to convert it to BCP 47 in order to test equality.
The allowed values for the lang attribute in HTML https://html.spec.whatwg.org/multipage/dom.html#the-lang-and-xml:lang-attributes and the xml:lang attribute in XML https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-lang-tag are defined in those specifications. HTML specifies BCP 47, XML references rfc 3066. I'm not an expert on the differences between those two, but I'm pretty sure "en_US" isn't valid for either.
By the way: Selectors Level 4, which is the version that is being actively edited, already has additional clarifying notes on this point. In particular:
Note: The content language of an element is defined by the document language. For example, in HTML, the content language is determined by a combination of the lang attribute, information from meta elements, and possibly also the protocol (e.g. from HTTP headers). XML languages can use the xml:lang attribute to indicate language information for an element.
HTML specifies BCP 47, XML references rfc 3066.
Almost.
The values of the attribute are language identifiers as defined by [IETF RFC 3066], Tags for the Identification of Languages, or its successor
(my italics, on "or its successor"). So XML in practice uses BCP 47, same as HTML.
@AmeliaBR is exactly right. Selectors can be used with markup languages other than HTML, and not all of them will use BCP47 syntax to represent the content language, so Selectors requires the UA to convert to BCP47 syntax before making the comparison. For example, DocBook 3.1 accepts en_US https://tdg.docbook.org/tdg/3.1/refelem.html#DBRE.X.COMMON on its lang attribute.
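As a rough illustration of converting a document language's own syntax before comparing, here is a Python sketch. It assumes, purely for the example, that the source format differs from BCP 47 only in using "_" where BCP 47 uses "-", as with the DocBook value en_US. The function name to_bcp47_syntax and the loose shape check are hypothetical, not taken from any spec. The check only tests the general shape of a tag and does not validate subtags against the registry. Anything that does not fit after the separator swap is reported as having no usable language rather than being guessed at.

```python
import re
from typing import Optional

# Loose shape check only: an alphabetic first subtag, then alphanumeric
# subtags of 1 to 8 characters. This is NOT full RFC 5646 validation.
_TAG_SHAPE = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")


def to_bcp47_syntax(value: str) -> Optional[str]:
    """Map an underscore-separated language value to BCP 47 syntax.

    Hypothetical helper: returns None when the result is not tag-shaped.
    """
    candidate = value.strip().replace("_", "-")
    if not _TAG_SHAPE.match(candidate):
        return None
    return candidate


if __name__ == "__main__":
    for raw in ("en_US", "fr", "it_IT.utf8"):
        print(repr(raw), "->", to_bcp47_syntax(raw))
```

In this sketch "en_US" becomes "en-US", while a POSIX-style locale string such as "it_IT.utf8" is rejected because of the ".utf8" suffix. That is one way to convert a known syntax without normalizing arbitrary strings.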
I've tweaked the wording a bit to not imply that we're normalizing arbitrary strings to BCP47. https://github.com/w3c/csswg-drafts/commit/2df8680b5aa0be3ba3dca0ae512c62aad7a39c8e https://drafts.csswg.org/selectors-4/#the-lang-pseudo Let me know if this is acceptable @GLRoylance
https://drafts.csswg.org/selectors-3/#lang-pseudo
The draft says
The phrase "normalized to BCP 47 syntax if necessary" opens a can of worms. It implies that the user agent should take locale strings such as "en_US" or "it_IT.utf8" and normalize them to BCP 47 syntax (which would give "en-US" and "it-IT"). Please do not suggest that an element's language can be set with xml:lang="en_US" or lang="it_IT.utf8" and that the user agent will "normalize" it to a BCP 47 language tag.