microformats / microformats2-parsing

For collecting and handling issues with the microformats2 parsing specification: http://microformats.org/wiki/microformats2-parsing

Parse document language from <html lang=""> attribute #72

Open barnabywalters opened 11 months ago

barnabywalters commented 11 months ago

The mf2 parsing spec should consider looking for a lang attribute on the <html> element and making its value available under a top-level "lang" key in the parsed result. For example,

<html lang="en-us">
</html>

would result in

{
  "items": [],
  "lang": "en-us"
}

This change would make the page-wide language readily available to consumers without requiring any additional parsing of their own, and would reduce the need to implement the language inheritance discussed in https://github.com/microformats/microformats2-parsing/issues/3

Parsing should be restricted to the <html> element rather than also looking on <body> or taking the first lang attribute found, as <html> is the preferred location for specifying the page-wide language.
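
To make the proposal concrete, here is a minimal sketch in Python of what a parser might do. It is an illustration under assumptions, not spec text or any existing parser's API: the names HtmlLangParser and parse_document_lang are hypothetical, and omitting the "lang" key when no usable attribute is present is my assumption, since the proposal does not say what should happen in that case.

```python
from html.parser import HTMLParser


class HtmlLangParser(HTMLParser):
    """Record the lang attribute of the first <html> start tag only."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        # Only the <html> element is consulted; lang on <body> or any
        # other element is deliberately ignored, as proposed above.
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            value = dict(attrs).get("lang")
            if value:  # skips a missing or empty lang attribute
                self.lang = value


def parse_document_lang(html, items=None):
    """Return a parsed-result dict with the proposed top-level "lang" key.

    `items` stands in for the output of the regular mf2 parse, which is
    out of scope for this sketch.
    """
    parser = HtmlLangParser()
    parser.feed(html)
    result = {"items": items if items is not None else []}
    # Assumption: leave "lang" out entirely when nothing was found.
    if parser.lang is not None:
        result["lang"] = parser.lang
    return result
```

Calling parse_document_lang('<html lang="en-us">\n</html>') on the example document yields the result shown earlier, while a document that only has lang on <body> produces a result with no "lang" key.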

barnabywalters commented 11 months ago

Via jkingweb: we could consider implementing the language-determination algorithm from the HTML spec, which would additionally involve looking for a <meta> element that carries a language value if no lang attribute is found on <html>: https://html.spec.whatwg.org/multipage/dom.html#the-lang-and-xml:lang-attributes
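
A rough sketch of that fallback in Python, for discussion only. It is a simplification and not a faithful implementation of the HTML spec algorithm: it assumes the <meta> element in question is the http-equiv="content-language" pragma, it treats a content value containing a comma as unusable, it takes the first usable <meta> it encounters, and it ignores xml:lang and HTTP headers. The names LangWithMetaFallback and document_lang are hypothetical.

```python
from html.parser import HTMLParser


class LangWithMetaFallback(HTMLParser):
    """Collect lang from <html>, plus a content-language <meta> as fallback."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.meta_lang = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            # A missing or empty lang attribute counts as "not found".
            self.html_lang = attrs.get("lang") or None
        elif tag == "meta" and self.meta_lang is None:
            http_equiv = (attrs.get("http-equiv") or "").lower()
            content = attrs.get("content") or ""
            # Assumption: a comma-separated list of languages is ignored
            # rather than guessed at.
            if http_equiv == "content-language" and "," not in content:
                parts = content.split()
                if parts:
                    self.meta_lang = parts[0]


def document_lang(html):
    """Prefer <html lang>; fall back to the <meta> value; else None."""
    parser = LangWithMetaFallback()
    parser.feed(html)
    return parser.html_lang or parser.meta_lang
```

With this approach a lang attribute on <html> always wins, and the <meta> value is used only when <html> carries none. Whether the extra complexity is worth it for mf2 consumers is the open question here.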