maranget / hevea

Hevea is a fast latex to html translator
http://hevea.inria.fr

Set `html` element's attribute `lang` according to document's language #26

Closed cspiel closed 4 years ago

cspiel commented 4 years ago

This P/R addresses issue #25. Its primary target is to provide transparent setting of the lang attribute for the whole HTML document based on the LaTeX

It has turned out that the central function, a map from babel language names (e.g. swissgerman) to language codes (e.g. de-CH), is also suitable for implementing some of the missing babel environments and macros, e.g. \foreignlanguage{LANG}{TEXT}. Their implementation is trivial. The documentation has been extended and updated accordingly.
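To illustrate the idea of such a map, here is a minimal Python sketch. It is not the code of this P/R: the table `BABEL_TO_LANG`, the helpers `lang_code` and `foreign_language`, and the `<span lang=...>` rendering are all hypothetical. Only the pair swissgerman / de-CH comes from the description above; the other entries are assumed.

```python
from html import escape

# Hypothetical excerpt of a babel-name -> language-code table.
# Only swissgerman -> de-CH is taken from the P/R description;
# the remaining entries are assumptions for illustration.
BABEL_TO_LANG = {
    "english": "en",
    "french": "fr",
    "italian": "it",
    "german": "de",
    "swissgerman": "de-CH",
}


def lang_code(babel_name):
    """Return the language code for a babel language name, or None if unknown."""
    return BABEL_TO_LANG.get(babel_name.strip().lower())


def foreign_language(babel_name, text):
    """Illustrative rendering of \\foreignlanguage{LANG}{TEXT} as an HTML span."""
    code = lang_code(babel_name)
    if code is None:
        # Unknown language: emit the escaped text without a lang attribute.
        return escape(text)
    return '<span lang="{}">{}</span>'.format(code, escape(text))
```

With one lookup function in place, a macro like \foreignlanguage reduces to wrapping its text argument in an element that carries the looked-up code, which is presumably why the additional macros are described as trivial.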

Known Problem: Babel allows for a quirky way to set the document's main language:

\usepackage[main=italian,french]{babel}

which makes italian the main language. This P/R partially implements parsing the main modifier, but `babel.hva` ignores it (due to standard parsing of the package's options) and thus the document's main language is wrongly detected when main= is used.
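The selection rule at stake can be sketched as follows. This is a Python illustration only, not how `babel.hva` or `package.hva` handle options; the function name `main_language` is hypothetical. It assumes babel's usual convention that, without `main=`, the last language listed becomes the main one, and it naively treats every bare option as a language name.

```python
def main_language(options):
    """Pick the main language from a babel option string.

    An explicit main=LANG wins; otherwise the last bare option is used.
    Assumption: every bare option is a language name, which is not true
    of all real babel options.
    """
    main = None
    last = None
    for raw in options.split(","):
        opt = raw.strip()
        if not opt:
            continue
        key, sep, value = opt.partition("=")
        if sep:
            # key=value option: only main= matters for this sketch.
            if key.strip() == "main":
                main = value.strip()
        else:
            last = opt
    return main if main is not None else last
```

For the example above, `main_language("main=italian,french")` yields `"italian"`, whereas a parser that drops the `main=` entry would fall back to `"french"`.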

maranget commented 4 years ago

Hi, I am a bit concerned about the default behaviour. At the moment, if no language is specified, the PR emits a lang="en" attribute on the <html .. > element. Wouldn't the previous behaviour of not emitting any lang attribute be more logical?

I'd also like to have a look at the main=... question, as I do not understand exactly why babel.hva cannot parse the main=italian optional package argument.

maranget commented 4 years ago

> I'd also like to have a look at the main=... question, as I do not understand exactly why babel.hva cannot parse the main=italian optional package argument.

My commit to your PR attempts to do precisely this. I had to modify package.hva a bit. Would you please have a look at it?

maranget commented 4 years ago

> Code looks good and passed all my tests. Now main= works as expected.

I am glad it does. What about my other comment on not having a lang attribute when no language is specified?

cspiel commented 4 years ago

I know you are afraid of breaking backward compatibility with the introduction of lang="en" as default.

  1. A document that does not mention any language, e.g. via babel, is treated as (US-)English by LaTeX/TeX, in particular when it comes to hyphenation. So, it is transparent to the user that her Hevea-translated HTML version of the document gets lang="en". The document's language should be a document property and not a (configurable?!) browser property.
  2. The W3C kind of cheers for this attribute, but where it really pays off is translation/interoperability and, there, screen readers.

So, I'm inclined to have a default lang-attribute, though I'm not sure which to prefer: en or en-US.
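The two behaviours under discussion can be put side by side in a small Python sketch. The function `html_start_tag` and its parameters are hypothetical and only illustrate the choice; they are not Hevea's code.

```python
def html_start_tag(lang=None, default="en"):
    """Build the opening <html> tag.

    lang: language code detected from the document, or None if none was found.
    default: fallback code; pass None to omit the attribute instead
             (the previous behaviour).
    """
    code = lang if lang is not None else default
    if code is None:
        return "<html>"
    return '<html lang="{}">'.format(code)
```

Calling `html_start_tag()` corresponds to the default proposed here, while `html_start_tag(default=None)` corresponds to not emitting any lang attribute.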

maranget commented 4 years ago

Hi,

OK, I see. Although I'd rather have the language unspecified for pages in English than pages in French with a lang="en" specification, I'll follow your suggestion and merge the PR.

Thanks a lot for your contribution.