jonaslil closed this issue 8 months ago.
I'm asking for some input to get started with this issue. How detailed is the language markup in your books at present? Do you tag the language of parts at all, or only the language of the file? Is the tagging done entirely by the suppliers? I think we don't have much tagging of the language of parts, unless it is requested in the editing instructions for a specific book.
Would it be reasonable to suggest that any language change that affects a paragraph or more is tagged, but more detailed markup is added only upon request?
Adding @oscarlcarlsson to this discussion as well.
We add it at block level when needed, never on inline elements. So I would do as you suggest here.
Actually, I thought block-level language markup was strictly mandated by 2020-1, but apparently it is not. Yes, I believe the general rule should be block-level language tagging; that is the minimum. It was explicit in 2015-1.
I guess this is one case where our guidelines do not automatically ensure WCAG compliance, as I think the AA requirement you linked to includes a more granular level of tagging, right, @jonaslil?
Cf. #46
Thanks for the input! So we need to include the requirements that were explicit in 2015-1 in the new guidelines somehow.
As to the WCAG compliance question:
The documentation of this criterion mentions some exceptions: "proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text", but it does not limit the granularity per se. The implication seems to be that even single words should be tagged, if it is clear that a language change was intended. (Understanding Language of Parts)
But I'm not so sure that we need to point out a possible WCAG non-conformance here. In this case, we conform to the extent we deem practically possible. The WCAG requirement just doesn't fit publishing outside English-speaking monolingualism very well, unless technology evolves and the markup can be automated. In the alt text case, by contrast, the placeholder texts are obviously inadequate.
Thank you very much for elaborating on the compliance, @jonaslil. In addition to the language context that you mention, there is also the practical difference between applying granular language markup when publishing original works and doing so when retrofitting or digitising already published works. I'm not sure what I think about skipping a WCAG warning here, but perhaps the warning or note could be softer than the one for the alt text issue?
Here's a suggestion for how this could be handled in the guidelines. I'll leave the WCAG warning out for now.
Leave section 2.5.1.5 Language definition roughly as it is. The main language of the document is defined there, in the context of the requirements for XHTML documents: root element attributes, etc.
Add a new subsection to 3.4 Mark-up Requirements, possibly at the very end, with the suggested header "Language Tagging". Add cross-references between this section and the one about the language of the document.
The text of this new section could be something like this:
Content documents may contain text in languages other than the main language, as defined in the root element (see section ...). For longer passages comprising one or more block elements, the language must be specified using the lang and xml:lang attributes. Elements that may require these attributes include p, ol, ul, blockquote, aside and section. Inline text is not marked up unless specifically requested by the Ordering Agency.
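As a rough illustration of what the proposed rule would let a production team check automatically, here is a minimal Python sketch. It is not part of any guideline, and the function name and the exact checks are assumptions. It parses one content document with the standard library, reports any element where lang and xml:lang disagree, and flags language attributes on elements outside the block-level list above, which under the proposal would only be expected when the Ordering Agency asks for inline markup.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Block elements named in the proposed guideline text (assumed to be the full list).
BLOCK_ELEMENTS = {"p", "ol", "ul", "blockquote", "aside", "section"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def check_language_tagging(xhtml: str) -> list[str]:
    """Hypothetical checker: return a list of language-markup problems in one document."""
    root = ET.fromstring(xhtml)
    problems = []
    for elem in root.iter():
        lang = elem.get("lang")  # unprefixed attribute, no namespace
        xml_lang = elem.get(XML_LANG)
        if lang is None and xml_lang is None:
            continue  # element inherits the language of its parent
        name = local_name(elem.tag)
        # Both attributes should be present and carry the same value.
        if lang != xml_lang:
            problems.append(f"<{name}>: lang={lang!r} does not match xml:lang={xml_lang!r}")
        # The root element sets the main language (section 2.5.1.5); anything
        # else outside the block-level list is inline or otherwise unexpected.
        if elem is not root and name not in BLOCK_ELEMENTS:
            problems.append(f"<{name}>: language set outside the block-level list "
                            "(only expected if requested by the Ordering Agency)")
    return problems
```

Run against a Swedish document with an English blockquote and an untagged-in-xml:lang French span, it would accept the blockquote and report two problems for the span.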
Excellent suggestion, @jonaslil, let's go for it! And yes, let's get back to the WCAG warning issue later on.
Do we want a limited list of language tags? I think I have run into issues where odd lang attributes have affected our productions.
About limiting the language tags: there is a reference in the guidelines to the IANA registry of valid language codes. Can tags included there cause problems? I'm not sure how to come up with a list of safe tags. We may just have to handle the odd tags case by case in production.
I think narrower language validation could be good practice, but it should be a local decision, so I would say it's probably outside the scope of the general guidelines.
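If a production team does decide on a local restriction, it could look something like the following minimal sketch. The allowlist is purely hypothetical, and the pattern only checks the rough shape of a tag (a primary subtag of 2-3 letters plus optional subtags); it is not a full validator against the IANA registry.

```python
import re
from typing import Optional

# Hypothetical local allowlist of primary language subtags; each production
# team would choose its own, since this is a local decision.
ALLOWED_LANGS = frozenset({"sv", "en", "fi", "da", "nb", "nn", "de", "fr", "la"})

# Loose shape check: a 2-3 letter primary subtag followed by optional
# hyphen-separated subtags of 1-8 letters or digits. Not a full BCP 47 parser.
TAG_SHAPE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def check_lang_value(tag: str, allowed: frozenset = ALLOWED_LANGS) -> Optional[str]:
    """Return None if the tag passes the local policy, otherwise a short reason."""
    if not TAG_SHAPE.fullmatch(tag):
        return "not a well-formed language tag"
    primary = tag.split("-", 1)[0].lower()
    if primary not in allowed:
        return f"primary language {primary!r} is not on the local allowlist"
    return None


# Example: check a few attribute values collected from a book.
for value in ["sv", "en-GB", "sw", "en_GB"]:
    print(value, "->", check_lang_value(value) or "ok")
```

A tag like "sw" is a registered code (Swahili), so it passes the shape check, but in a Swedish production it is more likely a typo for "sv"; the allowlist catches that kind of odd tag for manual review.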
The guidelines require suppliers to identify the language of each content file, but don't say anything about markup for the language of parts when the language changes within a file. This is a WCAG requirement. At least some TTS voices can change their pronunciation based on language markup, and it helps screen reader users. Should we include instructions about this in the guidelines? Defining a level of granularity suitable for all cases may be a challenge.