nlbdev / nordic-accessible-epub-guidelines


Language of parts #56

Closed jonaslil closed 7 months ago

jonaslil commented 1 year ago

The guidelines require suppliers to identify the language of each content file, but don't say anything about markup for the language of parts when the language changes within a file. This is a WCAG requirement (Success Criterion 3.1.2, Language of Parts). At least some TTS voices can change their pronunciation based on language markup, and it helps screen reader users. Should we include instructions about this in the guidelines? Defining a suitable level of granularity applicable to all cases may be a challenge.

jonaslil commented 8 months ago

I'm asking for some input to get started with this issue. How detailed is the language markup in your books at present? Do you tag the language of parts at all, or merely the language of the file? Is the tagging done entirely by the suppliers? I think we don't have much tagging of the language of parts, unless it is requested in editing instructions for a specific book.

Would it be reasonable to suggest that any language change that affects a paragraph or more is tagged, but more detailed markup is added only upon request?

Adding @oscarlcarlsson to this discussion as well.

oscarlcarlsson commented 8 months ago

We add it at block level if needed, never on inline elements. So I would do as you suggest here.

martinpub commented 8 months ago

Actually, I thought block-level language markup was strictly mandated by 2020-1, but apparently not. Yes, I believe the general rule should be block-level language tagging; that is the minimum. It was explicit in 2015-1.

I guess this is one case where our guidelines do not automatically satisfy WCAG compliance, as I think the AA requirement you linked to includes a more granular level of tagging, right @jonaslil?

Cf #46

jonaslil commented 8 months ago

Thanks for the input! So we need to include the requirements that were explicit in 2015-1 in the new guidelines somehow.

As to the WCAG compliance question:

The documentation of this criterion mentions some exceptions: "proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text", but it does not limit the granularity per se. The implication seems to be that even single words should be tagged, if it is clear that a language change was intended. (Understanding Language of Parts)

But I'm not so sure that we need to point out a possible WCAG non-conformance here. In this case, we conform to the extent we deem practically possible. The WCAG requirement just doesn't fit publishing outside of English-speaking monolingualism very well - unless technology evolves and the markup can be automated. In the alt text case, the placeholder texts are obviously inadequate.

martinpub commented 8 months ago

Thank you very much for elaborating on the compliance, @jonaslil. In addition to the language context that you mention, there is also the practical difference between applying granular language markup when publishing original works and the work we do retrofitting/digitising already published works. I'm not sure what I think about skipping a WCAG warning here, but perhaps the warning/note could be softer than that of the alt text issue?

jonaslil commented 8 months ago

Here's a suggestion for how this could be handled in the guidelines. I'll leave the WCAG warning out for now.

Leave the section 2.5.1.5 Language definition roughly as it is. Definition of the main language of the document is treated here in the context of requirements for the xhtml documents: root element attributes etc.

Add a new subsection in 3.4 Mark-up Requirements. Possibly at the very end. Suggested header: "Language Tagging". Add cross-references between this section and the one about the language of the document.

The text of this new section could be something like this:

Content documents may contain text in languages other than the main language, as defined in the root element (see section ...). For longer passages comprising one or more block elements, the language must be specified using both the lang and xml:lang attributes. Elements that may require these attributes include p, ol, ul, blockquote, aside and section. Inline text is not marked up unless specifically indicated by the Ordering Agency.
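
To make the proposed wording concrete, here is a rough sketch of what block-level tagging could look like, plus a check that lang and xml:lang are both present and agree. The sample document, the function name and the choice to check only the six listed elements are my own assumptions for illustration, not anything taken from the guidelines or from an existing validator.

```python
import xml.etree.ElementTree as ET

# Namespace that the predeclared "xml:" prefix maps to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Block elements named in the suggested guideline text.
BLOCK_ELEMENTS = {"p", "ol", "ul", "blockquote", "aside", "section"}

# Hypothetical content document: Swedish main language, one correctly
# tagged English block quote, and one paragraph missing xml:lang.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="sv" xml:lang="sv">
<body>
<p>Huvudtexten i dokumentet.</p>
<blockquote lang="en" xml:lang="en"><p>A longer quoted passage in English.</p></blockquote>
<p lang="fi">Suomenkielinen kappale.</p>
</body>
</html>"""


def find_mismatched_language_attributes(xhtml_text):
    """Return (element, lang, xml:lang) for block elements where the two differ."""
    root = ET.fromstring(xhtml_text)
    problems = []
    for element in root.iter():
        # Strip the "{namespace}" prefix ElementTree puts on tag names.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name not in BLOCK_ELEMENTS:
            continue
        lang = element.get("lang")
        xml_lang = element.get("{%s}lang" % XML_NS)
        # Both absent is fine (language is inherited); otherwise they must match.
        if lang != xml_lang:
            problems.append((local_name, lang, xml_lang))
    return problems


print(find_mismatched_language_attributes(SAMPLE))
```

With this sample, only the Finnish paragraph is reported, since it has lang but no xml:lang. Inline elements are deliberately ignored, in line with the suggestion that inline text is only marked up on request.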

martinpub commented 8 months ago

Excellent suggestion, @jonaslil, let's go for it! And yes, let's get back to the WCAG warning issue later on.

oscarlcarlsson commented 7 months ago

Do we want a limited list of language tags? I think I have run into issues where odd lang attributes have affected our productions.

jonaslil commented 7 months ago

About limiting the language tags: There is a reference in the guidelines to the IANA registry of valid language codes. Can tags included there cause problems? I'm not sure how to come up with a list of safe tags. We may just have to handle the odd tags case by case in production.

martinpub commented 7 months ago

I think a narrower language validation could be good practice, but it should be a local decision, so I would say it's probably outside the scope of the general guidelines.
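
For anyone who wants such a local check, it could be as small as the sketch below. The allow-list, the function name and the decision to compare only the primary subtag (so that en-GB passes when en is allowed) are purely illustrative assumptions; each organisation would pick its own list.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified name ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical local allow-list of primary language subtags.
ALLOWED_PRIMARY_SUBTAGS = {"sv", "nb", "nn", "da", "fi", "is", "en", "de", "fr"}


def find_unexpected_language_tags(xhtml_text, allowed=ALLOWED_PRIMARY_SUBTAGS):
    """Return a sorted list of lang/xml:lang values whose primary subtag is not allowed."""
    root = ET.fromstring(xhtml_text)
    unexpected = set()
    for element in root.iter():
        for attribute in ("lang", XML_LANG):
            value = element.get(attribute)
            if value is None:
                continue
            # Compare only the part before the first hyphen, case-insensitively.
            primary = value.split("-")[0].lower()
            if primary not in allowed:
                unexpected.add(value)
    return sorted(unexpected)
```

This only flags tags for manual review. It does not say whether a tag is valid according to the IANA registry, which is a separate question.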