w3c / wcag

Web Content Accessibility Guidelines
https://w3c.github.io/wcag/guidelines/22/

3.1.2 Language of parts - can single words without lang mark-up be considered a PASS? #1174

Open detlevhfischer opened 4 years ago

detlevhfischer commented 4 years ago

I apologise for returning to an issue that has already been discussed a few times here: https://github.com/w3c/wcag/issues/297 and https://github.com/w3c/wcag/pull/808 with no clear decision yet taken due to more urgent business. Considering the loads of other stuff, I mark this as discussion, not something requiring a working group vote.

Our test procedure for 3.1.2 Language of Parts currently requires single words that are neither spoken in a similar way as in the source language nor have become part of the vernacular to be marked up.

We got a change request calling out a mistake here, arguing that single words do not fall under 'passage or phrase'. The user argues that marking up single words forces screen readers to switch language, and that while some desktop screen readers can turn off language switching, this is not possible in VoiceOver and TalkBack, making pages where single words are diligently marked up unpleasant to read for screen reader users.

I would like to get a quick straw poll of other people's opinion on whether you would consider the lack of markup on single words necessarily a FAIL. Would you accept the trade-off argument that the practice is so disruptive in many contexts (especially mobile) that it can be considered a PASS when individual words are not marked up (while longer passages like quotations, book citations of titles etc. would of course still require lang markup)? An exception where single word mark-up seems important appears to be language choice menus. Where do you stand? (A), (B) or (C)?

(A) When single words in a foreign language are not marked up, this is acceptable since, on balance, it makes texts easier to read for screen reader users (who are often familiar with the pronunciation of these words without markup).

(B) Words without lang markup that are pronounced differently and are not part of the vernacular must be marked up - if they are not, this is a clear FAIL.

(C) It depends on the context and the specific word whether or not lack of markup is acceptable, and we should create tickets for VoiceOver and TalkBack.

KerstinProbiesch commented 4 years ago

Hi all,

I just want to add Issue 287.

Cheers

Kerstin

patrickhlauke commented 4 years ago

I'd tend towards B (as C is difficult to formalise...how do you define "it depends (on context)" in a sufficiently prescriptive unambiguous way in an SC) if we're being strict/correct. But I'd class it as a very low priority/severity failure. The fact that some AT don't handle it well is...a problem with the AT?

With all that said, it can be one of those situations where you as a developer consciously accept a normative FAIL on the grounds that usability trumps the formalism required by the SC.

KerstinProbiesch commented 4 years ago

Hi all,

I’m a bit confused, because in 2017 I opened this issue (Issue 287) to seek clarification on what the SC says. The SC has exceptions and “passage or phrase”. Single words are not passages. So my question was what is meant by “phrase”, because there is no definition in the normative glossary. The answer was:

WCAG 2.0 is definitely referring to the "group of words" definition.

According to this answer, it is a pass when single words from a foreign language don't have markup.

Screenreader:

How AT deals with words from a foreign language depends not just on the screen reader but also on the voice and in some cases on the OS. Some years ago I tested several (with some help from a user for Vocalizer) using a short German text: 101 words, 18 of them English. (Disclaimer: the text also includes proper names, which are an exception according to the SC, and four words that are not pronounced differently, a case the SC does not describe.)

This was the German text:

Kurzinformation Social Media Social Media fördern im Web die soziale Interaktion zwischen Usern. Ein wesentliches Merkmal von Social Media ist das schnelle Verbreiten von Know how, Meinungen und Informationen. Auch kollaboratives Schreiben (z.B. Wikipedia) fällt unter das Dach von Social Media. Neben Blogs und Diensten für Social Bookmarks (öffentliches Verwalten und Teilen von Lesezeichen) werden auch Podcasts und Portale für Media-Sharing (Teilen von Videos, Bildern usw.) dazu gerechnet. Ebenfalls zu nennen sind Social Networks wie Xing und Linkedin sowie Facebook. Die bekanntesten Dienste aus dem Bereich Social Media sind wohl Twitter und Facebook sowie YouTube und Instagram und natürlich die Wikipedia.

Results with no markup:

Results with markup:

Screen reader users often describe those breaks as disturbing, also when they happen for single words. And it is not only the breaks: depending on the screen reader and the voice, there may also be switches between masculine and feminine voices by default (for example in iOS), depending on the language. One can change that in the settings, but even after the change there are noticeable issues that screen reader users describe as disturbing even when they occur for single words.

Cheers

Kerstin

awkawk commented 4 years ago

WCAG 2.0 is definitely referring to the "group of words" definition.

@kerstinp This statement referred to the text you had provided above:

Wikipedia for example says: "(...) In linguistic analysis, a phrase is a group of words (or possibly a single word) that functions as a constituent in the syntax of a sentence, a single unit within a grammatical hierarchy. A phrase typically appears within a clause, but it is possible also for a phrase to be a clause or to contain a clause within it."

In essence, agreeing that 3.1.2 could be on a single word.

KerstinProbiesch commented 4 years ago

@awkawk

I would also agree that the markup could be on a single word. The more important question, and this was the background of my issue with the reference to a definition, was what is sufficient for 3.1.2. If 3.1.2 refers to the "group of words" definition of phrase, as stated in that issue, then it is sufficient to leave single words without markup, and doing so is not a failure.

MakotoUeki commented 4 years ago

I think it depends on the word. If it needs to be pronounced correctly to make sense, the word needs to be marked up by using the lang attribute so that screen readers can pronounce it correctly.

For instance, if the word "OK", which is not a Japanese word, is used on a Japanese web page, I don't think it needs the lang attribute (lang="en") because Japanese people are familiar with "OK" and Japanese screen readers don't have any pronunciation issue with "OK". So it will "PASS" without the lang="en". This is also true for other English words such as "Open", "Good", "Cancel", "Go", etc.

On the other hand, if a word in foreign language is used and it needs to be pronounced correctly, it must be marked up to specify the language.

For instance, suppose there is a link text written in Korean that leads to the Korean version of the website. The link text "Korean" (written in Korean) needs to have lang="ko" so that a Korean screen reader can read it aloud in Korean and Korean screen reader users will be able to understand the link text.

In this case, we should note that it will make sense only if the screen reader supports the lang attribute. This is an "Accessibility support" issue.
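The two cases above (an unmarked "OK" on a Japanese page, and a Korean link text carrying lang="ko") can be made concrete with a small sketch. This is not from the thread: it is a minimal illustration, using only Python's standard `html.parser`, of how an auditor might list which language is in effect for each piece of text so that a human can then judge pass or fail. The class name `LangTracker`, the function `lang_runs` and the sample page are hypothetical, and an empty `lang=""` is simply treated as "inherit" here, which simplifies what the HTML specification says.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LangTracker(HTMLParser):
    """Collect (effective_lang, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, language in effect inside that element)
        self.runs = []   # (language or None, text)

    def current_lang(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        # A lang attribute overrides the inherited language; otherwise inherit.
        lang = dict(attrs).get("lang") or self.current_lang()
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.current_lang(), text))


def lang_runs(html):
    """Return the list of (effective_lang, text) pairs found in html."""
    parser = LangTracker()
    parser.feed(html)
    parser.close()
    return parser.runs


# Hypothetical Japanese page with a link to the Korean version of the site.
SAMPLE = (
    '<html lang="ja"><body>'
    '<p>OK</p>'
    '<a href="/ko/" lang="ko">한국어</a>'
    '</body></html>'
)

for lang, text in lang_runs(SAMPLE):
    print(lang, text)
```

The sketch only reports what the markup declares. Whether an unmarked word such as "OK" is acceptable remains the human judgement discussed in this thread.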

awkawk commented 4 years ago

@kerstinp OK, good. I believe that single words need to be marked up with the correct language, with some exceptions that the SC and the understanding document identify. Makoto has a good example of "OK" not triggering a failure in Japanese text. Certainly, single words like the list of languages available for a web site will need to be marked up properly.

I would say that my answer is "B" except for the pronounced differently aspect. Of course the word will be pronounced differently with the correct speech synthesizer, but if I use "faux pas" in a sentence it is recognizable within regular English usage even though it is French and even though it will be pronounced differently by the English synthesizer. Also, the full exception is "except for proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text", which is more encompassing than what B indicates, so I think that my actual answer is "C".

JAWS-test commented 4 years ago

I don't think it makes sense to make markup with the lang attribute dependent on whether certain screen readers happen to output certain words correctly or not. This can change at any time and is not an objective test criterion.

KerstinProbiesch commented 4 years ago

@awkawk

Yes. The full exception is "except for proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text", but the SC starts with "The human language of each passage or phrase in the content" and not with "The human language of each word, passage or phrase in the content".

awkawk commented 4 years ago

@kerstinp yes, but this is what we figured out in issue 287 - that you can have a phrase of one word.

KerstinProbiesch commented 4 years ago

@awkawk

I think that the WG should provide an official response, because it is an important issue. Of course I don't know all screen reader users, but I know many, and every German screen reader user I know is disturbed when single words have markup.

If we figured that out in issue 287, I don't understand why the response was "WCAG 2.0 is definitely referring to the "group of words" definition."

yatil commented 4 years ago

In practice, I would not consider individual words a fail, unless their language is essential to understand the word in the context. This can happen if a secondary language word is used but is written the same way as a word in the host language. I think those cases are rare.

I think it is best to leave that to the language processors in Screen Readers to figure out. Now, if you have a longer phrase (“Bonjour mon ami”), I would require a language change.

If WCAG meant “word” in the SC, the SC should say “word”.

I also find this section of the Understanding document pretty clear:

Individual words or phrases in one language can become part of another language. For example, "rendezvous" is a French word that has been adopted in English, appears in English dictionaries, and is properly pronounced by English screen readers. Hence a passage of English text may contain the word "rendezvous" without specifying that its human language is French and still satisfy this Success Criterion. Frequently, when the human language of text appears to be changing for a single word, that word has become part of the language of the surrounding text. Because this is so common in some languages, single words should be considered part of the language of the surrounding text unless it is clear that a change in language was intended. If there is doubt whether a change in language is intended, consider whether the word would be pronounced the same (except for accent or intonation) in the language of the immediately surrounding text.

(Source, emphasis mine.)

patrickhlauke commented 4 years ago

I think at the core the problem here is, again, a lack of nuance in the very binary "pass/fail" nature of SCs, and the problem of trying to normatively define something unambiguously while also potentially allowing for some interpretation/common sense judgement.

perhaps the problem is more acute for languages that borrow a lot of words from, say, english (though there, the argument is strong that they're loan words that have become part of the vernacular). i gather (from some information that kerstin provided previously, which I hope she won't mind me mentioning) that in some cases the pass/fail in this case gets predicated purely on what the national dictionary authority in that particular country "deems" to be a loan word or not (rather than common sense/the auditor's own judgement). that, of course, is a ridiculous situation...

conversely though, there are situations where it may be much clearer that even for a single word, it is necessary to actually mark up the language change...for instance, when it comes to languages that are so far removed from the main language of the document and cannot be considered a loan word. if i just casually dropped words like Bundeskanzleramt it would be difficult to argue that it's fine as is. or even languages that use different alphabets/phonetics altogether, like if i dropped in 山 here when talking about a japanese mountain.

so while inconclusive, i'd say that normatively yes, even single words fall under this SC. but that the aspect of loan words/vernacular/common use needs to be strongly emphasised in at least the understanding, and that situations like what seems to be happening in some places where the national dictionary is seen as the only way to decide what is and isn't a loan word should be strongly discouraged.

and worst comes to the worst, clients/developers may well, as said, accept that they may get a fail, but back it up with their reasoned argumentation about why they decided to go that way. (which of course goes against the grain of "being fully compliant", but because WCAG lacks the concept of a "soft fail", it is what it is)

awkawk commented 4 years ago

@yatil I agree that the Understanding text makes it clear, but come to a different conclusion.

Because this is so common in some languages, single words should be considered part of the language of the surrounding text unless it is clear that a change in language was intended. If there is doubt whether a change in language is intended, consider whether the word would be pronounced the same (except for accent or intonation) in the language of the immediately surrounding text.

The SC says that words are an exception when they are part of the vernacular of the surrounding text. If they are an exception, then they are included in the set of things that can be excepted from, otherwise why even mention them? The Understanding document essentially says the same by stating that single words should be considered part of the language unless it is clear that the language change was intended, so there are words that do need to have their language marked up.

yatil commented 4 years ago

…which is my conclusion. Sorry for not being clear about it.

KerstinProbiesch commented 4 years ago

It is surely not surprising that I agree with

If WCAG meant “word” in the SC, the SC should say “word”.

and this is not the case in the SC.

But I want to say something to topic B above: "not part of the vernacular"

It may be that there is something like a vernacular in other languages. We don't have a standardized German vernacular and no official institution for one. Of course we have dictionaries, but dictionaries are publications from publishing houses. As far as I know there are at least five. We have a "Sprachkommission" (language commission), but the role of this commission is not "language regulator", cause it is not mandatory.

We have at least five, probably more, dictionaries from different publishing houses, and from time to time there are new editions. Which dictionary from which publishing house should be chosen for the markup of a single word? One? Why this one? All? And because of new editions the following can happen: overnight, a single word from a foreign language appears in a new edition, or it was in the former edition but has been deleted from the new one. So: if the non-existent official German vernacular is the testing procedure, a single word from a foreign language could overnight be a pass without any markup, or overnight a fail because it is no longer in the new edition.

awkawk commented 4 years ago

@kerstinp It sounds like you are trying to make this a success criterion that is machine-testable and there is a definite right or wrong answer. I think that this is more like text equivalents - there will be many cases that everyone agrees on, but others that have greater subjectivity.

KerstinProbiesch commented 4 years ago

@awkawk No. I don't have machine-testable in mind. I just want guidance for clients on what to do, what is a failure and what is a pass. And of course I would like to have a decision on what is covered by this SC, what is not, and what is sufficient. This topic might not be very important in other languages, but for German texts it is. Also relevant for screen reader users are the breaks, the changes of intonation, or even the change from a masculine to a feminine voice and back, probably several times in one paragraph, if single words really needed markup. And as mentioned above: of course I don't know every screen reader user, but all screen reader users I spoke with are not in favour of markup for single words. Some even say that, depending on the word and the language, they would understand some words better without markup. And would you not agree that our goal should be supporting users and not disturbing them?

And a testing procedure cannot rely on "part of the vernacular" for languages where no standardized vernacular exists.

Why not a decision like this:

Level AA - it is sufficient when groups of words have markup. It is not a fail when single words don't have markup. It is a pass when single words don't have markup and a pass when they have markup.

New SC on Level AAA - also single words need markup.

According to EN 301 549, WCAG AA is the minimum, but the countries of the European Union can decide to require more. In those countries where the community says single words should have markup, they could decide for this; in those countries where the community says that it doesn't want markup for single words, or that it is senseless, they would not.

And the minimum for all is of course markup for groups of words, with the exceptions already in the SC.

KerstinProbiesch commented 4 years ago

Just to correct myself: an AAA SC requiring markup for single words was short-sighted, because AAA should lead to more accessibility and not to more of what many users here describe as disturbing or senseless. Nevertheless, those countries in which the community says it would be positive can decide for this (even without an SC that covers markup for every single word).

patrickhlauke commented 4 years ago

We have a "Sprachkommission" (language commission), but the role of this commission is not "language regulator", cause it is not mandatory.

So: if the non-existent official German vernacular is the testing procedure, a single word from a foreign language could overnight be a pass without any markup, or overnight a fail because it is no longer in the new edition.

if there is no official vernacular, and no mandate/compulsion ... then it's just down to the understanding/judgement of the auditor/developer, no? which then will, yes, lead to uncertainty about what a pass/fail is between auditors, but we already have that for most other SCs that have an aspect of subjectivity to them (e.g. what's a "sufficiently descriptive" heading or label, etc)

(of course, if it ever came to a legal challenge, i have no doubt that those who insist that every single non-native word needs to be explicitly marked up will use the example of the dictionary as an "objective" measure, but then that's down to the judge/jury to decide. but of course, if there's a centralised testing entity in a country, and they "grant" audit results and mandate that everything must pass, and THEY take the hardline interpretation...of course then that becomes problematic as their subjective take becomes the hardline rule, beyond what the intention/ask of the actual WCAG SC is)

KerstinProbiesch commented 4 years ago

Therefore more uncertainty should be avoided.

patrickhlauke commented 4 years ago

i note with interest as well that the official german translation of WCAG translates "vernacular" as "Jargon". that, to me, is incorrect. "Jargon" is the specific (often technical) language for a particular field. much closer to me would be the german "Umgangssprache" (common/everyday language). at least that's, i believe, the intention of the SC's wording.

patrickhlauke commented 4 years ago

making this a bit more actionable for now... would the rest of AGWG agree that the intent of the word "vernacular" in the original english version of WCAG refers to probably both "everyday language" and "jargon" (and that the german translation of WCAG is only partially correct there by just zeroing in on "Jargon")?

and, assuming that, would it be ok to add a note to the understanding document clarifying that, to an extent at least, it's subjective when a word is part of the "vernacular" of the surrounding language or not, and that auditors/developers need to use their common sense when deciding this, also noting that the intent of the SC is not to mechanically mark up each word that may be a loan-word from a different language for the sake of it, but rather to aid in understanding/reading of the text - which can mean that sometimes, even a single word or two may well remain non-marked-up if it's judged to be common enough?

yatil commented 4 years ago

Side comment:

Notice how the German translation uses the word “Jargon” not according to the glossary (and thus not linked) in 3.1.2 and according to the glossary (and thus linked) in 3.1.3. This might contribute to the confusion of Germans. (I think this should be added to the Errata for the translation.)

Screenshot of 3.1.2 and 3.1.3 in the German translation of WCAG, different use of “Jargon” highlighted with arrows

https://www.w3.org/Translations/WCAG20-de/#meaning-other-lang-id

patrickhlauke commented 4 years ago

yup, so to me, the english version uses "vernacular" to mean "everyday language", otherwise even the english version would use and link to its own definition of jargon as it does in 3.1.3. the german translation, while not linking the word in 3.1.2, reuses it which incorrectly implies that that's what 3.1.2 meant as well.

addendum: who should be contacted about the official translation? and to make sure that the (in progress?) translation of WCAG 2.1 into german, which doesn't seem to have been done yet, doesn't fall down the same hole as well?

KerstinProbiesch commented 4 years ago

Sorry that this is again very long.

I am very, very sure that it was not the official translation that caused the confusion. For years the official translation of WCAG played no role in Germany and was widely ignored. This may seem to be a blast from the past, but it leads to the situation that is addressed by this issue:

For the German BITV 2.0 (regulation for accessibility), the official translation of WCAG was not used. And especially in this case, the German BITV was not identical with the official German translation, nor even with WCAG SC 3.1.2.

In the old german BITV (until the new one was introduced last year) the exceptions were not included in the normative text. This was a decision of the working group for the "old" BITV, and is documented in the "Begründung" (I don't know the proper word in English: "Justification"?). German text:

"Die WCAG 2.0 lassen Ausnahmen zu wie Eigennamen, Fachbegriffe, sprachlich unbestimmte Begriffe sowie Begriffe oder Sätze, die Teil der im textlichen Umfeld verwendeten Umgangssprache sind. Die Anlage 1 übernimmt diese Ausnahmen grundsätzlich nicht, weil Eigennamen oder Fachbegriffe aus Sicht der Arbeitsgruppe ausgezeichnet beziehungsweise umbenannt werden müssen. Für Begriffe, die in den allgemeinen Sprachgebrauch übergegangen und insbesondere auch im Duden aufgeführt sind, also zum Beispiel der Computer oder das Web, ist eine Auszeichnung oder Umbenennung nicht erforderlich."

Short translation (difficult for some terms because they are not exactly defined): WCAG 2.0 allows exceptions such as proper names, technical terms, terms of indeterminate language, and terms or sentences that are part of the vernacular of the surrounding text. Annex 1 generally does not adopt these exceptions, because from the perspective of the working group (kerstinp: the working group for BITV 2.0) proper names or technical terms must either be marked up or be renamed. For terms that have passed into common parlance and in particular are also listed in the Duden ("Duden" means a specific dictionary from the publishing house Duden), for example "der Computer" or "das Web", markup or renaming is not necessary.


In my opinion, the term "Jargon" in the official translation is a very good generic term for what is meant. As a generic term, "Jargon" covers the terminologies of disciplines and professions as well as sociolects (like web jargon, computer jargon, and even sociolects like the "Manisch" language, which is spoken by some people in my area and includes many single words from different languages such as Romani (not Romanian), Yiddish, German and others).

Please find a (sorry) German explanation of this generic term, for example in Wikipedia.

The main thing is not about this part of the SC but about "The human language of each passage or phrase in the content" and what is meant by "phrase".


All this seems very theoretical, linguistic, philological, perhaps philosophical, a blast from the past and so on. I don't want to judge decisions from the German past or any past. The old BITV is no longer mandatory; instead, EN 301 549 V2.1.2 is now mandatory in Germany as well, with chapter 9, which is identical with WCAG.

The response that I quoted in my first comment above (phrase -> "group of words" definition) matches the perspective of many people and, more importantly, of many screen reader users in my country. (And just to say it: I don't know every opinion of every screen reader user, only what users have said in conversations, statements, articles and so on.)

As @yatil says

I think it is best to leave that to the language processors in Screen Readers to figure out. Now, if you have a longer phrase (“Bonjour mon ami”), I would require a language change.

Sorry again for the epic length. The situation in my country is specific.

Cheers

Kerstin

awkawk commented 4 years ago

@kerstinp I don't speak German, but what you describe for what jargon means in German sounds like what is in the SC as "technical terms". The term "vernacular" is more about the common use or colloquial use of words in a language, but not knowing German I don't know what the right word is for that.

yatil commented 4 years ago

I think @patrickhlauke hit it on the head with “Umgangssprache”.

KerstinProbiesch commented 4 years ago

All this is really not easy, and neither is translating between German and English and back. We are dealing with terms and words that are not exactly defined and standardized in a testable way in every language, like "vernacular", "common use", "colloquial", "everyday language" or "Umgangssprache". Different people can have a different "everyday language" or "Umgangssprache". We can see that very easily, probably in every country, with the regional dialects. I believe all those 'terms' lead to confusion when used without any relation to something.

That's why I really like the wording of the exceptions: "proper names, technical terms...." and so on, and that "vernacular" was coupled with "immediately surrounding text" rather than used as a single term without any relation to anything. And I think that "Jargon" is a proper German translation.

@awkawk "technical terms" can be part of a specific professional language and part of "Jargon". But Jargon is much more than 'just' using technical terms. There is also an article about "Jargon" in the English Wikipedia.

But it is not the exceptions or their wording that matter for the main topic of this thread. It is the first half, what "phrase" means, and the fact that no definition was provided in 2008. (I feel it is important not to lose focus on the main topic of this thread.)

awkawk commented 4 years ago

@kerstinp Agreed. In 2008 the term wasn't defined, so we need to go with the standard definition of "phrase" and my opinion (which may differ from the working group) is that based on definitions that I see online and based on the wording of the SC and the understanding document I regard this to mean "one or more words".

KerstinProbiesch commented 4 years ago

@awkawk There is no standard definition of "phrase". Different linguistic / grammar theories have different opinions on whether a single word qualifies as a "phrase".

Edit: The English and also the German Wikipedia provide an overview. The English Wikipedia starts with:

In everyday speech, a phrase is any group of words, often carrying a special idiomatic meaning; in this sense it is synonymous with expression. In linguistic analysis, a phrase is a group of words (or possibly a single word) that functions as a constituent in the syntax of a sentence, a single unit within a grammatical hierarchy.

mraccess77 commented 4 years ago

I agree the term was meant as one or more words. In English a word can be a complete sentence, e.g. "Stop!" or "Run!".

KerstinProbiesch commented 4 years ago

@mraccess77 “sentence” and “phrase” are not different words for the same thing.

-- @all

If the SC said “the human language of each word, passage, sentence or phrase … except” or “the human language of each passage or sentence”, it would be easy. But this is not what the SC says.

Linguistic theories and opinions differ not only in what a “phrase” is. Linguistic theories also do not have everyday speech in mind, nor foreign-language words in text, but “units within a grammatical theory”. Some theories and opinions from linguistics are mentioned in the English Wikipedia, such as those of G. Finch, who teaches English language and has written several books on linguistics. According to the reference in the English Wikipedia, “Linguistic Terms and Concepts” states on page 112 that a phrase consists

“of two or more words: individual words do not count as phrases.”

Linguistic theories and the interpretation of specific terms aside: in many languages the problem of foreign words is not that crucial or might not even be a problem. In some it is. What my simple demo text of around 100 words, covering only English in German text (see above), has shown:

Depending on the voice / screen reader, an English word is already pronounced correctly. The lang attribute would be work for nothing and would not solve a problem. And, even more crucially, depending on the voice / screen reader a word is already pronounced correctly without the lang attribute, and the additional lang attribute disturbs the reading flow. Depending on the voice / screen reader, the lang attribute solves no problem but causes one. I don’t think one can leave screen readers aside when this is an SC that should support screen reader users.

And I see a problem in not deciding for what “phrase” means in everyday speech but following what some (not all) linguists say (one or more words). What would the testing procedure then be when relying on some linguistic theories to interpret this SC? If the definition in this SC followed some linguistic theories on “units within a grammatical theory”, then I would think that this must be part of the testing procedures for finding out whether something is a WCAG fail or not.

Whatever the WG decides: it should be made much clearer how to use and evaluate the lang attribute, especially in the context of the normative exceptions written in the second half of this SC.

cstrobbe commented 1 year ago

None of the options A, B and C match the success criterion's meaning because of the exception "proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text" (see Andrew Kirkpatrick's comment from 23 June 2020).

The question whether "phrase" (in "each passage or phrase" at the start of the SC) includes single words can be approached in two ways.

The first approach is by inference from the exception, which includes "technical terms" and "words of indeterminate language". If single words can't be phrases, they can't be exceptions to the success criterion's main rule, which applies to "each passage or phrase".

The second approach is based on linguistics. Language and Linguistics: The Key Concepts by R. L. Trask (second edition, 2007) defines "phrase" as "a grammatical unit which is smaller than a clause". This is rather vague, but the same entry also lists the most important types of phrases: noun phrase, verb phrase, adjective phrase and prepositional phrase. It also points out that the sentence "Susie smiled" consists of two phrases: the noun phrase "Susie" and the verb phrase "smiled". It is clear from this that a phrase can be a single word. I also checked the definition of "Phrase" in Lexikon der Sprachwissenschaft, edited by Hadumod Bußmann (4th ed., 2008) and Metzler Lexikon Sprache, edited by Helmut Glück (2nd ed., 2000), but the entries in these reference works don't discuss the number of words that can make up a phrase.

Another issue with excluding single words from the definition of phrase (in discussions of SC 3.1.2 rather than discussions among linguists) is that there is no consensus in linguistics on the definition of the term "word". Even though most people have some sort of intuitive grasp of the concept, there are several different ways to look at it from a linguistic point of view, for example (again from Trask's Language and Linguistics),

"Orthographic word" would not be a suitable definition of "word" in WCAG, which has to work across languages, because some languages don't use white space between the lexical items or grammatical word forms in a sentence. For example, in Standard Chinese "他们的最好。" is not a single word but a sentence (Tāmende zuì hǎo. Theirs [is] best). 他 (he), 的 (possessive particle), 最 (most) and 好 (good) are lexical items; 他们 (they) is also a lexical item. Even though 他们的 means "theirs", it can be analysed as two words. (You won't find it in a dictionary; you need to infer its meaning from 他们 and 的.)

The distinction between "word" and "phrase" becomes rather blurry when you look at agglutinative languages such as Turkish and Finnish. For example, in Turkish evlerimde means "in my houses": ev means "house", evler means "houses", evlerim means "my houses" ("im" being added at the end is agglutination) and evlerimde finally adds "de", a suffix for the locative case. Its English translation is a phrase, it is an orthographic "word", but it is not a lexical word. Due to its meaning, it would not make sense to deny it is a phrase. The article Agglutination - Examples of Agglutinative Languages gives a much more extreme example: "Çekoslovakyalılaştıramadıklarımızdanmışçasına", which is pronounced in one word in Turkish, means "as if you were one of those whom we could not make resemble the Czechoslovakian people".

(Combining the two characteristics discussed above, Japanese is an agglutinative language that doesn't use white space between "words".)

So if "word" were excluded from "phrase" in SC 3.1.2, I don't know how anyone would make sense of the success criterion in a linguistically meaningful way.

However, if we say that "phrase" in "each passage or phrase" applies to phrases and words indiscriminately without defining "word", it will no longer matter that the content in the elements <span lang="zh-Hans">他们的最好</span> and <span lang="tr">evlerimde</span> looks like a word to a lay person and a phrase to a linguist and we have covered all language changes that aren't in the list of exceptions at the end of the SC.
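
To illustrate that point, here is a minimal Python sketch of how a check could list the language-marked parts of a page without ever asking whether a part is a "word" or a "phrase". It is an illustration only, not an existing WCAG test tool: the names `LangPartCollector` and `lang_marked_parts` are hypothetical, and the sample markup is made up around the two elements mentioned above. It uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class LangPartCollector(HTMLParser):
    """Hypothetical helper: collects the text of every element with a lang attribute."""

    def __init__(self):
        super().__init__()
        self._open = []   # stack of (tag, index into self.parts or None)
        self.parts = []   # one [lang, text] entry per element with a lang attribute

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        if lang:
            self.parts.append([lang, ""])
            self._open.append((tag, len(self.parts) - 1))
        else:
            self._open.append((tag, None))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name (tolerates sloppy markup).
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text belongs to the nearest enclosing element that declares a language.
        for _tag, index in reversed(self._open):
            if index is not None:
                self.parts[index][1] += data
                break


def lang_marked_parts(html):
    """Return (lang, text) pairs for all language-marked parts of an HTML fragment."""
    collector = LangPartCollector()
    collector.feed(html)
    collector.close()
    return [(lang, text) for lang, text in collector.parts]


# Made-up sample built around the two elements discussed above.
sample = ('<p>Compare <span lang="zh-Hans">他们的最好</span> '
          'and <span lang="tr">evlerimde</span>.</p>')
for lang, text in lang_marked_parts(sample):
    print(lang, text)
```

The sketch treats both parts the same way, whatever their length or word count. It only reports what the author marked up; whether an unmarked foreign-language part falls under one of the exceptions at the end of the SC still needs human judgement.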

Based on this, we may want to clarify "phrase" in the WCAG glossary or in Understanding WCAG 2.1 to point out that a single word can also be a phrase. After all, that seems to work for linguists in spite of the ambiguity of "word".


Kerstin Probiesch wrote on 24 June 2020:

It may be that there is something like a vernacular in other languages. We don't have a standardized german / vernacular and no official institutions for that.

German is not the only language that has no official regulator. English and many other languages don't have one either. For this reason, the question of which dictionary should be consulted to find out whether a word from a foreign language has become part of the language or vernacular of the surrounding text probably affects most languages. Note that the term "vernacular" refers to "the ordinary, everyday speech of a particular community" and "is commonly contrasted with standard language" (Trask: Language and Linguistics: The Key Concepts). (So "standard vernacular" is a contradiction in terms.) Because of this, "vernacular" may seem like a poor choice of words, but not every language used on the web has regularly updated dictionaries and the SC should work for those languages as well. So if you know that a specific foreign word has become common in your native language but no dictionary mentions it yet, you can still make that exception for "words or phrases that have become part of the vernacular of the immediately surrounding text".

Kerstin Probiesch wrote on 25 June 2020:

The term "Jargon" in the official translation is in my opinion a very good generic term for what is meant. "Jargon" covers as a generic term terminologies of disciplines and professions as well as sociolects (like web jargon, computer jargon, and even sociolects like "manisch language" which is spoken from some people in my area and includes many single words from different languages like romani (not rumanian), yiddish, german and other languages).

Note that "jargon" specifically refers to terminology, i.e. a lexical characteristic of language, whereas "sociolect" refers to a form of non-standard language that may differ from standard language not only from a lexical point of view, but also from a phonological, morphological, syntactical and semantic point of view. For this reason, "jargon" does not include sociolects.

Kerstin Probiesch wrote on 25 June 2020:

There is no standard definition of "phrase". Different linguistic / grammar theories different opinions on wether a single word qualifies as "phrase".

It is perfectly OK for WCAG to say that "phrase", in this specific standard, can refer to one or more words. If not all linguists agree with it, at least it will be acceptable to many others.

Kerstin Probiesch wrote on 29 June 2020:

Linguistic theories and opinions not only differ in what a “phrase” is.

I wonder whether there is confusion here between "[major] constituent" (German: Satzglied) on the one hand and "phrase" (German: Phrase) on the other. The definition and classification of constituents depends on the theoretical concepts in a specific theory of syntax (Hadumod Bußmann, ed.: Lexikon der Sprachwissenschaft. 4th ed, 2008; this is also something I remember from the general linguistics courses at university). Some theories of syntax attach great importance to syntactic functions, whereas others don't (Jürgen Pafel: Einführung in die Syntax. Grundlagen — Strukturen — Theorien. 2011, page 39). Some ways of defining constituents work for one language but not for other languages (Helmut Glück, ed.: Metzler Lexikon Sprache. 2nd ed, 2000). Just like phrases, constituents can consist of one or more words. (This is especially common in languages that don't have definite and indefinite articles, such as Standard Chinese.) Constituents can also be phrases; for example subject and object (which are types of constituents) tend to be noun phrases. I have not been able to find evidence for the claim that linguistic theories differ in what a phrase is, so I would be grateful if Kerstin Probiesch could provide examples from linguistics textbooks or reference works.

(Kerstin Probiesch wrote on 29 June 2020:

Linguistic theories don’t have everyday speech in mind (...).

Larry Trask's Language and Linguistics (entry on "Vernacular"):

Interest in vernacular forms developed only slowly during the twentieth century, but it became increasingly prominent with the rise of sociolinguistics in the 1960s.

In the 1960s, linguists developed a much greater interest in spoken language than before. (See the entries "Gesprochene Sprache" and "Gesprochene Sprachform" in Glück and the entry "Gesprochene Sprache" in Bußmann.)

)


Based on the above, it would be a good idea to define "phrase" so it includes phrases consisting of a single word. A definition of "phrase" that insists on at least two words will not work well for languages where noun phrases, verb phrases etc. often consist of a single word.

CharlesBelov commented 1 year ago

There is the case of acronyms which are intended to be pronounced as words. For example, People Organizing to Demand Environmental and Economic Justice (PODER). If, on an English-language page, a screen reader reads PODER as spelled out individual letters, we would want it to use English pronunciation: pee-oh-dee-ee-are, not Spanish. But if the screen reader reads PODER as a word, we would want it to use Spanish pronunciation, po-DAIR, not English, PODE-ur. There is a potential for a need for new HTML to support this, but what would be the WCAG guideline for this? Lest I hijack the thread, do I need to file this question as a separate issue?

alastc commented 1 year ago

We closed #297 yesterday, but I think we should leave this one open as a hook for a further update to the understanding doc. E.g. take some of the comments above and add some examples of which words would be in / out of scope.