w3c / wcag21

Repository used during WCAG 2.1 development. New issues, Technique ideas, and comments should be filed at the WCAG repository at https://github.com/w3c/wcag.
https://w3c.github.io/wcag/21/guidelines/

Plain language (Minimum) #30

Closed lseeman closed 7 years ago

lseeman commented 7 years ago

Current versions of SC and Definitions

joshueoconnor commented 7 years ago

Assigned to Jim Smith (@jim-work) https://www.w3.org/WAI/GL/wiki/SC_Managers_Phase1

joshueoconnor commented 7 years ago

@jim-work Is there a PR ready to go for this?

joshueoconnor commented 7 years ago

Pull request https://github.com/w3c/wcag21/pull/106

jspellman commented 7 years ago

This is difficult to measure and to implement. I recommend looking at using reading level. It isn't perfect, but it addresses most of the user needs identified, especially when paired with existing Technique G153. Reading level has international support, it has automated tests, and it has a variety of formulas (Flesch-Kincaid is the oldest and best known; there are many others, like the Dale-Chall formula, which uses a list of 3,000 common simple words).
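For illustration only, here is a minimal sketch of what such an automated check could look like, using the Flesch-Kincaid grade formula with a rough syllable heuristic (real readability tools use pronunciation dictionaries and tuned tokenizers; the sample sentence is a placeholder):

```python
# Minimal illustrative sketch of an automated reading-level check (not a production tool).
# Uses the Flesch-Kincaid grade formula; the syllable counter is a rough heuristic.
import re

def count_syllables(word):
    word = word.lower()
    vowel_groups = re.findall(r"[aeiouy]+", word)
    count = len(vowel_groups)
    if word.endswith("e") and count > 1:
        count -= 1  # treat a trailing silent 'e' as non-syllabic
    return max(count, 1)

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

print(flesch_kincaid_grade("Fill in your name. Then press the button."))
```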

Proposed revision: Understandable Labels: Navigation elements and form labels do not require reading ability greater than primary education level. (A) [link to WCAG’s definition of primary education level from UNESCO standard] Techniques should include links to the Dale-Chall list.

mbgower commented 7 years ago

Short Text

and error messages, which require a response to continue

I’m curious about this qualification. It suggests that an error message that doesn’t require a response is an exception. What is the rationale for this?

(See exceptions for different context and language.)

I don’t think it is necessary to have this ‘see exceptions’ text here. Perhaps it's an editorial comment for the review?

Concrete language: Non-literal language is not used, or can be automatically replaced, via an easy-to-set user setting. All meaning must be retained when non-literal text is replaced.

Is there a reason to describe what you don’t want instead of what you do want? How about “Words are used according to their proper meanings or definitions. Metaphors and figurative language are not used unless they can be automatically replaced with literal text based on user settings.”

The words on controls and labels identify an element's function.

G131 already covers this to meet both 2.4.6 Headings and Labels and 3.3.2 Labels or Instructions. I do not see the point of regurgitating this.

Also on instructions: Each step in instructions is identified, and literal wording is used.

As with the control material, I question whether this isn’t already covered elsewhere – or if it could be incorporated into current SC without significant impact. As well, I would argue that regardless of the potential to work any new requirement into 2.4.6, this also seems to be fully covered by the criteria you are proposing in Task Completion. There should be some level of normalization between techniques – ideally only one technique should capture an issue, not many.

easily available

Not sure why this SC is defining "easily available", since it is not used anywhere in the text except being listed as a possible technique.


Testability

Tense and voice are objective, and hence are verifiable. (It is expected that natural language processing algorithms will be able to conform to this automatically with reasonable accuracy.)

I don’t think it would be accurate enough that one would be able to automatically indicate a Violation from an automated test. I suspect such triggered failures would need to be Potential Violations, still subject to human review. I also question many web-content creators’ abilities to fully understand tense and voice.

Testing content against these word lists

There are many word lists. Which one is the one someone needs to test against? Which one is the one that will fail? We know that any measure that can be disputed will be, and that it can lead to risk. No solution offered, but this will be a hard sell. Also, are “frequently used” words 100% correlated with “clear” or “simple” words?


Techniques

Failure of SC 3.1.x due to replacing words with pronouns, as this decreases clarity; or Failure of SC 3.1.x due to requiring users to learn new terms or new meanings for terms or symbols; or

Troubled that there are techniques listed for pronouns and symbols, when neither are mentioned in this SC.

lseeman commented 7 years ago

agree with @mbgower on the wording changes. Small change:

Words are used according to their common meanings or definitions. Metaphors and figurative language are not used unless they can be automatically replaced with literal text based on user settings

@jspellman this has been discussed. The reading-age approach does not work in this context: it makes content easier to read but not to understand. It will not solve the cases and examples brought in the description. This may require accessibility experts acquiring new skills and buying new tools. I feel that is OK.

lseeman commented 7 years ago

full new proposal

Plain language: Provide clear and simple language in instructions, labels, navigational elements, and error messages, which require a response to continue, so that all of the following are true.

For instructions, labels, and navigational elements:

Simple tense: Use present tense and active voice. 
Simple, clear, and common words: Use words or phrases that are most-frequently used for the current context, unless it will result in a loss of meaning or clarity. This includes not using abbreviations, words, or phrases, unless they are the common form to refer to concepts for beginners. Where word frequencies are known for the context, they can be used.
Double negatives are not used.
Concrete language: Words are used according to their common meanings or definitions. Metaphors and figurative language are not used unless they can be automatically replaced with literal text based on user settings

Also on controls:

The words on controls and labels identify an element's function.

Also on instructions:

Each step in instructions is identified, and literal wording is used.

Exceptions:

When a passive voice or a tense (other than present tense) is clearer. Other voices or tenses may be used when it has been shown, via user testing, to be easier to understand, friendlier, or appropriate.
In languages where present tense and active voice do not exist, or are not clearer in the language of the content, use the tense and the voice that are clearest for the content.
When describing or discussing past or future events, the present tense is not required.
If the writing style is an essential part of the main function of the site, such as a game, a literary work, or teaching new terms.
Where less-common words are found to be easier to understand for the audience. Such findings are supported by user testing that includes users with cognitive disabilities.
The writing-style items may be replaced for a location or a type of content in which user testing has shown a more-effective writing style to aid comprehension for people with cognitive disabilities. Example: content written in a specific natural language.
The content will be penalized for not conforming to a given writing style (such as a CV, dissertation, or Ph.D. proposal).

mbgower commented 7 years ago

How can you close this? You've barely responded to any of the points raised.

mbgower commented 7 years ago

Words are used according to their common meanings or definitions.

You now have "common" used in two separate bullets. I think I get why you would not want to use "proper" but it creates overlap to use "common" twice. As well, many metaphorical uses of words are common uses. I suggest using "literal" instead, or finding a better alternative.

mbgower commented 7 years ago

This may require accessibility experts acquiring new skills. I feel that is OK.

I believe this is your response to Jean's "This is difficult to measure and to implement" comment?

Without going into a discourse, I'll just say that many of the COGA candidates are not resolvable at the stage where most accessibility scrutiny currently takes place. Trying to improve and broaden the accessibility of content is our key goal, but let's not downplay the challenges posed by some of the candidates in their present form. A whole lot more than accessibility experts will need to acquire new skills to achieve and verify ones such as this.

detlevhfischer commented 7 years ago

I think there is a fundamental issue with all the plain language success criteria that have been proposed which has nothing to do with the availability (or lack) of automated tools. Instructions: I have great concerns that mandating active voice is far too rigid. Take examples that use the passive voice in (hypothetical) instruction-type texts, such as "If you are bitten by a dog, see a doctor immediately."

The good thing about using passive voice here is that the main person (you, the one receiving the advice) is clearly in focus grammatically as being the subject of the sentence. Of course you can rephrase these examples by introducing the implied subjects of the action ("If a dog bites you, see a doctor immediately.") but arguably this does not improve and may in many cases complicate the sentence.

Take another example that could appear in an instructional text: "Not much is known about this disease."

Turning this example into active voice ("Scientists do not know much about this disease") arguably makes the sentence more complex because it forces the appearance of a subject that is not helpful - the same might be true not just for scientists but also for doctors, policy makers, etc.

So what is the problem? Testers identifying passive voice may feel encouraged to fail content if they are not sure that one of the exceptions applies.

Also:

Controls: In yesterday's telco, I emphasized that we already have two 2.0 success criteria that together should cover what plain language applied to controls is supposed to achieve for sighted users (another couple of SCs, 1.1.1 and 4.1.2, mainly address non-visual use cases):

I think we agreed in yesterday's telco that controls might therefore be taken out of the scope of this SC because of that, but that is up to the COGA TF to decide.

Detlev

lseeman commented 7 years ago

@mbgower The pull request was made before most of the comments were made. We were just late closing the thread and then the discussion started again. The discussion is meant to move to the pull request - at least I think so. To be honest I find it all quite confusing.

lseeman commented 7 years ago

@joshueoconnor @marcjohlic @detlevhfischer I would like to continue discussing this on the list and on the COGA call tomorrow. You are invited to join us, so we can work out the right wording.

mbgower commented 7 years ago

@lseeman

The pull request was made before most of the comments were made. We were just late closing the thread and then the discussion started again.

Looking over the thread, it looks like the pull request was made before there was any discussion. I don't see the value of trying to move something forward without any vetting. There seems to be an element of panic going on in trying to push the COGA candidates through, sort of an 'Anything is better than nothing' attitude. That's a false dichotomy. Scrutiny and conversation will advance and refine the issues towards incorporation. Trying to move draft proposals in bulk to the next stage without addressing contentious and unresolved issues ("kicking the can down the road") is not a good strategy.

The discussion is meant to move to the pull request

It looks like the pull request has also been closed. Where are we supposed to post questions and issues for this topic now?

To be honest I find it all quite confusing.

I share some of that confusion. But I don't think either of us give up easily, so let's adapt the system for ourselves to make it work.

detlevhfischer commented 7 years ago

I feel I have said all I had to say. For me, this SC really seems to be at the level of 'best practice' and not at the level where some tester will fail content because he or she thinks that a word is uncommon or there is a passive clause which might better be turned into an active clause. I find this too intrusive and too rigid. Having said that, I appreciate that the minimum variant of this SC is constrained to instructions, and I agree these should be as plain as possible. I imagine being an author and feeling upset at being shackled by prescriptions that may not do justice to my particular task at hand - so I admit this is partly a gut reaction and, as such, something to be put in perspective by others and by different requirements.

Detlev


detlevhfischer commented 7 years ago

Context-specific constraint to 1500 most common words: I was just evaluating a site which has a question-and-answer section on a life-threatening disease. As evaluator of 'Plain Language (minimum)', I would have to decide whether this section falls under instructions. (It probably does, but as evaluator I would be sorely tempted to consider it out of scope.)

So assuming that I treat it as 'instruction': the context of the disease means that all sorts of medical terms come into play - there is no chance of dealing adequately with the concerns of patients looking for answers within a specific set of 1500 words, or of simplifying these terms in a non-confusing way. Would I as an evaluator be entitled to call upon the exception "Where less-common words are found to be easier to understand for the audience. Such findings are supported by user testing that includes users with cognitive disabilities"? Possibly, even though these terms are far from easy to understand. They are just necessary to map onto the diagnoses people will have received. But user testing won't be available to me as evaluator (and is ruled out if going by the CfC regarding this issue).

Just one example to show what kind of issues we get into if this were to become an AA SC. Detlev


lseeman commented 7 years ago

@detlevhfischer This is a good example and has been addressed in a few ways:

  1. In the SC itself you can use the common way to refer to a concept in this context, so the medical terms would be fine if they qualify. We are anticipating the tools that will be able to generate the word list (it would take me about a week to program that one) but, just in case we do not have it by the time we get to the SC, we were excluding instructions of over 300 words until there are adequate tools.

  2. You can use whatever words you want and put the simple language in the title, or the coga-easylang etc. An easy-to-access glossary could also be an acceptable technique.

Between the two I think we have it more than covered.

All the best

Lisa Seeman


mbgower commented 7 years ago

@lseeman, now that you've introduced the idea of a 1500 word list as a key measure, I'd like to get back to @jspellman's suggestion about using something like Dale-Chall reading level as at least a partial solution there. Here are your starting bullets:

  • Simple tense: Use present tense and active voice. (See exceptions for different context and language.)
  • Simple, clear, and common words: Use words or phrases that are most-frequently used for the current context, unless it will result in a loss of meaning or clarity. This includes not using abbreviations, words, or phrases, unless they are the common form to refer to concepts for beginners. Where word frequencies are known for the context, they can be used.
  • Double negatives are not used.
  • Concrete language: Non-literal language is not used, or can be automatically replaced, via an easy-to-set user setting. All meaning must be retained when non-literal text is replaced.

What would happen if you retained the double-negatives and (reworded) concrete language, and swapped out the reading level in place of the "simple, common" words, to be something like this:

  • Simple words: Words do not require reading ability greater than primary education level.
  • Double negatives are not used.
  • Concrete language: Words are used according to their literal meanings or definitions. Non-literal language is avoided.

You'll note that I've removed "clear" and information on present tense and active voice. I believe they have had enough feedback that they can be addressed in the Understanding document without forming part of the starting language. If you think clarity is a crucial measure that is testable, attention can be focused on helping solve that item specifically. You can also work your idea of "common" into exception language based on the context.

I have removed the following part, as I would argue it is overly prescriptive. This can be introduced as a technique, involving personalization.

or can be automatically replaced, via an easy-to-set user setting. All meaning must be retained when non-literal text is replaced.

Still lots of space for wordsmithing, but I think these three items do go a ways to achieving the goal.

mbgower commented 7 years ago

In regard to your starting text

Provide clear and simple language in instructions, labels, navigational elements, and error messages, which require a response to continue, so that all of the following are true.

I would still like to understand the rationale for appending "which require a response to continue". What scenario are you avoiding? However, if you feel it is necessary, at the least the comma should be removed after "messages" so that the phrase is clearly qualifying error messages and not the rest of the sentence.

I think there has been enough discussion about the relevance of existing SC language that you do not need the additional specific items for "also on controls" and "also on instructions".

lseeman commented 7 years ago

@mbgower it precludes logs, warnings, etc. It makes it only stuff that stops the user from continuing. In other words, it reduces the scope towards the absolutely essential stuff that you need to use an app or website. Considering the resistance, it should be clear why we need to do that.

The latest draft has got rid of the "also on controls" item, and has integrated "instructions are clearly identified". I have no idea where the latest draft is meant to sit but have asked the chairs for guidance on that.

mbgower commented 7 years ago

Okay, so by "response" you mean acknowledgement, like having to click "okay", etc. Removing the comma will help with that, then. Thanks.

lseeman commented 7 years ago

I think this is the currently proposed wording for Plain language. We are still working on a good definition of "concept in the current context", or we will replace that term.

Plain language: Provide clear and simple language in instructions, labels, navigational elements, and error messages, which require a response to continue, so that all of the following are true.

Simple tense: Use present tense and active voice.
Simple, clear, and common words: Use the most common 1500 words or phrases, or provide words, phrases, or abbreviations that are the most common form to refer to the concept in the current context.
Double negatives are not used.
Concrete language: Non-literal language is not used, or can be automatically replaced, via an easy-to-set user setting. All meaning must be retained when non-literal text is replaced.
Instructions: Each step in instructions is identified.

Exceptions:

If there are no tools available in the language of the content that identify uncommon words, instructions that are longer than 400 words are exempt unless they directly relate to a critical service.
When a passive voice or a tense (other than present tense) is clearer. Other voices or tenses may be used when it has been shown, via user testing, to be easier to understand, friendlier, or appropriate.
In languages where present tense and active voice do not exist, or are not clearer in the language of the content, use the tense and the voice that are clearest for the content.
When describing or discussing past or future events, the present tense is not required.
If the writing style is an essential part of the main function of the site, such as a game, a literary work, or teaching new terms.
Where less-common words are found to be easier to understand for the audience. Such findings are supported by user testing that includes users with cognitive disabilities.
The writing-style items may be replaced for a location or a type of content in which user testing has shown a more-effective writing style to aid comprehension for people with cognitive disabilities. Example: content written in a specific natural language.
The content will be penalized for not conforming to a given writing style (such as a CV, dissertation, or Ph.D. proposal).

mbgower commented 7 years ago

@lseeman, there have been lots of comments pushing back about this 1500-word idea, here and on the list. What about my prior comments on reducing the bullets and working in reading level? With investigation, I suspect a measure can be found that addresses localization, such as materials from UNESCO. I continue to be concerned that you are not addressing all individual points and are not pursuing a reductionist strategy which seeks to simplify the SC down to its essence (as opposed to expanding it to encompass even more).

lseeman commented 7 years ago

Reading level gets rejected again and again by the task force. It does not help these use cases.

lseeman commented 7 years ago

Current wording

Plain language: Provide clear and simple language in instructions, labels, navigational elements, and error messages which require a response to continue, so that all of the following are true.

- Simple tense: Use present tense and active voice.
- Simple, clear, and common words: Use the most common 1500 words or phrases, or provide words, phrases, or abbreviations that are the most common form to refer to the concept in the identified context.
- Double negatives are not used.
- Concrete language: Non-literal language is not used, or can be automatically replaced, via an easy-to-set user setting. All meaning must be retained when non-literal text is replaced.
- Instructions: Each step in instructions is identified.

New definition

Identified context is defined as: a context and a context-specific word frequency list (and glossary) that have been identified in an accessibility statement or via another known technique. A word frequency list has to be generated from at least 1000 sources from the same context.

New exception: If there are no tools available in the language of the content that identify uncommon words, instructions that are longer than 400 words are exempt unless they directly relate to a critical service.

mbgower commented 7 years ago

reading level again and again gets rejected by the taskforce. It does not help these usecases

Then why is it listed in your documentation as a goal? Under 3.7.6.5 Readability and Language in your User Research paper, it is the fifth point:

Maintain a reading level that is adequate for the audience.

I understand completely that reading level alone cannot satisfy this criterion. I was not proposing that. But since we are struggling to find something measurable and scalable, it seems a valid way of quantifying simpler words, which is one of the things you are trying to define. Is it less effective than a list of 1500 words? This isn't an idle question. For instance, I couldn't easily find stats on how many words each reading level is regarded as matching. If the scale is way off -- say primary reading levels already correspond to double or triple the 1500 words you're targeting -- then that's useful information. As others have asked, what research do you have to back up 1500 words?

lseeman commented 7 years ago

Hi Mike, looking up and citing research is really time consuming. No one on the task force likes reading age; it is only coming from outside the task force. Debbie Dahl did an extensive review of the algorithms and research on the use of reading age for COGA and found them not useful.

Many COGA users have a limited vocabulary - whether it is 200 words or 2,000 (see http://leader.pubs.asha.org/article.aspx?articleid=2289630). The words they know are likely, of course, to be the more common words, so the larger the vocabulary required, the fewer people can understand it; that much is completely clear. The amount recommended by many places was 1000-1500 words, such as Voice of America. These include Basic English (850 words), Special English (1500), Globish (1000), etc. 1500 seemed to be the higher number and a compromise; I had all the sources open two days ago and need to find them again. The point of the 1500 words is that this is roughly the highest number of words with which you can express most things. If you look at the most frequent 3000 words you will find most of them can be expressed more simply and are not in the vocabulary of many people with intellectual disabilities. For words that are subject specific we have the "common form to refer to a concept" option, or you can provide a simple-language explanation in the title tag or another mechanism.
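As a rough illustration of how a check against such a core list could be automated (the word list and sample text below are placeholders, not a recommended vocabulary), a minimal sketch might look like this:

```python
# Rough sketch only: flag words in instruction text that fall outside a chosen core
# vocabulary list (e.g. a 1500-word list). The list and sample text are placeholders.
import re

def uncommon_words(text, core_words):
    words = re.findall(r"[a-z']+", text.lower())
    return sorted({w for w in words if w not in core_words})

core = {"fill", "in", "your", "name", "then", "press", "the", "button"}
print(uncommon_words("Fill in your name, then press the submit button.", core))
# -> ['submit']
```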

johnfoliot commented 7 years ago

Hi Lisa, COGA TF,

A word frequency list has to be generated from at least 1000 sources from the same context or how ever many pages can reasonably be found

If there are no tools available in the language of the content that identify uncommon words

All meaning must be retained when non-literal text is replaced.

Editorially, I am concerned about this one. Often, in prose, idioms or non-literal phrases are used to evoke an emotional response. For example, the expression "Catch you later" is non-literal, and its use is figurative but also conveys a sense of both familiarity and an existing relationship. Replacing that expression with the more specific "I will see you again later" will be 'easier' to understand, but may not capture "all" the meaning.

I would suggest a small editorial change from "All" to "Intended meaning must be retained..."

​JF​


cstrobbe commented 7 years ago

Frequency Lists

Frequency lists are already available for a number of languages. For example, those published by Routledge are typically based on a large corpus (at least 20 million words). (I have only used the frequency dictionary for Chinese). The Leipziger Universitätsverlag ("Leipzig University Press") has published frequency dictionaries for a number of smaller languages ("most frequent 1000 word forms ordered by frequency and the most frequent 10000 word forms in alphabetical order"); unfortunately, there is no information on the size of the corpora on which these lists are based.

Routledge also has frequency dictionaries for Arabic, Czech, French, Turkish, Korean and Persian.

I haven't been able to find good frequency dictionaries or lists (i.e. based on a decent corpus) for big languages such as Hindi, Bengali, Punjabi, Javanese, Malay, Telugu, Vietnamese, Marathi, Tamil, Urdu or Italian (each of which has more than 60 million native speakers). For example, the Italian English Frequency Dictionary - Essential Vocabulary: 2500 Most Used Words & 421 Most Common Verbs by J. L. Laide does not provide any information on whether it was based on a corpus. (The "Frequency Dictionary of Italian Words" by Alphonse Juilland dates from 1973 and is probably out of date.) The frequency lists listed on Wiktionary are based on very limited corpora (e.g. just subtitles). Lexiteria has frequency lists with the top 200 words for 40 languages; they may be able to produce longer lists on demand.

I'm not sure it's wise to try to build a frequency list without a basic knowledge of corpus linguistics; you need to know how to build a decent corpus (e.g. sufficient diversity and size, dealing with variations in spelling, dealing with conjugated and declined forms, etc.). If you want to try it anyway, you should know that scripts for word frequency have been written before. See e.g.
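To make the general idea concrete (this is not one of the scripts referred to above, just a bare-bones sketch under the assumption of a folder of plain-text corpus files), counting word forms could look like this:

```python
# Bare-bones sketch of generating a word-frequency list from plain-text corpus files.
# The "corpus" folder is a placeholder; lemmatization, compounds and spelling variants
# (the hard problems mentioned above) are deliberately ignored.
import collections
import pathlib
import re

counts = collections.Counter()
for path in pathlib.Path("corpus").glob("*.txt"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    counts.update(re.findall(r"[^\W\d_]+", text.lower()))

for word, freq in counts.most_common(1500):  # the 1500 most frequent word forms
    print(word, freq)
```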

My comments focus on general frequency lists because the concept of "context" appears vague to me. In addition, I want to repeat John's question: what corpus size would be required to build a frequency list for a specific "context"? It appears that 2 million words would be considered a small corpus for general word frequency lists. (Hundreds of millions of words is now the norm for major languages.)

Readability Formulas

With regard to the readability formulas mentioned by Jeanne Spellman, e.g. Flesch-Kincaid: as far as I know, readability formulas for English cannot be applied to other languages without adaptations. In addition, which formula do you choose? According to W. H. Dubay, there were around 200 readability formulas for English in the 1980s. If you pick a specific formula for English, do you need to adapt the same formula for other languages or can you pick a different one? If you pick different formulas for different languages, how do you make sure that you measure a similar readability level (e.g. in order to avoid creating heavier burdens for authors of content in other languages)?

mbgower commented 7 years ago

Thanks for the information, @cstrobbe. Here's a question I've asked before, and I'll pose it again: Is there a clear correlation between frequency and clarity/simplicity? I wouldn't have thought so. It seems to me there are three broad goals to achieve plain language:

I've been trying to tackle the third bullet (and to some degree the first) with the readability formulas. Lisa has been clear that she and the rest of the TF think grade level reading measurements can not achieve the goal of "simple" language. Which of these three bullets is achieved by the frequency list? I would have thought only "common words". Are any of these metrics likely to achieve more than one of these bullets, and which method is more likely to do it? The fact all three have been rolled up into one bullet in the SC language is not helping, I think.

cstrobbe commented 7 years ago

Additional comments:

  1. In "instructions, labels, navigational elements, and error messages, which require a response to continue", it is unclear whether "which require ..." applies only to "error messages" or to all elements in the enumeration. Would the following rewording work? "Error messages that require a response to continue, instructions, labels and navigational elements use clear and simple language that fulfils the following requirements: ..."

  2. On @lseeman's second version of the proposal: "Double negatives are not used." should be replaced by "Double negatives are only used to affirm a negative, not to express a positive statement." In English and a number of other languages, double negatives (e.g. "I ain't got no money") are considered ungrammatical. However, there are many languages where a double negative just affirms or even intensifies the negative statement. See negative concord and Linguistics Stack Exchange.

  3. On @lseeman's second version of the proposal: "Words are used according to their common meanings or definitions". Perhaps reword as follows: "Words are used according to their common literal meaning." (It is not clear what "definition" adds if you already have "meaning".)

  4. On @lseeman's second version of the proposal: Exception for the passive voice and tenses other than the present: Who is going to do the research and testing for the hundreds of languages that are used on the web (out of the 7,000 that are spoken in the world)?

  5. On @detlevhfischer's comment that SC 3.3.2 and SC 2.4.6 together should already cover plain language: I disagree. These two SCs require the presence of labels and instructions and that this information describes their purpose. They do not enforce a specific level of clarity, let alone plain language. It is perfectly possible to describe the purpose of an input field in obscure language.

  6. On @lseeman's third (partial) version of the proposal: The exception for "instructions that are longer than 400 words" is difficult to apply in the absence of a generally agreed-upon definition of "word". Here are a few issues that complicate counting "words":

    • Does "town hall" count as one word or two? (Automated word counts will assume two words, unless they have a good lexicon.)
    • The German verb aufgeben is a single verb, but when it is conjugated, "auf" can sometimes be separated from the rest of the verb (e.g. "Er gab nicht auf."). Do you count the two parts as one word or two?
    • Some languages are agglutinative. For example, in Turkish, "ev" means "(the) house", "evim" means "my house", "evimde" means "at my house" and "Evinizdeyim" means "I am at your house". How many words do you count in the last three examples?

    Due to these differences, the 400-word limit results in shorter or longer texts depending on the language of the website. Is that OK?

  7. On @mbgower's question "Is there a clear correlation between frequency and clarity/simplicity?": I haven't read any literature about it, but word frequency alone seems insufficient; syntax also needs to be taken into account. Describing syntax is more complex than generating word frequency lists, and requiring the use of the active voice, the present tense and the avoidance of double negatives is a bit too simple.

mbgower commented 7 years ago

@cstrobbe

word frequency alone seems insufficient

I concur. Even within the 200-word frequency list from Lexiteria, the following come up:

I've tossed in some potentially simpler terms in parentheses. Not advocating those; just trying to illustrate a point. At 1500 words, there will be plenty of complex words and concepts occurring.

Simple, clear, and common words:

So, word frequency addresses "common". I await a better measure for "simple" than primary reading level. "Clear," as you note, is tough.

detlevhfischer commented 7 years ago

@cstrobbe The gist of Understanding 2.4.6 is that labels are descriptive in the sense of clear, understandable, helpful, meaningful (all these words occur there).

This can only be determined with the audience in mind. For a public general purpose site, I would argue that a descriptive but obscure term in a label would fail SC 2.4.6.

Having said that, if Plain Language (minimum) stays in and the application for controls is upheld (it's not in the current FPWD), this is not a big issue. I just think the requirement is redundant.

Detlev


cstrobbe commented 7 years ago

@detlevhfischer SC 2.4.6 reads: "Headings and labels describe topic or purpose." Which part prohibits the use of obscure terms in labels, instructions, error messages, etc? The "gist of Understanding 2.4.6" is not normative. The plain language SC would be normative.

detlevhfischer commented 7 years ago

@cstrobbe I think we are going in circles now. 3.3.2 Labels or Instructions already demands that something has to be provided - 2.4.6 then demands that that something is descriptive, i.e. the user can work out what it means. I think Understanding 2.4.6 makes it quite clear that something obscure would not meet the requirement of being descriptive. Maybe we need to agree to differ on this point.

mbgower commented 7 years ago

My larger objection was the additional stuff called out:

Also on controls: The words on controls and labels identify an element's function.

That is clearly redundant with 2.4.6. I similarly pointed out concerns with the text specific to Instructions.

Also on instructions: Each step in instructions is identified, and literal wording is used.

I believe this has been addressed in the latest draft. In the original, the "literal" part is clearly redundant with the main short text which already calls for literal. Have a look at how the other phrase has been addressed now:

Each step in instructions is identified

There really would be no problem including text in the 2.4.6 Understanding document saying 'Each step in instructions is identified'. Yes, it wouldn't be part of the normative text, but so what? There is a requirement to describe. We know we can capture issues under this SC. With this addition to Plain Language, folks would have to choose which SC to use. Result: added minor churn.

mbgower commented 7 years ago

@detlevhfischer and @cstrobbe, This discussion on labels is worth some elaboration on a few fronts. Prepare for a diatribe...

Normalizing

It is not helpful to create redundancy between Success Criteria. It causes confusion, adds churn/bloat and results in inconsistent reporting. Ideally, things should fail once in one location. That said, it is fine and proper to cross-reference other highly relevant considerations inside the Understandings doc, to ensure they are addressed.

The WG group decision not to allow changes in existing SC has resulted in some of this churn. I get that. But any candidates should still be heavily scrutinized against existing SC and against other proposed SCs by the same Task Force. More on this later.

Cumulative effect of SC and Understanding doc

3.3.2 Labels or Instructions has one clear measurable requirement: that a label or instructions be present. However, I have used it regularly to fail:

I have that leeway because they are all covered in the Understanding doc. Similarly, I have regularly (and more appropriately) failed unclear labels under 2.4.6 Headings and Labels. Despite it calling out only 'description' in the short text, the Understanding lays out an expectation for clear labels. I have had discussions on what constitutes clear; however, I cannot think of a situation where I did not get traction on flagging a consideration. (By the way, note the existing churn on deciding between 2.4.6 and 3.3.2, because the guidance is overlapping.)

Reality of current practices

Which part prohibits the use of obscure terms in labels, instructions, error messages, etc?

It is easy to get caught up in abstractions. The reality is that 2.4.6 can be used to address these, at least for labels and instructions. Navigation elements and error messages? I'd need to see some examples to understand how they aren't considered labels or instructions, but focusing on these two 'elements' might be a tactical way of adding this new SC without redundancy. We really have nothing to fix in terms of an SC where we can fail poor language for labels and instructions. Further, let's bear in mind that designers want clarity and ease of use. It is in everyone's interest to have clear labels and instructions. Let's only resort to the hammer when we need to.

Bringing this to bear on tactics for this SC and other COGA

Getting back to the original candidate, the main text for Plain Language captures labels and instructions. Yes, there's the overlap with the guidance in 2.4.6, but bringing it under the umbrella of a more exact measurement of plain language has potential value, if we can quantify these hard-to-measure terms.

However, when you dig in, you realize that the result of adopting this new SC in its updated form would be to render 2.4.6 Headings and Labels essentially redundant.

"Headings and labels describe topic or purpose" as a level AA requirement would already be met by this new level A SC, right? Personally, I think the fact we are positioning a single A to replace an existing AA is itself a flag that we have an issue. Such a realization should start some serious inquiry by the TF and WG.

What kind of inquiry? Scrutinize everything for redundancy and overlap. Have a look at what I did for Error Prevention (https://github.com/w3c/wcag21/issues/33#issuecomment-280397970) as a way this should be done.

On the one hand, focusing on errors and navigation may be a way of getting this in as a single A. On the other hand, how about focusing instead on the AA which calls for plain language for all content AND figuring out how this ties in with visions of personalization. Is it necessary that all text use plain language, or only that the user have the ability to transform it in that way?

I'm going on at this length because I think there are a lot of COGA SCs that introduce this minor churn, both against existing SCs and against each other. There is a bunch of overlap. Add it all up and it is adding confusion and overhead just within our process. This is in addition to the overhead from the sheer number of proposed new COGA SCs. It's not hard to understand what effect it would have out in the world.

detlevhfischer commented 7 years ago

Mike, I fully agree with what you say, and I am equally concerned that it will be even harder to decide which SCs should be called out as a failure in cases such as having an obtuse, obscure or obnoxious label. It is already difficult with the overlap between 2.4.6 and 3.3.2 that you have noted. And for error messages, there is in addition 3.3.1 Error Identification, which also calls for a description, which I assume has to be intelligible (clear, understandable, etc.). So, close scrutiny of any overlaps between existing and new SCs will indeed be necessary. (There are similar overlaps in others, such as "Graphics Contrast" and "User Interface Component Contrast (Minimum)", which cry out for a merger.)

As stated before, I agree that controls are sufficiently covered by 3.3.2 and 2.4.6, which makes me think that Plain Language (minimum) should really focus on text content - e.g. the text in manuals, guides, health advice, and similar - i.e. on stuff that is neither control nor label, nor error message. The line is admittedly not always easy to draw, but scoping it that way would make some sense since it is text content that is so far not covered by any A or AA SC. The numerous issues that have been pointed out regarding the cross-language validity and testability of the stipulations remain, but that is a different matter.

lseeman commented 7 years ago

@mbgower @detlevhfischer I think your discussion addresses a different use case.

As we have explained on the call, the existing success criteria do not address the needs of people with a smaller vocabulary and severe language disabilities. I believe it is important to include in this scope as many people as possible. Please read the benefits etc. for the use cases. These are not addressed by the other success criteria. Hence the "overlapping" SC do not address a huge number of disabled people. Please read the benefit section, such as the "mode" example.

lseeman commented 7 years ago

@mbgower we have been asked not to worry about potential redundancy at this point.

(My 2 cents - the redundancy issue is why we should be allowed to change SCs. It is not a good reason not to address accessibility as well as we can for all users.)

awkawk commented 7 years ago

Updated the issue description to reflect the FPWD text and reopening issue.

CharlesBelov commented 7 years ago

Similar to logos being exempt from color constrast requirements, I would expect product and brand names to be exempt from plain language requirements.

lseeman commented 7 years ago

@CharlesBelov yes, that makes sense

lseeman commented 7 years ago

@cstrobbe Thank you for your well-researched comments - and for finding the word frequency scripts. I had it on my to-do list to write one, but now I do not have to. I am tempted to only recommend a corpus but not require it. That way we will not run into difficulties with dialects and sites that are for a specialist audience.

lseeman commented 7 years ago

New proposed language that addresses most of the comments:

Error messages that require a response to continue, instructions, labels and navigational elements use language so that all of the following are true:

lseeman commented 7 years ago

@jim-work @jmcsorle I made a small change to the first sentence. It is a bit clearer (I think):

For error messages that require a response to continue, instructions, labels and navigational elements all of the following are true:

CityMouse commented 7 years ago

I am wary of setting a standard on "double negatives" due to the negative polarity of (Latin- and German-based) languages versus the negative concord of languages such as Portuguese, Russian, and Spanish. Internationalization would therefore seemingly become an issue with this standard.

https://en.wikipedia.org/wiki/Double_negative

Maybe this could be addressed?

lseeman commented 7 years ago

@CityMouse negative concord is allowed with the new wording. You just cannot make a positive statement using double negatives. That is the change to support internationalization.

lseeman commented 7 years ago

A new source: http://www.minspeak.com/CoreVocabulary.php#.WQ8EzuUrI2w

mbgower commented 7 years ago

On today's call (in the extended time), I proposed a departure from the current approach to Plain Language, which I was asked to draft. Here it is:

Proposed SC

Clear Instructions: Instructions describe the topic or purpose.

Background

This is a direct use of existing SC language in 2.4.6 to plug a hole between 2.4.6 and 3.3.2 that results in labels needing to be present and descriptive, but instructions just needing to be present.

Headings and labels describe topic or purpose. Labels or instructions are provided when content requires user input.

There can be a lot better language than the short description I've supplied (e.g., "Instructions describe the desired user action or behaviour."). I simply lifted the existing 2.4.6 language since it already passed at 2.0, and therefore should serve as a sample of a simple and relatively vague goal which nonetheless has existed in WCAG SC language for the past decade.

What it addresses

By focusing on the lack of a requirement to make instructions descriptive (or clear), this SC immediately opens up the possibility of introducing a bunch of techniques that can be used to address COGA TF objectives. For example, the techniques for 2.4.6 state:

The objective of this technique is to ensure that the label for any interactive component within Web content makes the component's purpose clear.

The techniques are very high level and undeveloped for 2.4.6. So, all the points that were trying to be given the weight of an SC requirement could become techniques that can be employed for both 2.4.6 and this new Clear Instructions SC:

It could also draw on neglected parts of the following proposed SCs to add additional techniques:

And the following are somewhat related, and could again offer possible techniques: