Letter/Word spacing for CJK languages - SC 1.4.12 Text Spacing

MakotoUeki commented 6 years ago

In Japanese language, we don't put white spaces between words. It might be the same in Chinese and Korean language. So it might be needed to add exception to web content in CJK languages at least.

And we need to investigate the letter spacing (tracking) works for the CJK characters as well.

lauracarlson commented 6 years ago

Hi @MakotoUeki ,

Thanks for the info!

I wonder if this issue a duplicate of #657 ? If so, can it be closed and resolved over there? @awkawk asked you a question on that issue.

Thanks again, Laura

MakotoUeki commented 6 years ago

It might be so. But most of web pages on Japanese websites are not using the vertical writing mode.

For exampke: https://www.kantei.go.jp/ http://www.metro.tokyo.jp/ https://www.yahoo.co.jp/ http://www.adobe.com/jp/ https://waic.jp/

We use the horizontal writing mode for Japanese web pages in general. So I can say that letter/word spacing are the same issues as #657.

lauracarlson commented 6 years ago

Hi @ MakotoUeki ,

I drafted a proposed response in the Wiki for WG consideration.

MakotoUeki commented 6 years ago

Hi @lauracarlson ,

Thanks so much for addressing this.

Is "Exception: Text in Chinese, Japanese, and Korean languages." the exception for entire this SC?

I can say that the exception is needed for word spacing. This is applicable for both horizontal and vertical text in Japanese.

As to letter spacing, it might be okay. And both line height (line spacing) and spacing underneath paragraphs will be okay as well. We need to find the research-based basis for Japanese language, especially for vertical text which I'm not sure that such kind of researches have been done though.

lauracarlson commented 6 years ago

Hi Makoto,

Yes. As worded it would be for the entire SC. If you would like to word it differently, please add a proposal to to the Wiki page. It would be most welcome. I really appreciate your expertise.

Thank you, Laura

On Jan 11, 2018 3:58 AM, "Makoto Ueki" notifications@github.com wrote:

Hi @lauracarlson https://github.com/lauracarlson ,

Thanks so much for addressing this.

Is "Exception: Text in Chinese, Japanese, and Korean languages." the exception for entire this SC?

I can say that the exception is needed for word spacing. This is applicable for both horizontal and vertical text in Japanese.

As to letter spacing, it might be okay. And both line height (line spacing) and spacing underneath paragraphs will be okay as well. We need to find the research-based basis for Japanese language, especially for vertical text which I'm not sure that such kind of researches have been done though.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/w3c/wcag21/issues/677#issuecomment-356884794, or mute the thread https://github.com/notifications/unsubscribe-auth/AF7TkAN0kmaT-I7FNqW3_k9AEbgAKRrzks5tJdtigaJpZM4RZwWI .

steverep commented 6 years ago

Have we verified that the bookmarklet would make Makoto's example sites unreadable? That is, do they add spacing inappropriately between words? I'm wondering if correctly specifying the language in the markup negates style properties like word-spacing (and it probably should if Japanese is simply not applicable).

lauracarlson commented 6 years ago

Hi Steve,

No we haven't. I likely won't be able to test anything until sometime next week. If you or someone else would like to test Makoto's examples with the bookmarklet, that would be teriffic.

Thank you, Laura

On Jan 11, 2018 3:04 PM, "Steve Repsher" notifications@github.com wrote:

Have we verified that the bookmarklet would make Makoto's example sites unreadable? That is, do they add spacing inappropriately between words? I'm wondering if correctly specifying the language in the markup negates style properties like word-spacing (and it probably should if Japanese is simply not applicable).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/w3c/wcag21/issues/677#issuecomment-357060802, or mute the thread https://github.com/notifications/unsubscribe-auth/AF7TkOAey7yX2ivrd51mLmUEgJC096jfks5tJndJgaJpZM4RZwWI .

MakotoUeki commented 6 years ago

@lauracarlson

At least, I won't be able to be responsible for Chinese ,Korean and any other languages than Japanese.

At this moment, what I can suggest is the following only: "Word spacing to at least 0.16 times the font size, except for text in Japanese."

One more thing. Is Each value based on the researches for English? Then we need to say something in NOTE like: "The values are taken from the researches for roman texts. For other text such as CJK and Arabic text, the "equivalent" values would be taken from the same kind of researches for each language/text."

The appropriate values might be different among different language/text. The working group will not be able to specify all of the values for all of the languages.

There is a same kind of description in the "large-scale" desfinition in WCAG 2.0: https://www.w3.org/TR/WCAG21/#dfn-large-scale

johnfoliot commented 6 years ago

How about "Word spacing to at least 0.16 times the font size, except for languages or character-sets that do not support this requirement (e.g. Japanese texts)." (??)

JF

On Sun, Jan 14, 2018 at 2:26 AM, Makoto Ueki notifications@github.com wrote:

@lauracarlson https://github.com/lauracarlson

At least, I won't be able to be responsible for Chinese ,Korean and any other languages than Japanese.

At this moment, what I can suggest is the following only: "Word spacing to at least 0.16 times the font size, except for text in Japanese."

One more thing. Is Each value based on the researches for English? Then we need to say something in NOTE like: "The values are taken from the researches for roman texts. For other text such as CJK and Arabic text, the "equivalent" values would be taken from the same kind of researches for each language/text."

The appropriate values might be different among different language/text. The working group will not be able to specify all of the values for all of the languages.

There is a same kind of description in the "large-scale" desfinition in WCAG 2.0: https://www.w3.org/TR/WCAG21/#dfn-large-scale

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/w3c/wcag21/issues/677#issuecomment-357496156, or mute the thread https://github.com/notifications/unsubscribe-auth/ABK-c7X3aLFuWf1GlTOOI1lLCn7F6Bymks5tKbo0gaJpZM4RZwWI .

-- John Foliot Principal Accessibility Strategist Deque Systems Inc. john.foliot@deque.com

Advancing the mission of digital accessibility and inclusion

awkawk commented 6 years ago

What I'm hearing from Makoto is not that it isn't supported (you can apply word-spacing) but that it isn't used, and therefore authors shouldn't be constrained to ensure that the layout doesn't break if it is.

awkawk commented 6 years ago

@r12a - any thoughts/advice on this?

awkawk commented 6 years ago

http://www.koreanwikiproject.com/wiki/Word_spacing - apparently not an issue in Korean http://research.chtsai.org/dissertation/chapter-1.html - apparently is an issue in Chinese

awkawk commented 6 years ago

How about adding: Exception: Languages which do not typically make use of one or more of these text style properties in written text can conform using only the properties that are typically used.

That way we are covered for other languages that the WG isn't well-versed in.

MakotoUeki commented 6 years ago

The addition works for "word spacing" in Japanese.

One more thing. I'd like to confirm if each value is common among any kind of languages/text including CJK, Arabic, etc. These properties can be used in Japanese.

Line height (line spacing) to at least 1.5 times the font size;

Spacing underneath paragraphs to at least 2 times the font size;

Letter spacing (tracking) to at least 0.12 times the font size;

mraccess77 commented 6 years ago

Personal comment: Has anyone looked at this study? https://www.sciencedirect.com/science/article/pii/S0042698907002556

MakotoUeki commented 6 years ago

Hi @mraccess77 , thank you so much for sharing. I'll read it.

r12a commented 6 years ago

Normally, i'd point you to our typography index, and the section at http://w3c.github.io/typography/#graphemes, but we don't seem to point to anything relevant there yet.

You may therefore find it useful in general to look at these pages by myself, which are still in development.

Perhaps the most useful starting point is https://r12a.github.io/scripts/featurelist/ which has a column entitled "Word separator" (you can click on the column heading to group the values together).

This only covers a selection of scripts, but you can see that the following don't separate words with spaces: Balinese, Han+Kana (basically Chinese & Japanese), Javanese, Khmer, Lao, Myanmar (Burmese), Thai, and Tibetan.

Note that Korean (hangul) is not one of those, because it does separate 'words' with spaces.

Watch out, though, because Khmer, Thai, and other SE Asian scripts do use spaces, but as phrase delimiters rather than word delimiters. This also applies to Tibetan, however (1) syllables are delimited rather than words (for which Tibetan uses a syllable separator (tsek)) and (2) Tibetans tends to prefer   rather than normal space. These scripts tend to stretch spaces between characters, rather than between words, for things like justification. But there may be complications. For example, Japanese tends to stretch gaps of ink around things like punctuation before stretching inter-character spacing when justifying.

You may hear that SE Asian scripts use ZWSP (zero width space) between words. You may even find some text that does so, but the vast majority of the time they don't, and applications rely on dictionary lookup and parsing to detect word boundaries (which are important for things like line breaking, in a way that they aren't for Japanese and Chinese). And before you ask, no, i don't think that we should recommend use of ZWSP for accessibility ;-)

If you look a bit further down the list, you will see that Ethiopic does distinguish word boundaries, but with a special word-separator character, rather than with a space. There is some flexibility in the width of that character when justifying, but i don't know how that translates into an accessibility guideline to widen the spaces between words.

By the way, if you want more information about these behaviours, follow the links next to the script names (if there is one). When you reach the page linked to, look for a section called something like Text layout > Text delimiters.

Essentially, (and i think i may have already mentioned it) these guidelines probably ought to clarify the writing system and language that the metrics proposed are relevant for.

hope that helps.

r12a commented 6 years ago

Oh, and wrt Arabic script, you'll read that it is common to stretch the baseline between characters when justifying text (characters typically join at the baseline). Whether this provides an opportunity for more readable text, accessiblity-wise, i don't know (but i doubt it). One has to also bear in mind that certain font styles (such as ruq'a) don't allow baseline elongation.

Urdu is also an interesting case, since the nastaliq font style it uses naturally reduces the gaps between words (in part because the nastaliq font style has a sloping baseline that is word-based, and there are word final letters that help identify word endings). I have no idea whether or not the requirement for extra space between words would translate to something useful for Urdu.

alastc commented 6 years ago

Wow, great resource there, bookmarked.

I'm concluding from the comments that:

Some or all of the scripted lanugages will need to be exceptions to the guideline, as changing the spacing could change the meaning.
We need a better term than CJK. Is 'Languages written as script' a reasonable term? Or are we better off using something like: Non-latin based languages.

r12a commented 6 years ago

Well, i think the WAI guidelines need to say something along the lines of:

Increased inter-word spacing may be helpful, and studies for English have shown that ..., however this may not be appropriate for text using other writing systems, or even for Latin script text in other languages.

I find myself wondering whether the specific recommendations will even be useful for some well-known Latin script languages, such as German, Finnish and Dutch, since these languages tend to have long compound words, (such as Eingabeverarbeitungsfunktionen) which you wouldn't really want to split - and which would be difficult to split using CSS anyway, since there is no internal delimiter. Also, in the case of German, there are capital letters for all nouns, which may also help users get by better, given the conclusions about how kanji helps japanese readers in the article linked to by @mraccess77 above.

Basically, i think you can't extrapolate some research findings for English text to any other language. You can only suggest that inter-word space stretching may be useful, and cite evidence and recommendations for that on the basis of those languages that we know have been researched.

awkawk commented 6 years ago

Sounds like we need to pull the word spacing item.

Do the others work? And do we have a rational basis for the values being appropriate for languages across the board?

Line height (line spacing) to at least 1.5 times the font size;
Spacing underneath paragraphs to at least 2 times the font size;
Letter spacing (tracking) to at least 0.12 times the font size;

alastc commented 6 years ago

Hi Richard,

I think it helps that the aim of the SC is to allow for more spacing, it is not intended to say that those values are what the user must have. These values are there to provide a baseline for testing (to say you passed or failed), but within that roughly 10% buffer on text, you could choose all letter-spacing, all word-spacing, or a larger font family.

This SC should also help internationalisation to some extent, as it means (as a designer/developer) you need to be conscious of allowing a buffer around text. Surely that is helpful for languages like German, there is no mechanism to split longer words (if that is even is desirable), but spacing them out a little more should help from a physical reading point of view.

If we pull word-spacing out, can we increase the value of letter-spacing to compensate?

mraccess77 commented 6 years ago

It is my personal understanding that the intention of the SC is to allow for more space for different font families or spacing if needed. Surely other languages allow multiple fonts -- even though we could not get the font family language in the SC for other reasons allowing for more room support more personalization of writing in many language..

mraccess77 commented 6 years ago

Line height of 1.5 is already in SC 1.4.8 for WCAG 2.0 at AAA.

mraccess77 commented 6 years ago

My personal feeling is that if we limit this to English the SC will be abandoned and all the people who can benefit from changing font family or adjust some spacing will not have what is needed. I'd be very surprised that these changes would not benefit users with low vision or cognitive disabilities. But unfortunately I don't have research at my fingertips for each language to communicate this. Studies with the general population often don't extend well to people with low vision and cognitive disabilities so we can't rely on general population studies.

joshueoconnor commented 6 years ago

Thanks all, and especially @r12a for the resources and extra input. I'm also wondering about @awkawk question if the following do help move the SC to a place to cover as far as possible scripts/languages that are 'non-latin' or with diactrics etc.

Line height (line spacing) to at least 1.5 times the font size; Spacing underneath paragraphs to at least 2 times the font size; Letter spacing (tracking) to at least 0.12 times the font size;

I'm not expert in this, and will defer to those that are but we do need to at least try to increment the SC in way that can accomodate as much variance in natural language styles as possible.

Regarding @alastc comment about not referring to CJK etc - OTTOMH - could we say 'Latin, Cyrillic, Devanagari, Semitic and diacritic type alphabets'. @r12a would know better the details of the classifications than me :-)

alastc commented 6 years ago

Hi Josh,

The goal is to acheive a certain level of spacing, so taking out word spacing reduces that amount.

We can compensate by upping the letter spacing to 0.14 (going back to my previous testing and experimenting).

However, it does feel very late in the day to be fiddling with that type of thing. Wayne did some sterling analysis on size of typefaces and the relationship to letter/word spacing, I really don't want to repeat that process without him.

I think the safe term would be scoping it to Latin-based languages, I don't know what Devanagari is or how words in Semitic are put together. Languages (human ones anyway) have always been a weakness for me, so long as we don't drop the SC I'm all ears about the right terminology.

joshueoconnor commented 6 years ago

+1 @alastc to restricting it to Latin based languages until we sort out what is needed for the rest. I also agree with you against last minute tinkering at this stage.

lauracarlson commented 6 years ago

Hi @MakotoUeki, @steverep, @alastc , and all,

Makoto wrote:

It might be so. But most of web pages on Japanese websites are not using the vertical writing mode.

For example: https://www.kantei.go.jp/ http://www.metro.tokyo.jp/ https://www.yahoo.co.jp/ http://www.adobe.com/jp/ https://waic.jp/

I used Alastair's spacing bookmarklet tool on your examples. It automatically sets spacing to what is specified in the SC. That is:

line height (line spacing) to at least 1.5 times the font size
spacing underneath paragraphs to at least 2 times the font size
letter spacing (tracking) to at least 0.12 times the font size
word spacing to at least 0.16 times the font size

The bookmarklet didn't seem to work on metro.tokyo or yahoo.co. The bookmarklet script refused to load. This should not be a problem if/when the script is put into a proper extension.

However, the following are screenshots of the other 3 :

Makoto, do you detect any loss of content?

MakotoUeki commented 6 years ago

@lauracarlson There is no problem. I'm still not sure if each value is valid enough for Japanese text. But I can say those values would be reasonable as the minimum requirements. Increasing spacing benefits Japanese users with low vision or dyslexia. One exception for Japanese is word spacing.

lauracarlson commented 6 years ago

@MakotoUeki,

Thank you for checking them.

r12a commented 6 years ago

If we pull word-spacing out, can we increase the value of letter-spacing to compensate?

See https://github.com/w3c/wcag21/issues/390. I assume that one expects increases to be applied sensibly and as a minimum, so pre-existing text with extra letter-spacing (eg. Hebrew emphasis) will be taken into account by the person adding the letter-spacing increase. There's still a question around whether the SC is relevant to cursive scripts, such as Arabic, N'Ko, Mongolian, etc. since it's not clear what tracking (ie. letter-spacing) means for those scripts, if anything, given that letters are joined. I am also assuming that we are assuming that tracking does the right thing for complex scripts that add space around syllables rather than individual letters, eg. Devanagari, Bengali, Thai, Khmer, etc.

Regarding @alastc comment about not referring to CJK etc - OTTOMH - could we say 'Latin, Cyrillic, Devanagari, Semitic and diacritic type alphabets'. @r12a would know better the details of the classifications than me :-)

Why not say something like "For languages that separate words using spaces...." Would that work?

awkawk commented 6 years ago

@r12a - we are looking at the following for an exception. Would appreciate your thoughts:

Exception: Human languages and scripts which do not make use of one or more of these text style properties in written text can conform using only the properties that are used.

r12a commented 6 years ago

I think that makes sense.

alastc commented 6 years ago

Hi @r12a,

The intent is that the value is a maximum increase for the purpose of testing. The user can then use any value they like, but the site is repsonsible for making it work upto the set point. If the user goes beyond the value, things might break, but that isn't their issue.

For a language like Hebrew, it seems like the spacing value provides meaning, so we don't want to impact that. (I'm curious how that works on the web though? spans around everything?) Again though, if the user increased the spacing proportionately, it is up to the site to test for an increase up to that point.

@awkawk given that some languages convey meaning with spacing, "do not typically make use of" might not cover it. Oh, and Steve hates "typically", so:

Exception: Human languages and scripts which do not make use of, or convey meaning with, one or more of these text style properties can conform using only the properties that are used and do not impact the meaning.

Or perhaps I misunderstood and that isn't needed? Banging head cold today, I could be confused.

lauracarlson commented 6 years ago

@alastc wrote:

For a language like Hebrew, it seems like the spacing value provides meaning, so we don't want to impact that. (I'm curious how that works on the web though? spans around everything?) Again though, if the user increased the spacing proportionately, it is up to the site to test for an increase up to that point.

I wonder if @lseeman may be able to advise on spacing and languages like Hebrew?

r12a commented 6 years ago

I'm curious how that works on the web though? spans around everything?

Here's a picture: https://github.com/w3c/type-samples/issues/19 (red lines aren't in the original)

I assume you would just put the relevant text in a em tag and apply CSS letter-spacing to your em tags.

alastc commented 6 years ago

Ah, if it's aplied semantically that would be ok, as the user-style sheet / plugin could allow for that, assuming it is done reasonably consistently across sites.

I don't think there would be much user benefit in that scenario, but it's not harmful (would pass easily).

I.e. no need for my tweak.

allanj-uaag commented 6 years ago

+1 to Alastair tweak

I went to http://www.geonames.de/languages.html and https://www.omniglot.com/language/names.htm and several of the phrase pages of languages with interesting scripts from https://www.omniglot.com/language/phrases/index.htm

I applied the conditions listed in the SC. nothing seemed to break for the roughly 480 languages and scripts represented.

On Wed, Jan 17, 2018 at 10:31 AM, Laura Carlson notifications@github.com wrote:

+1 to @alastc https://github.com/alastc's tweak:

Exception: Human languages and scripts which do not make use of, or convey meaning with, one or more of these text style properties in written text can conform using only the properties that are used and can be adjusted without impacting the meaning.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/w3c/wcag21/issues/677#issuecomment-358361247, or mute the thread https://github.com/notifications/unsubscribe-auth/AG_WL1wd0mBUrFmQuvtaBSffhujnWYfcks5tLiB-gaJpZM4RZwWI .

-- Jim Allan, Accessibility Coordinator Texas School for the Blind and Visually Impaired 1100 W. 45th St., Austin, Texas 78756 voice 512.206.9315 fax: 512.206.9452 http://www.tsbvi.edu/ "We shape our tools and thereafter our tools shape us." McLuhan, 1964

alastc commented 6 years ago

Um, could you understand whether the meaning had changed in every instance?

No matter, if those sorts of changes are applied with markup, my adjustment isn't needed.

allanj-uaag commented 6 years ago

On Wed, Jan 17, 2018 at 12:04 PM, Alastair Campbell < notifications@github.com> wrote:

Um, could you understand whether the meaning had changed in every instance?

I don't know.

I was only checking to make sure the script or diacritics etc. did not separate (with line height). Individual characters remained intact tho spaced a bit further apart. Same with the words (except for languages/scripts that don't have words. Even when applied to these languages the word-spacing had no effect). Line-height adjustment did not separate diacritics from characters, nor did it adversely impact ascenders and descenders.

It seems to me that a low vision person needs to adjust the spacings listed so they can read. One assumes the user already know the language/script they are reading. If they need more space - so be it. They will have adjusted their reading and comprehension for the new spacing.

The SC allows the user to make the adjustments. the author needs to allow for the adjustments within the SC range. The ability to read and and derive meaning from the adjusted content is up to the reader. If the increased spacing impacts those abilities the user will adjust or they just won't do it. Regardless, they user should have the flexibility that the SC provides.

Jim

No matter, if those sorts of changes are applied with markup, my adjustment

isn't needed.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/w3c/wcag21/issues/677#issuecomment-358390617, or mute the thread https://github.com/notifications/unsubscribe-auth/AG_WL54sJC5WLF6g5YygP5nYJwfwXYYLks5tLjY2gaJpZM4RZwWI .

-- Jim Allan, Accessibility Coordinator Texas School for the Blind and Visually Impaired 1100 W. 45th St., Austin, Texas 78756 voice 512.206.9315 fax: 512.206.9452 http://www.tsbvi.edu/ "We shape our tools and thereafter our tools shape us." McLuhan, 1964

lauracarlson commented 6 years ago

@awkawk I think we are all in agreement that your exception wording is good to go.

joshueoconnor commented 6 years ago

Great stuff all.

awkawk commented 6 years ago

(Official WG Response) Thank you for your comment. These properties are defined in CSS and in typographical principals, and they adhere to the meaning used in CSS. We will modify the description for paragraph spacing to "spacing following paragraphs" instead of "spacing underneath paragraphs" to address the implicit western language bias in this item. We will also add the following exception to address cases such as in Japanese where word-spacing is not typically used.

Exception: Human languages and scripts which do not make use of one or more of these text style properties in written text can conform using only the properties that are used.

https://github.com/w3c/wcag21/pull/729

w3c / wcag21

Letter/Word spacing for CJK languages - SC 1.4.12 Text Spacing #677