w3c / i18n-drafts

A place to edit articles, tutorials, and the like for the /International subtree of the W3C site. Also, captures issues and comments.

Update the "backwards deletion" Q+A article #520

Closed aphillips closed 10 months ago

aphillips commented 1 year ago

This is w3c/i18n-actions#6.

Preview it here: https://aphillips.github.io/i18n-drafts/questions/qa-backwards-deletion.en.html

netlify[bot] commented 1 year ago

Deploy Preview for i18n-drafts ready!

Latest commit: 179dfa0262c89df90add2c723298dca00d55dfdf
Latest deploy log: https://app.netlify.com/sites/i18n-drafts/deploys/65a1575c89752f00097a0ebd
Deploy Preview: https://deploy-preview-520--i18n-drafts.netlify.app/questions/qa-backwards-deletion.en

xfq commented 1 year ago

I updated the article to match the latest template.

We also need to:

r12a commented 1 year ago

Generally speaking, most text navigation and editing follows the user-perceived character boundaries. For most implementations this corresponds to Unicode's definition of "default extended grapheme cluster" boundaries [UAX29]. The main exception to this is backspacing, which usually follows Unicode code point boundaries in the underlying encoded text (although there are exceptions to this). For the simplest scripts and languages, these often amount to the same thing.

This and other parts of the document strike me as over-simplified and in places incorrect, but there are terminological problems (which we are already familiar with) that cloud the issue. My experience in working with these things has led me to view the world in terms of code points, which are grouped into grapheme clusters, which are in turn grouped into orthographic syllables. (I'm in the process of writing that up more clearly, elsewhere...)

I'm inclined to agree with Norbert that this idea of user-perceived character boundaries is too vague and not clearly substantiated enough to be used as the name of a unit of segmentation. Rather, it's merely a way of helping people imagine why code point units are not sufficient in some cases. The distinction between grapheme clusters and orthographic syllables is not informed by its use, but is crucial to the information provided by this article.

My experience has shown that browsers use these 3 different units for text operations such as cursor movement and deletion, depending on the language, and sometimes inconsistently within a single language, but also from browser to browser. I've been investigating this and writing up results for the various browsers in my orthography notes, under the section "Graphemes". It may be worth going to https://r12a.github.io/scripts/switch.html and selecting the 'graphemes' segment id, then cycling through the orthographies using the control "Select an orthography". You should especially look for the subheading "Browser behaviour", where it exists, to find the results per browser.

(I was wondering whether it would be useful to list behaviour against orthography in a table of some sort – not necessarily in this article, but somewhere.)

That said, it's not clear to me what your source of authoritative information is about how cursoring and deletion should work. I don't think the UAX makes clear how things should work; I believe it is rather left up to the application to decide the exact mechanism (?). Or do you mean to describe what browsers currently do? I think it would be good to make that much clearer.

I also think that the article should make it much clearer (actually, I think it's hardly mentioned at all other than for one Thai example) that very different segmentation rules may apply for other operations on the text, such as line breaking, justification, text spacing, and the like – and that this is not a problem, but is useful.

The exceptions section alludes to the importance of orthographic syllables, but this isn't really an exception - even in terms of current browser support. Again it varies by browser and by orthography, but it's something that needs to be mentioned either together with, or given equal importance to, the section entitled "Combining characters".
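
To make the three units above concrete, here is a minimal Python sketch of backward deletion by code point, by an approximated grapheme cluster, and by an approximated orthographic syllable. Everything in it is an assumption for illustration: the function names are hypothetical, `is_extender` is only a rough stand-in for the UAX #29 rules (it is not a conforming segmenter), the virama check is a simplification that suits the Devanagari example only, and none of it claims to reproduce what any particular browser does.

```python
import unicodedata


def is_extender(ch: str) -> bool:
    """Rough approximation: marks (Mn, Me, Mc) and ZWJ attach to what precedes them."""
    return unicodedata.category(ch) in ("Mn", "Me", "Mc") or ch == "\u200d"


def backspace_code_point(text: str) -> str:
    """Delete exactly one code point from the end."""
    return text[:-1]


def backspace_cluster(text: str) -> str:
    """Delete the last base character together with its trailing marks."""
    i = len(text)
    while i > 0 and is_extender(text[i - 1]):
        i -= 1
    return text[:max(i - 1, 0)]


def backspace_syllable(text: str) -> str:
    """Like backspace_cluster, but keep going while a virama (ccc 9) joins consonants."""
    i = len(text)
    while i > 0:
        while i > 0 and is_extender(text[i - 1]):
            i -= 1
        i = max(i - 1, 0)  # step over the base character
        if i > 0 and unicodedata.combining(text[i - 1]) == 9:
            continue  # preceded by a virama: the conjunct continues leftwards
        break
    return text[:i]


if __name__ == "__main__":
    cafe = "cafe\u0301"                  # "e" followed by COMBINING ACUTE ACCENT
    kshi = "\u0915\u094d\u0937\u093f"    # KA, VIRAMA, SSA, VOWEL SIGN I
    for label, s in (("cafe + acute", cafe), ("Devanagari kshi", kshi)):
        print(label)
        print("  code point:", len(backspace_code_point(s)), "code points left")
        print("  cluster:   ", len(backspace_cluster(s)), "code points left")
        print("  syllable:  ", len(backspace_syllable(s)), "code points left")
```

For the Devanagari string, the three functions leave three, two, and zero code points respectively, which is the kind of per-orthography difference that a behaviour table would need to record.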

xfq commented 1 year ago

I think it might be useful to add an example of IVS. For example, the characters on this page are made of two code points (U+9F8D + an ideographic variation selector), but for users, they should be input, selected, and deleted as a whole. Regarding input methods, many input methods can already input IVS. We can mention cursor movement, selection, and deletion here.
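
As a rough illustration of the IVS case, the Python sketch below builds U+9F8D followed by a variation selector and compares deleting one code point with deleting the base and selector together. The choice of U+E0100 (VARIATION SELECTOR-17) is an assumption made purely for the example; whether that exact sequence is registered is not checked here, and the helper names are hypothetical.

```python
BASE = "\u9f8d"            # the ideograph mentioned above
SELECTOR = "\U000E0100"    # VARIATION SELECTOR-17, assumed for illustration
IVS = BASE + SELECTOR


def is_variation_selector(ch: str) -> bool:
    """True for the Variation Selectors and Variation Selectors Supplement blocks."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def code_points(text: str) -> list[str]:
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


def delete_last_unit(text: str) -> str:
    """Delete the last character together with any variation selectors after it."""
    end = len(text)
    while end > 0 and is_variation_selector(text[end - 1]):
        end -= 1
    return text[:max(end - 1, 0)]


if __name__ == "__main__":
    print(code_points(IVS))                     # two code points
    print(code_points(IVS[:-1]))                # selector removed, base remains
    print(code_points(delete_last_unit(IVS)))   # nothing remains
```

Deleting by code point silently turns the variant back into the plain base character, whereas the user would expect the whole sequence to disappear; the same reasoning applies to cursor movement and selection.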

aphillips commented 10 months ago

The working group elected not to complete work on this QA document. However, I don't want to lose the invested effort. I'm merging the changes for now. I should probably add a visible deprecation too.