Specifying character set

aphillips commented 4 years ago

(From your self-review in #133)

We support UTF-8 but we anticipate supporting UTF-16 for symbol sets and there is nothing in our spec that would prevent this.

This statement is unclear. UTF-8 and UTF-16 are character encoding forms of the Unicode character set. That is, they are different ways of turning characters into bytes in the memory of a computer. Can you clarify what you mean by UTF-16 for symbol sets here? Do you mean "Unicode code points" or "private use characters" perhaps?

@becka11y suggested in reply:

We are not recommending one symbol set or another, we are just providing the translation. We allow translation to other character sets that may be represented by UTF-8, UTF-16, private use characters or even actual images.

My comment here is probably difficult to follow because it hinges on details of Unicode jargon. UTF-8 and UTF-16 turn out to be the names of character encodings, as opposed to what you appear to mean (which is properly termed a "character set"). In this case what you probably mean is that your character set is Unicode--which is fundamentally a good thing. You can probably close this issue as a no-op: I mainly opened it in case your TF wants to discuss how to refer to character encodings.
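To illustrate the distinction, here is a minimal Python sketch (not taken from your spec; the `describe` helper and the sample characters are hypothetical choices of mine). It shows that one Unicode code point keeps the same identity whichever encoding form serializes it, and that a private use character is just another code point in the same character set:

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize one character: its code point and its bytes in two encoding forms."""
    return {
        # The code point identifies the character within the Unicode character set.
        "code_point": f"U+{ord(ch):04X}",
        # UTF-8 and UTF-16 are two different byte serializations of that same code point.
        "utf8": ch.encode("utf-8").hex(" "),
        "utf16_be": ch.encode("utf-16-be").hex(" "),
        # General category "Co" marks a private use character.
        "private_use": unicodedata.category(ch) == "Co",
    }


# Sample characters chosen only for illustration; U+E000 is in the Private Use Area.
for ch in ("A", "\u20ac", "\U0001F600", "\uE000"):
    print(describe(ch))
```

The byte sequences differ between the two columns, but decoding either one gives back the same character. That is why "Unicode" answers the character set question, while UTF-8 or UTF-16 only answers how the characters are stored or transmitted.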

One place to look for a basic explanation of some of the above jargon is here:

https://www.w3.org/International/articles/definitions-characters/

lseeman commented 4 years ago

Thank you @aphillips. We will be using the reference and will reopen this if we find we need more discussion.

snidersd commented 3 years ago

Closing issue since we have not been advised of any objections.

w3c / adapt

Specifying character set #140