w3c / i18n-glossary

Definitions of terms used in W3C Internationalization documents.
https://w3c.github.io/i18n-glossary/

Review glossary entries for conflicts with or consistency with infra #49

Open r12a opened 1 year ago

r12a commented 1 year ago

This lists terms that are defined in both our glossary and that of INFRA. (For action 1251)

ASCII case-insensitive matching: link to INFRA but no embedded definition. No clash, but ours is more explanatory.

i18n:

ASCII case-insensitive matching. Defined in INFRA, this compares two sequences of code points as if all ASCII code points in the range 0x41 to 0x5A (A to Z) were mapped to the corresponding code points in the range 0x61 to 0x7A (a to z), but other code points are not case-folded. ASCII case-insensitive matching can be required when a vocabulary is itself constrained to ASCII.

INFRA:

A string A is an ASCII case-insensitive match for a string B, if the ASCII lowercase of A is the ASCII lowercase of B.
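The two definitions above describe the same comparison, which can be sketched as follows. This is a minimal illustration, not code from either specification; the function names `ascii_lowercase` and `is_ascii_case_insensitive_match` are hypothetical names chosen to mirror the INFRA wording.

```python
def ascii_lowercase(s: str) -> str:
    """Map only U+0041..U+005A (A to Z) to U+0061..U+007A (a to z)."""
    return "".join(
        chr(ord(ch) + 0x20) if 0x41 <= ord(ch) <= 0x5A else ch
        for ch in s
    )


def is_ascii_case_insensitive_match(a: str, b: str) -> bool:
    """True if the ASCII lowercase of a equals the ASCII lowercase of b."""
    return ascii_lowercase(a) == ascii_lowercase(b)


# ASCII letters are matched regardless of case.
print(is_ascii_case_insensitive_match("Content-Type", "content-type"))  # True
# Non-ASCII letters are not case-folded: U+00C9 (É) versus U+00E9 (é).
print(is_ascii_case_insensitive_match("\u00C9", "\u00E9"))  # False
# U+212A KELVIN SIGN is not an ASCII letter, so it does not match "k".
print(is_ascii_case_insensitive_match("\u212A", "k"))  # False
```

The last two cases show the point made in the i18n wording: code points outside A to Z are left untouched, which is only appropriate when the vocabulary being matched is itself ASCII.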

Code point: no link

i18n:

Code point. A code point value represents the position of a character in a coded character set. For example, the code point for the letter á in the Unicode coded character set is 225 in decimal, or 0xE1 in hexadecimal notation. Hexadecimal notation is commonly used for referring to code points. See also Unicode code point.

INFRA:

A code point is a Unicode code point and is represented as "U+" followed by four-to-six ASCII upper hex digits, in the range U+0000 to U+10FFFF, inclusive. A code point’s value is its underlying number.

A code point may be followed by its name, by its rendered form between parentheses when it is not U+0028 or U+0029, or by both. Documents using the Infra Standard are encouraged to follow code points by their name when they cannot be rendered or are U+0028 or U+0029; otherwise, follow them by their rendered form between parentheses, for legibility.

A code point’s name is defined in Unicode and represented in ASCII uppercase. [UNICODE]
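To make the INFRA notation concrete, here is a rough sketch that formats a character in the "U+" style described above. The helper name `format_code_point` is hypothetical, and using `str.isprintable()` to decide whether a character "can be rendered" is an assumption made for illustration only.

```python
import unicodedata


def format_code_point(ch: str) -> str:
    """Format one character as U+XXXX, then its name, then its rendered form."""
    value = ord(ch)
    # At least four upper-case hex digits; larger values use five or six.
    parts = [f"U+{value:04X}"]
    name = unicodedata.name(ch, "")  # empty string if Unicode assigns no name
    if name:
        parts.append(name)
    # Skip the parenthesized form for U+0028 and U+0029, as INFRA describes,
    # and for characters assumed here to be unrenderable.
    if value not in (0x28, 0x29) and ch.isprintable():
        parts.append(f"({ch})")
    return " ".join(parts)


print(format_code_point("\u00E1"))  # U+00E1 LATIN SMALL LETTER A WITH ACUTE (á)
print(format_code_point("("))       # U+0028 LEFT PARENTHESIS
print(ord("\u00E1"))                # 225, that is 0xE1
```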

Code unit: no link

i18n:

Code unit. The units of data used by a character encoding to encode or serialize characters into a programming language or other serialized form (such as a file). Common code units are 8, 16, and 32 bits in size. On the Web we are mostly concerned with bytes, which are technically "8-bit code units". However, in JavaScript a char is a 16-bit code unit (related to the UTF-16 encoding of Unicode).

INFRA:

A string is a sequence of unsigned 16-bit integers, also known as code units. A string is also known as a JavaScript string. Strings are denoted by double quotes and monospace font.
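The difference between the broader i18n sense of code unit and the 16-bit INFRA sense can be seen by counting units in each encoding form. The sketch below uses Python, whose `str` type counts code points rather than 16-bit code units, so the UTF-16 count is obtained by encoding; the helper name `count_units` is hypothetical.

```python
def count_units(s: str) -> dict[str, int]:
    """Count code points and the code units needed in each encoding form."""
    return {
        "code_points": len(s),
        "utf8_code_units": len(s.encode("utf-8")),            # 8-bit units (bytes)
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,  # 16-bit units
        "utf32_code_units": len(s.encode("utf-32-le")) // 4,  # 32-bit units
    }


# U+00E1 (á): two bytes in UTF-8, one 16-bit code unit in UTF-16.
print(count_units("\u00E1"))
# U+1F600: four bytes in UTF-8, two 16-bit code units in UTF-16.
print(count_units("\U0001F600"))
```

A string length as defined by INFRA corresponds to the `utf16_code_units` figure here, not to the number of code points.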

Scalar value: no link

i18n:

Unicode scalar value. Unicode definition: "Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive. (See definition D76 in Section 3.9, Unicode Encoding Forms.)"

INFRA:

A scalar value is a code point that is not a surrogate.
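Both wordings amount to the same range test, which a short sketch can make explicit. The function name `is_scalar_value` is a hypothetical label for illustration.

```python
MAX_CODE_POINT = 0x10FFFF


def is_scalar_value(cp: int) -> bool:
    """True for 0..0xD7FF and 0xE000..0x10FFFF, that is, any non-surrogate code point."""
    return (0 <= cp <= 0xD7FF) or (0xE000 <= cp <= MAX_CODE_POINT)


print(is_scalar_value(0xD7FF))    # True: last code point before the surrogates
print(is_scalar_value(0xD800))    # False: first surrogate
print(is_scalar_value(0xDFFF))    # False: last surrogate
print(is_scalar_value(0xE000))    # True: first code point after the surrogates
print(is_scalar_value(0x110000))  # False: beyond the Unicode code space
```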

Surrogate code point: link

i18n:

Surrogate code point. Unicode definition: "A Unicode code point in the range U+D800..U+DFFF. Reserved for use by UTF-16, where a pair of surrogate code units (a high surrogate followed by a low surrogate) “stand in” for a supplementary code point." This term is also defined by [INFRA].

INFRA:

A leading surrogate is a code point that is in the range U+D800 to U+DBFF, inclusive.

A trailing surrogate is a code point that is in the range U+DC00 to U+DFFF, inclusive.

A surrogate is a leading surrogate or a trailing surrogate.
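The INFRA ranges, together with the "stand in" behaviour mentioned in the Unicode definition, can be sketched as below. The function names are hypothetical, and the combining arithmetic is the standard UTF-16 decoding formula rather than anything stated in either glossary entry.

```python
def is_leading_surrogate(cp: int) -> bool:
    """True for U+D800 to U+DBFF, inclusive."""
    return 0xD800 <= cp <= 0xDBFF


def is_trailing_surrogate(cp: int) -> bool:
    """True for U+DC00 to U+DFFF, inclusive."""
    return 0xDC00 <= cp <= 0xDFFF


def is_surrogate(cp: int) -> bool:
    """A surrogate is a leading surrogate or a trailing surrogate."""
    return is_leading_surrogate(cp) or is_trailing_surrogate(cp)


def combine_surrogates(lead: int, trail: int) -> int:
    """Recover the supplementary code point that a surrogate pair stands in for."""
    if not (is_leading_surrogate(lead) and is_trailing_surrogate(trail)):
        raise ValueError("expected a leading surrogate followed by a trailing surrogate")
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)


# The pair D83D DE00 stands in for U+1F600.
print(f"U+{combine_surrogates(0xD83D, 0xDE00):04X}")  # U+1F600
```

Note that INFRA's "leading" and "trailing" correspond to Unicode's "high" and "low" surrogates.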