Open bert-github opened 1 month ago
Hi @bert-github
Sorry, didn't mean to close before.
The Format does require that identifiers be unique, but we never wanted to prescribe how uniqueness must be determined. Different rule writers may have different mechanisms to ensure their identifiers are unique.
For example, the CG always picks identifiers made up of lowercase ASCII characters, to avoid the situations you describe above.
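As a rough illustration of that kind of convention, a rule writer could check candidate identifiers with something like the sketch below. The function name `is_lowercase_ascii` and the exact rule (non-empty, ASCII-only, no uppercase letters) are assumptions made for this example; they are not defined by the Format or by the CG.

```python
def is_lowercase_ascii(identifier: str) -> bool:
    """Return True if the identifier is non-empty, ASCII-only, and has no uppercase letters.

    Hypothetical helper: this is one possible reading of "lowercase ASCII",
    not a rule taken from the ACT Rules Format.
    """
    return (
        bool(identifier)                      # reject the empty string
        and identifier.isascii()              # only code points U+0000..U+007F
        and identifier == identifier.lower()  # no uppercase letters
    )


print(is_lowercase_ascii("abc123"))     # True
print(is_lowercase_ascii("ABC"))        # False
print(is_lowercase_ascii("caf\u00e9"))  # False (contains a non-ASCII letter)
```

With identifiers restricted this way, a plain code point comparison is enough to tell two identifiers apart, because neither case differences nor alternative Unicode encodings can occur.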
(This is part of the review by the Internationalization WG. Sorry for being late – it's entirely my fault.)
4.1. Rule Identifier https://www.w3.org/TR/2024/WD-act-rules-format-1.1-20240618/#rule-identifier
To know whether an identifier is unique (and to be able to use it in one rule to point to another), you need to know when two identifiers are the same. E.g., are capital letters (ABC) the same as lowercase letters (abc)? If a letter can be encoded in Unicode in two ways (e.g., ‘é’ as a single character vs. a separate ‘e’ followed by a combining acute accent), are those the same?
‘Character Model for the World Wide Web: String Matching’ explains the issues with comparing two strings of text and has recommendations for choosing an algorithm, including for text strings used as identifiers.
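The two cases raised above can be reproduced in a few lines of Python. This is only a sketch to make the problem concrete: `same_identifier` and its `normalize` and `fold_case` flags are hypothetical names invented for this example, and which of these steps (if any) the Format should require is exactly the open question.

```python
import unicodedata


def comparison_key(s: str, normalize: bool, fold_case: bool) -> str:
    """Build the string that is actually compared (hypothetical helper)."""
    if normalize:
        s = unicodedata.normalize("NFC", s)
    if fold_case:
        s = s.casefold()
        if normalize:
            # Case folding can produce a non-normalized string, so normalize again.
            s = unicodedata.normalize("NFC", s)
    return s


def same_identifier(a: str, b: str, *, normalize: bool = False, fold_case: bool = False) -> bool:
    """Compare two identifiers code point by code point after the optional steps."""
    return comparison_key(a, normalize, fold_case) == comparison_key(b, normalize, fold_case)


precomposed = "caf\u00e9"  # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(same_identifier("ABC", "abc"))                        # False
print(same_identifier("ABC", "abc", fold_case=True))        # True
print(same_identifier(precomposed, decomposed))             # False
print(same_identifier(precomposed, decomposed, normalize=True))  # True
```

Without an agreed comparison, two tools could disagree on whether these pairs denote the same rule, which is why the matching algorithm needs to be stated.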