mojaloop / design-authority-project

This is the Issue and Decision Log for tracking mojaloop and related Designs

Support for accented characters in data type "Name" #42

Closed: NicoDuvenage closed this issue 2 years ago

NicoDuvenage commented 4 years ago

Request:

Section 7.2.4.1 of the API specification states that, for the regular expression used to validate a variable of the Name type, "all Unicode32 characters are allowed". However, the example regular expression given in Listing 14 rejects accented and non-Roman characters.

Artifacts:

Decision(s):

Follow-up:

Dependencies:

Accountability:

Notes:

NicoDuvenage commented 4 years ago

There has been some email correspondence regarding this topic lately, under the subject:

Update regex in ML interfaces (API and outward facing services) to support accents (select Unicode characters) in names (#1261)

elnyry-sam-k commented 4 years ago

Not by email exactly, but on the GitHub issue here: https://github.com/mojaloop/project/issues/1261 (I suspect the email was just a GitHub notification).

mjbrichards commented 4 years ago

Just so we have it here too: here is my current thinking on what the specification should read, copied over from the GitHub issue:

_I think I'd like to suggest that there should be a reference somewhere in the API specification (preferably in a table of such versioning references) to the Unicode release level. Within that, we should use references to the Unicode General Categories that we allow or prohibit in a field of a specific type. So we might rewrite Miller's proposal to say:

"Letters, both accented and unaccented, being chosen from all code points belonging to the Letter and Decimal_Number general categories as defined in the reference version of the Unicode specification (with link to reference). In addition, the period (.), apostrophe ('), dash (-), comma (,) and space character are permitted. Interior spaces are allowed, but no leading or trailing spaces. For the avoidance of doubt, Names may include leading digits."

We can then allow implementers to decide how best to meet these requirements._
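As an illustration only, the wording proposed above could be checked with a minimal Python sketch like the one below. It assumes that "space character" means only U+0020, that the Letter category covers all of its subcategories (Lu, Ll, Lt, Lm, Lo), and that an empty Name is rejected. As an extra assumption not in the proposal, it applies NFC normalization first so that a base letter followed by a combining accent is checked in its precomposed form. The function name `is_valid_name` is hypothetical, and the Unicode version it implements is whatever the running Python's `unicodedata` module ships (`unicodedata.unidata_version`), which is exactly the kind of version reference the proposal asks the specification to pin.

```python
import unicodedata

# Punctuation explicitly permitted by the proposed wording, plus the ASCII space (U+0020).
ALLOWED_EXTRA = {".", "'", "-", ",", " "}


def is_valid_name(value: str) -> bool:
    """Hypothetical check of a candidate Name against the proposed rule.

    Allowed: code points in the Letter (L*) or Decimal_Number (Nd) general
    categories, plus . ' - , and space. No leading or trailing spaces.
    Leading digits are allowed.
    """
    # Assumption: normalize to NFC so "e" + U+0301 is checked as the single letter "é".
    value = unicodedata.normalize("NFC", value)
    # Assumption: an empty Name is rejected; leading/trailing spaces are rejected per the proposal.
    if not value or value[0] == " " or value[-1] == " ":
        return False
    for ch in value:
        category = unicodedata.category(ch)
        if category.startswith("L") or category == "Nd" or ch in ALLOWED_EXTRA:
            continue
        return False
    return True
```

With this sketch, "José Álvarez", "Łukasz", "张伟", "O'Brien-Smith, Jr." and "3rd Avenue" are accepted, while " Anna" and "Anna!" are rejected. Note that a literal reading of Letter plus Decimal_Number also rejects combining marks that have no precomposed form, such as the Devanagari vowel sign in "अनिल" (category Mc), which may be worth settling when the rule is pinned down.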

My view is that, as Miller says elsewhere, it is much safer to use Unicode category names than to identify the content of those categories explicitly...
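To illustrate why category names are safer, the hypothetical Python sketch below contrasts a hand-written character class with a lookup by general category. The explicit class (ASCII letters plus the Latin-1 letters U+00C0 to U+00FF) is an example chosen for illustration, not the expression from Listing 14 or from Miller's proposal.

```python
import re
import unicodedata

# Hypothetical explicit list: ASCII letters plus Latin-1 letters U+00C0..U+00FF,
# skipping the multiplication sign (U+00D7) and division sign (U+00F7).
EXPLICIT_LETTERS = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+")


def letters_by_explicit_list(word: str) -> bool:
    # True only if every character falls inside the hand-listed ranges.
    return EXPLICIT_LETTERS.fullmatch(word) is not None


def letters_by_category(word: str) -> bool:
    # True if every code point is in a Letter subcategory (Lu, Ll, Lt, Lm, Lo).
    return bool(word) and all(unicodedata.category(ch).startswith("L") for ch in word)


for word in ["Zoë", "Łukasz", "Σοφία", "张伟"]:
    print(f"{word}: explicit={letters_by_explicit_list(word)} category={letters_by_category(word)}")
```

Only "Zoë" passes the explicit list, while all four names pass the category check. The explicit list silently misses "Ł" (U+0141) and every non-Latin script, whereas the category lookup follows whatever Unicode version the runtime implements.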

mjbrichards commented 4 years ago

Presumably the problem from a DA (Design Authority) perspective is not the code solution but the tests we propose to apply to check whether a given solution meets the requirement...
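One way to make such tests concrete is a shared table of accept/reject vectors that any implementation's validator can be run against. The Python sketch below is hypothetical: the vectors are illustrative cases derived from the wording proposed earlier in this thread, not an agreed Mojaloop test suite, and `run_vectors` accepts any callable that returns True or False for a candidate Name.

```python
from typing import Callable, List, Tuple

# Illustrative vectors derived from the proposed wording (not an official suite).
# Each entry: (candidate name, expected result, reason).
NAME_VECTORS: List[Tuple[str, bool, str]] = [
    ("Anna", True, "unaccented letters"),
    ("José Álvarez", True, "accented letters with interior space"),
    ("Łukasz", True, "letter outside Latin-1"),
    ("张伟", True, "non-Latin script (category Lo)"),
    ("O'Brien-Smith, Jr.", True, "permitted punctuation . ' - ,"),
    ("3rd Avenue", True, "leading digit is allowed"),
    (" Anna", False, "leading space"),
    ("Anna ", False, "trailing space"),
    ("Anna!", False, "'!' is not in the permitted set"),
    ("Anna\tLee", False, "tab is not the space character"),
]


def run_vectors(validate: Callable[[str], bool]) -> List[str]:
    """Describe every vector on which `validate` disagrees with the expected result."""
    failures = []
    for name, expected, reason in NAME_VECTORS:
        actual = validate(name)
        if actual != expected:
            failures.append(f"{name!r}: expected {expected}, got {actual} ({reason})")
    return failures


# Hypothetical ASCII-only validator, standing in for the kind of behaviour the issue reports.
def ascii_only(name: str) -> bool:
    return bool(name) and name == name.strip() and all(
        ch.isascii() and (ch.isalnum() or ch in ".'-, ") for ch in name
    )


for line in run_vectors(ascii_only):
    print(line)
```

Run against the ASCII-only stand-in, the table reports three failures (the accented, Polish and Chinese names), which is the kind of evidence a DA review could ask each implementation to produce.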

elnyry-sam-k commented 4 years ago

Here are the two issues for tracking this:

elnyry-sam-k commented 2 years ago

All accented characters are now supported in the implementation (example issue addressing this: https://github.com/mojaloop/project/issues/2358).