mojaloop / design-authority-project

This is the issue and decision log for tracking Mojaloop and related designs.

Changing the basis of UUID generation for Mojaloop #103

Status: Open. MichaelJBRichards opened this issue 1 year ago.

MichaelJBRichards commented 1 year ago

Request Summary:

Can we move to a standard form of unique ID generation that is at least as secure as UUIDv4 (and preferably more so) but whose output can be encoded in 35 characters or fewer?

Request Details:

The current implementation of the ISO 20022 data model limits the length of a unique identifier to 35 characters. This is just too short to hold a UUIDv4 in its canonical 36-character form (32 hexadecimal digits plus four hyphens), which is the current Mojaloop standard for a GUID. We therefore need to change something. The following alternatives have been discussed in the ISO 20022 SIG:

  1. Move Mojaloop away from using a GUID as the standard for identification.
  2. Continue to use a UUIDv4, but omit the hyphens when it is used in the ISO 20022 identifier field. This reduces the identifier's length to 32 characters.
  3. Ask ISO 20022 to revise its identifier policy so that all identifiers are long enough to hold a UUIDv4.
  4. Move to a different UUID generation method whose output can be encoded in fewer than 36 characters.
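Option 2 is easy to check with Python's standard `uuid` module: the canonical string form of a UUID is 36 characters, the hyphen-free hex form is 32, and the hex form converts back to the same UUID without loss. A minimal sketch; `ISO_MAX_ID_LEN`, `to_iso_id` and `from_iso_id` are illustrative names, not part of any Mojaloop API.

```python
import uuid

ISO_MAX_ID_LEN = 35  # ISO 20022 identifier limit cited in this issue


def to_iso_id(u: uuid.UUID) -> str:
    """Option 2: drop the hyphens, giving 32 lowercase hex characters."""
    return u.hex


def from_iso_id(s: str) -> uuid.UUID:
    """Restore the UUID; uuid.UUID accepts the 32-character hex form."""
    return uuid.UUID(hex=s)


u = uuid.uuid4()
canonical = str(u)
compact = to_iso_id(u)

print(len(canonical), len(canonical) <= ISO_MAX_ID_LEN)  # 36 False
print(len(compact), len(compact) <= ISO_MAX_ID_LEN)      # 32 True
assert from_iso_id(compact) == u  # lossless round trip
```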

A summary of discussion on these points:

The ISO 20022 SIG has proposed moving to the CUID2 algorithm for generating unique identifiers. This algorithm appears to offer:

Most of the work required to implement this change will be on the APIs, and I shall be raising an issue on the FSPIOP API to consider this; but I wanted the DA to consider it from a technical perspective first.

Artifacts:

Dependencies:

Accountability:

Decision(s):

Approved By:

Details:
- [ ] Actual decision made as a result of discussion

Follow-up:
- [ ] Actions to implement the decisions
millerabel commented 1 year ago

I like the concept of moving to Cuid2 in place of UUIDv4 or the other alternatives. Because we can specify the length of the generated ID, with calculable bounds on the probability of collision, we can choose IDs long enough to be safe that will, by construction, fit within the available field width. Cuid2 IDs are also restricted to lowercase ASCII letters and the digits 0-9, with no special characters, which simplifies encoding for exchange between systems. The innovative use of multiple sources of entropy, and the hashing of those source values, limits leakage of source data.
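To make the properties above concrete, here is a simplified Python sketch of the Cuid2 idea: several entropy sources (clock, counter, random salt, host fingerprint) are concatenated and hashed with SHA3-512, so none of their raw values appear in the ID; the output is base 36 (lowercase letters and digits) and starts with a letter. This is an illustration under stated assumptions, not the reference Cuid2 implementation and not compatible with it: the name `cuid2_like`, the fingerprint scheme, the random counter start and the 2-32 length range are all assumptions.

```python
import hashlib
import itertools
import os
import secrets
import socket
import string
import time

# Base-36 output alphabet: digits then lowercase letters.
ALPHABET = string.digits + string.ascii_lowercase

# Hypothetical per-process counter, started at a random offset so that
# identical containers do not share counter values.
_counter = itertools.count(secrets.randbelow(2 ** 32))

# Hypothetical host/process fingerprint. Hostname and PID can repeat across
# containers built from the same image, so random bytes are mixed in and
# the result is hashed before use.
_FINGERPRINT = hashlib.sha3_512(
    f"{socket.gethostname()}:{os.getpid()}".encode() + secrets.token_bytes(32)
).hexdigest()


def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def cuid2_like(length: int = 24) -> str:
    """Simplified Cuid2-style ID (illustrative; not the reference algorithm)."""
    if not 2 <= length <= 32:
        raise ValueError("length must be between 2 and 32")
    # Several entropy sources: clock (ms), counter, random salt, fingerprint.
    time_part = to_base36(time.time_ns() // 1_000_000)
    count_part = to_base36(next(_counter))
    salt = secrets.token_hex(length)
    # Hash the sources so none of their raw values leak into the ID.
    digest = hashlib.sha3_512(
        (time_part + salt + count_part + _FINGERPRINT).encode()
    ).digest()
    body = to_base36(int.from_bytes(digest, "big"))
    # Start with a random letter; skip the hash's first base-36 digit,
    # whose distribution is skewed.
    first = secrets.choice(string.ascii_lowercase)
    return first + body[1:length]


print(cuid2_like())          # 24 chars: one letter, then a-z0-9
print(len(cuid2_like(32)))   # 32
```

Note that the hashing step is what makes generation comparatively slow, which is relevant to the speed considerations that follow.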

We might move more than just our ISO identifiers to Cuid2. System identifiers such as Transaction ID and Transfer ID might be stronger even in the non-ISO parts of processing (e.g. in the FSPIOP API).

When thinking about wider application, the notable property that Cuid2 is fast enough, but deliberately not too fast, needs some study before we adopt it as a replacement for each ID type: do we generate and use a particular ID type in a way that is consistent with this speed characteristic? For example, are we generating IDs many hundreds or thousands of times per second in places like log-entry creation? If so, use something faster in those cases.

Cuid2 is appropriate for distributed unique ID creation, but probably should not be used where a simple single-host unique ID is needed. Be aware of the real entropy sources in the execution environment: be wary of dozens of containers all running the same base operating system image, with essentially the same startup time, many of them on the same physical host and sharing the same entropy sources. Generating thousands of IDs per second simultaneously in these containers could increase the risk of collision.
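The "calculable bounds" on collision probability come from the birthday approximation p ≈ 1 - exp(-n(n-1) / 2N) for n IDs drawn from a space of N values. The sketch below assumes a Cuid2-style space (first character a-z, remaining characters a-z0-9) and a hypothetical fleet of 50 containers each generating 5,000 IDs per second for a year, compared against UUIDv4's 122 random bits. The bound only holds if every ID is independent and uniformly random, which is exactly the assumption that correlated entropy in near-identical containers would undermine. The function names and fleet figures are illustrative.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def id_space(length: int) -> int:
    """Size of a Cuid2-style space: first char a-z (26), rest a-z0-9 (36)."""
    return 26 * 36 ** (length - 1)


def collision_probability(n: int, space: int) -> float:
    """Birthday approximation p ~= 1 - exp(-n(n-1) / 2N).

    Assumes every ID is drawn independently and uniformly from the space;
    correlated entropy sources break this assumption.
    """
    return -math.expm1(-n * (n - 1) / (2 * space))


# Hypothetical fleet: 50 containers, 5,000 IDs per second each, for one year.
n_ids = 50 * 5_000 * SECONDS_PER_YEAR
print(f"24-char Cuid2-style: {collision_probability(n_ids, id_space(24)):.1e}")
print(f"UUIDv4 (122 bits):   {collision_probability(n_ids, 2 ** 122):.1e}")
```

`math.expm1` keeps precision when the exponent is tiny, where computing `1 - math.exp(x)` directly would round to zero.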

And we must not assume that the IDs are K-sortable (Cuid2 IDs are opaque, not K-sortable) or that they can be generated in the database (generating Cuid2 through C-callable extensions in the DB layer would be too slow).

With proper study of each context, I like the idea of moving to Cuid2 across the platform and APIs. Tested implementations are available for JavaScript and Python, as well as a few languages we don’t use. It has not yet been ported to Zig.

millerabel commented 1 year ago

(Modified original note to refer to CUID2 as an algorithm, not a standard. It hasn’t been adopted by a standards body.)

MichaelJBRichards commented 1 year ago

- JB: The probability of collisions needs to be low enough that participants do not need to add extra checks.
- Per @millerabel's comment: we don't need to change everything; internal processes can still use UUIDs.
- Analysis of data layer code: put this on the core team backlog?
- Regex will need changing at the API level.
- Are there any code instances where the UID class is used?

bushjames commented 1 month ago

Related to this ticket in CCB space: https://github.com/mojaloop/mojaloop-specification/issues/120#issuecomment-2149412549

bushjames commented 1 month ago

The FSPIOP SIG has approved a move to support ULID. This decision needs to be documented and accepted by the CCB before the DA can close this issue.
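For reference, the public ULID specification defines a 128-bit identifier made of a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. That fits within the 35-character ISO 20022 limit and, unlike Cuid2, sorts by creation time to millisecond resolution. The minimal Python sketch below is non-authoritative: `ulid_sketch` and `ULID_RE` are hypothetical names, it omits the specification's optional monotonic mode for IDs generated within the same millisecond, and production code would use a maintained ULID library. The regex shows the kind of API-level pattern change discussed earlier in this thread.

```python
import re
import secrets
import time
from typing import Optional

# Crockford base32: digits plus uppercase letters, excluding I, L, O and U.
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Hypothetical API-level pattern: 26 chars; the first is 0-7 because
# 26 * 5 = 130 bits of encoding carry only 128 bits of data.
ULID_RE = re.compile(r"^[0-7][0-9A-HJKMNP-TV-Z]{25}$")


def ulid_sketch(timestamp_ms: Optional[int] = None) -> str:
    """Encode a 48-bit millisecond timestamp plus 80 random bits as a ULID."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2 ** 48:
        raise ValueError("timestamp must fit in 48 bits")
    value = (timestamp_ms << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):  # emit 5 bits per character, least significant first
        value, r = divmod(value, 32)
        chars.append(CROCKFORD32[r])
    return "".join(reversed(chars))


earlier = ulid_sketch(1_700_000_000_000)
later = ulid_sketch(1_700_000_000_001)
print(len(earlier), bool(ULID_RE.match(earlier)))  # 26 True
print(earlier < later)  # True: the first 10 chars encode the timestamp
```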

elnyry-sam-k commented 3 weeks ago

Here's the confirmation from the FSPIOP SIG, @bushjames : https://github.com/mojaloop/mojaloop-specification/issues/120#issuecomment-2346411600

Thanks to Henrik and the FSPIOP SIG.