Change Request: Changing the format of a correlation ID in Mojaloop

MichaelJBRichards commented 1 year ago

Open API for FSP Interoperability - Change Request

1. Preface

This section contains basic information regarding the change request.

1.1 Change Request Information

1.2 Document Version Information

Version	Date	Author	Change Description
1.0	2023-05-18	Michael Richards	Initial version. Sent out for review.

2. Problem Description

2.1 Background

The current implementation of the ISO 20022 data model limits the length of a unique identifier to 35 characters. This is (just) not sufficient to hold a UUIDV4, whose canonical text form is 36 characters long and which is the current Mojaloop standard for a Correlation ID. We therefore need to change something. The following alternative proposals have been discussed in the ISO 20022 SIG:

1. Move Mojaloop away from using a GUID as the standard for correlation IDs.
2. Continue to use a UUIDV4, but omit the hyphens when it is used in the ISO 20022 identifier field. This reduces the identifier's length to 32 characters.
3. Ask ISO 20022 to revise its identifier policy to make all identifiers of sufficient length to hold a UUIDV4.
4. Move to a different UUID generation method which can be encoded in fewer than 36 characters.
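
The length arithmetic behind these proposals can be sketched in a few lines of Python. This is an illustration only: the constant `ISO_MAX_ID_LENGTH` and the helper names are hypothetical, and the 35-character limit is taken from the background text above.

```python
import uuid

# Assumption: the 35-character limit described in the Background section.
ISO_MAX_ID_LENGTH = 35


def fits_iso_identifier(value: str) -> bool:
    """Return True if the value fits in a 35-character identifier field."""
    return len(value) <= ISO_MAX_ID_LENGTH


def restore_hyphens(compact_id: str) -> str:
    """Rebuild the canonical 8-4-4-4-12 form from the 32-character hex form."""
    return str(uuid.UUID(hex=compact_id))


correlation_id = uuid.uuid4()
canonical = str(correlation_id)  # 32 hex digits plus 4 hyphens
compact = correlation_id.hex     # the same 128 bits with the hyphens omitted

print(len(canonical), fits_iso_identifier(canonical))
print(len(compact), fits_iso_identifier(compact))
```

The compact form loses no information, since `restore_hyphens` recovers the canonical string; the cost noted in the discussion is that every participant has to apply the conversion consistently.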

A summary of discussion on these points:

We do not regard the existing ISO convention as satisfactory because we require Mojaloop identifiers to be reliably unique over long periods of time, and to adhere to a single external standard used by all participants.
Proposal 2 would work, but would require additional work by participants and the switch, and is not easy for participants outside Mojaloop to understand.
Proposal 3 is formally attractive, but stands very little chance of acceptance by the ISO community due to the amount of work that existing ISO 20022 institutions would need to undertake to implement it.
So we considered proposal 4...

2.2 Current Behaviour

Explain how the API currently behaves.

All correlation IDs are specified as UUID identifiers, and are defined as instances of the BinaryString32 data element. This is a fixed-length string which can contain alphanumeric characters and hyphens.

2.3 Requested Behaviour

Explain how you would like the API to behave.

The ISO 20022 SIG has proposed moving to the CUID2 standard for the generation of unique identifiers. This standard appears to offer:

Better collision resistance than UUIDV4
A result encoded in 32 characters

Most of the work required to implement this change will be on the APIs, and I shall be raising an issue on the FSPIOP API to consider this; but I wanted the DA to consider it from a technical perspective first.

3. Proposed Solution Options

Change the data type of the CorrelationID element (Section 7.3.8 of the FSPIOP specification) to be a new data type. This should be a restriction of the existing BinaryString data type (Section 7.2.17 of the FSPIOP specification) which has a fixed length of 32 characters and permits only lower-case alphanumeric characters.
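
As a rough sketch of what the restricted data type would accept, the check below encodes the two constraints stated above (fixed length of 32, lower-case alphanumeric characters only). The pattern and the function name are illustrative assumptions, not text from the FSPIOP specification.

```python
import re

# Assumption: "lower-case alphanumeric" means the ASCII ranges a-z and 0-9.
NEW_CORRELATION_ID = re.compile(r"[a-z0-9]{32}")


def is_valid_new_correlation_id(value: str) -> bool:
    """Return True if value is exactly 32 lower-case alphanumeric characters."""
    return NEW_CORRELATION_ID.fullmatch(value) is not None


print(is_valid_new_correlation_id("a1b2c3d4" * 4))
print(is_valid_new_correlation_id("b51ec871-14b7-4a37-9b3c-0d6d8b3e1c11"))
```

The first call passes; the second fails because a hyphenated UUID is 36 characters long and contains characters outside the permitted set.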

PaulGregoryBaker commented 1 year ago

Initial thoughts. CUID2 is new, meets all our requirements, and its adoption since January 2023 is growing quickly. A contender that I think we should consider is NanoID:

This generator is faster than CUID2
The collision rate is much higher than CUID2 with the same character length; however both are good enough to meet our requirements.
Has good adoption

PaulGregoryBaker commented 1 year ago

Because of the difference in character length, it would be possible to support both the new chosen CorrelationID and the existing one. Should FSPIOP support both CorrelationIDs?
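
A minimal sketch of how a receiver might tell the two formats apart by length alone, assuming the new format is a 32-character string without hyphens. The function name and the returned labels are hypothetical.

```python
def detect_correlation_id_format(value: str) -> str:
    """Classify an ID as the existing hyphenated UUID form or a 32-character form."""
    if len(value) == 36 and value.count("-") == 4:
        return "uuid"
    if len(value) == 32 and "-" not in value:
        return "fixed32"
    raise ValueError(f"unrecognised correlation ID format: {value!r}")


print(detect_correlation_id_format("b51ec871-14b7-4a37-9b3c-0d6d8b3e1c11"))
print(detect_correlation_id_format("a1b2c3d4" * 4))
```

This only distinguishes the shapes; it does not validate that the value was produced by any particular generator.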

MichaelJBRichards commented 1 year ago

FSPIOP could support both formats; but ISO 20022, as remarked, can't, due to the field length of ISO identifiers. If we want to retain compatibility between the two standards, we would need to drop UUIDV4.

As to CUID vs. NanoID, I am happy to leave that discussion to technical experts.

henrka commented 1 year ago

Thanks @MichaelJBRichards. At a high level, I'm fine with moving to CUID2 instead of UUID in version 2.0 of the FSPIOP API. It seems that you can customize the length of a CUID2, so we can choose a length that suits other standards such as ISO as well.

henrka commented 1 year ago

As discussed in today's SIG meeting, the proposal is to move to CUID2 in version 2.0 of the API, to have a more modern industry standard for unique IDs that can fit in the ISO 20022 data model.

karimjindani commented 7 months ago

The background of this issue says: "The current implementation of the ISO 20022 data model limits the length of a unique identifier to 35 characters." However, the UETR (Unique End to End Transaction Reference) is 36 characters long and supports UUID v4.

MichaelJBRichards commented 7 months ago

That is true of the UETR, but not of any of the other identifiers used in the ISO 20022 messages. However, we have an agreed alternative for this.

karimjindani commented 7 months ago

In the last DA session I was referred to this ticket, and I saw that its first sentence states an ISO 20022 data model limitation which isn't true. We can and should use the UETR as the end-to-end unique tracking reference, as this is suggested in the latest (2023) paper by BIS on harmonised requirements for cross-border payments. It supports holding the UUID v4 format, and as such we don't need to shift to any other format.

karimjindani commented 7 months ago

d218.pdf

MichaelJBRichards commented 7 months ago

All of this is, of course, completely true as regards the UETR, Karim. The problem is that the UETR is the only identifier which is defined in this way, and there are several API endpoints where we need more than one unique identifier. For instance: POST /quotes (quoteId, transactionId, transactionRequestId). So there is no problem with the UETR: the problem we need to solve is with the other identifiers which we need to include.

henrka commented 5 months ago

https://github.com/ulid/spec is an alternative suggestion, made for performance reasons, which still stays within the character limit for ISO 20022.

UUIDv7 has the drawback that it is still 36 characters, i.e. one character over the 35-character limit and so not possible to use in ISO 20022.
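
To make the length comparison concrete, here is a minimal ULID generator following the layout in the linked spec: a 48-bit millisecond timestamp followed by 80 bits of randomness, encoded as 26 Crockford Base32 characters. It is a sketch only: the function name is hypothetical, and the spec's optional monotonic behaviour within a single millisecond is not implemented.

```python
import os
import time
from typing import Optional

# Crockford Base32 alphabet (no I, L, O or U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID: 48-bit timestamp, then 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp does not fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness
    chars = []
    for _ in range(26):  # 26 characters x 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


print(new_ulid(), len(new_ulid()))
```

Because the timestamp occupies the leading characters, IDs generated in different milliseconds sort lexicographically in creation order, and at 26 characters the result fits within a 35-character field.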

henrka commented 5 months ago

Note that there have been related discussions in #131, including additional information on performance in https://github.com/mojaloop/mojaloop-specification/pull/131#issuecomment-2147059300.

kalinkrustev commented 5 months ago

Here is a summary of the discussion:

A disadvantage of CUID2 is related to performance impact in DB and other storages that need to index the id:

The readme for CUID2 contains a Note on K-Sortable/Sequential/Monotonically Increasing Ids, which recommends the use of createdAt fields. The problem is that this is not possible to do for many of the ids in Mojaloop without quite substantial effort, as in many cases the generated ids are primary keys and any non-sequential id generators lead to quite bad performance when data is accumulated. Avoiding this issue when non-sequential ids are generated will require a lot of effort in restructuring the database and probably reworking the table lookups to use a time range, as the primary key must be changed. The changes might even affect some of the logic of the flows. The issue is probably related to not just the SQL database, but also other places where we are likely to store and index the data, like log aggregators, etc.
The main claim of CUID2 vs sequential ids is the leak of timestamps, but this is just a generic claim and we should consider that many of the ids we generate are only significant for a short period of time, given the real-time nature of the functionality they are associated with. So this "leak" is not really an issue in our case. This leak is only significant when associated with entities that are not so much real-time related, like account creation, customer creation, etc. Instead of worrying about a leak, it may be better to think about improving the logic and restricting any operations that relate to IDs outside of a certain timeframe.
The section also recommends the use of cloud solutions and in-memory databases, which does not fit the inclusivity we are working for, and their use is often restricted by regulation. I think the allowed id generators should not be so restrictive, as for example the most important thing we want to restrict is the length, and even this can probably be parametrized during DB creation. Accepting only CUID2 will feel like a win for the cloud providers, not for inclusivity.

Some possible alternatives for monotonically increasing are:

RFC 9562 UUID v7 seems like a good short term solution as the required changes are not big
ULID - similar to UUID v7, but with optimized serialization, which can further improve the performance, but requires more changes
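
For comparison with ULID, the UUID v7 bit layout from RFC 9562 (48-bit Unix timestamp in milliseconds, 4-bit version, 12 random bits, 2-bit variant, 62 random bits) can be sketched with the standard library. The function name `uuid7_sketch` is hypothetical, and the sketch omits the optional counter-based monotonicity methods described in the RFC.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the version 7 field layout from RFC 9562."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 used
    rand_a = rand >> 68                # top 12 bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 bits
    value = (timestamp_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                 # bits 79..76: version 7
    value |= rand_a << 64              # bits 75..64: rand_a
    value |= 0b10 << 62                # bits 63..62: variant
    value |= rand_b                    # bits 61..0: rand_b
    return uuid.UUID(int=value)


example = uuid7_sketch()
print(example, len(str(example)))
```

The canonical string is still 36 characters long, like any other UUID, but the leading timestamp makes values generated in different milliseconds sort in creation order, which is the property the database argument above relies on.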

I think multiple approaches should be considered depending on the particular use cases:

Monotonic IDs are good for real-time events like audit records, transactions, quotes, logs, etc.
Non-monotonic IDs are good for cases where the entities are not real-time events, like account creation, customer creation, etc.

Finally, it is best to write the software in a way that allows the IDs used to be configurable and agreed within the implementation. Implementations dealing with ISO 20022 should not impose complex and expensive ID requirements on other implementations.
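
One way to read "configurable" here is a small registry that maps a configuration value to an ID generator. The sketch below is purely illustrative: the registry, the option names and `make_id_generator` are hypothetical and do not correspond to any existing Mojaloop component.

```python
import uuid
from typing import Callable, Dict

IdGenerator = Callable[[], str]

# Hypothetical registry; a deployment could register ULID, CUID2, etc. here.
ID_GENERATORS: Dict[str, IdGenerator] = {
    "uuid4": lambda: str(uuid.uuid4()),         # 36 characters, hyphenated
    "uuid4-compact": lambda: uuid.uuid4().hex,  # 32 characters, no hyphens
}


def make_id_generator(name: str) -> IdGenerator:
    """Look up the generator agreed for this implementation."""
    try:
        return ID_GENERATORS[name]
    except KeyError:
        raise ValueError(f"unknown ID generator: {name!r}") from None


generate_id = make_id_generator("uuid4-compact")
print(generate_id())
```

With this shape, the choice of generator becomes a deployment setting rather than something hard-coded into each service.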

kalinkrustev commented 5 months ago

Some supporting materials:

elnyry-sam-k commented 3 months ago

Hi @henrka, when you get a chance, maybe in one of the next meetings, let's capture the FSPIOP SIG decision on this here so that the DA can reference it. Thank you!

henrka commented 2 months ago

Let's make a formal decision in the meeting on Thursday.

henrka commented 2 months ago

FSPIOP API SIG has decided to change the correlation ID to ULID, starting from version 2.0 of the API.
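
A sketch of how a ULID correlation ID might be checked and its timestamp read back, assuming the canonical upper-case 26-character form from the ULID spec. The pattern and helper names are hypothetical and are not taken from the FSPIOP API definition.

```python
import re

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# The first character is limited to 0-7 so that the value fits in 128 bits.
ULID_PATTERN = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def is_valid_ulid(value: str) -> bool:
    """Return True for a canonical (upper-case) 26-character ULID."""
    return ULID_PATTERN.fullmatch(value) is not None


def ulid_timestamp_ms(value: str) -> int:
    """Decode the 48-bit millisecond timestamp from the first 10 characters."""
    if not is_valid_ulid(value):
        raise ValueError(f"not a canonical ULID: {value!r}")
    timestamp = 0
    for char in value[:10]:
        timestamp = (timestamp << 5) | CROCKFORD.index(char)
    return timestamp


print(is_valid_ulid("01ARZ3NDEKTSV4RRFFQ69G5FAV"))
```

The ULID spec allows case-insensitive decoding; this sketch deliberately accepts only the upper-case form so that equal IDs always compare equal as strings.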
