mojaloop / mojaloop-specification

This repo contains the specification document set of the Open API for FSP Interoperability
https://docs.mojaloop.io/api
Other
20 stars 40 forks source link

Change Request: Modifications to Admin API to support Settlement #117

Open MichaelJBRichards opened 1 year ago

MichaelJBRichards commented 1 year ago

Open API for FSP Interoperability - Change Request

Table of Contents

1. Preface


This section contains basic information regarding the change request.

1.1 Change Request Information

| Requested By | Michael Richards, INFITX | | Change Request Status | In review ☒ / Approved ☐ / Rejected ☐ | | Approved/Rejected Date | |

1.2 Document Version Information

Version Date Author Change Description
1.0 2023-02-07 Michael Richards Initial version of request. Sent out for review.

2. Problem Description


2.1 Background

We are currently engaged in making changes to the way in which transfers are assigned for settlement in Mojaloop. A consequence of this is that transfers will be assigned deterministically to a settlement batch (we no longer refer to settlement windows) based on the characteristics of the transfer and the settlement model which is defined for transfers with those characteristics. Each settlement batch will be assigned a name which will reflect the characteristics of the batch. These names will include the following elements:

  1. The name of the settlement model to which the batch belongs.
  2. The time segment of the settlement model to which the batch belongs.
  3. Any other characteristics defined for the settlement model.

The first two elements will be required; the third element may or may not be present.

2.2 Current Behaviour

At present, an administrator requests initiation of a settlement by specifying the ID numbers of the settlement windows which are to be included in the settlement.

2.3 Requested Behaviour

Specification of the transfers to be included in a settlement should be made in the following way:

  1. The administrator should always specify which settlement model is to be settled. This should be made from a list of the settlement models defined in the scheme, and it must therefore be possible to request a list of the settlement models defined in the scheme.
  2. The administrator should specify what time period is to be settled, by specifying a time unit (e.g. days, hours, minutes) and a start and end point for the settlement expressed in those units. It should not be possible to specify a time period for settlement which is more granular than the time period specified for the settlement model.
  3. If further segmentation of the settlement batches is specified by the settlement mode ad described in point 3 above, then the administrator should be able to request a list of the available segments for the settlement model and time period selected, and should be able to select one or more of the segments returned.

The administrator should also be able to select a group of batches previously selected by entering the ID number associated with the set of batches when they were originally returned.

Once an administrator has selected a group of batches using the method described above, they should be able to see the overall sum of the transfers in the batches selected, segmented by participant and optionally by batch.

The administrator should also be able to see the status (open, closed, settling, settled) of the batches selected.

The set of batches selected should be assigned an ID. The administrator

If all of the batches selected are closed, then the administrator should be able to request that those batches are settled. The system should settle the batches selected and report the status of the settlement once it is complete. This report should be a synchronous response to the settlement request.

In addition, the admnistrator should be able to select a single settlement batch using the techniques described above and view the transfers which comprise it.

In addition, an administrator should be able to close a settlement batch manually. The single settlement batch should be selected either using the techniques described above, or by specifying a transfer ID and asking to close the settlement batch which contains that ID.

3. Proposed Solution Options


mdebarros commented 1 year ago

Action items from CCB Admin Meeting Held on 2023-02-08

pedrosousabarreto commented 1 year ago

I've updated the reference architecture flow as discussed, here is the link that points to the updated settlements flow: https://miro.com/app/board/uXjVPpKjqlw=/?moveToWidget=3458764545671714541&cot=14

koekiebox commented 1 year ago

@mdebarros I have added section #5 into the settlement README.md to list the current exposed API endpoints.

image

I am in the process of updating the flows.

https://github.com/mojaloop/settlements-bc/blob/main/docs/flows/README.md

pedrosousabarreto commented 1 year ago

Given that the normal flow is that settlements are triggered by the TransferCommittedFulfiledEvt message, that endpoint #01 should used for tests only, and maybe not part of the official API

MichaelJBRichards commented 1 year ago

My comments:

pedrosousabarreto commented 1 year ago

Given the comment, I'll also add my comments from the meeting of 15 Mar 2023.

Creation of a matrix is asynchronous as we cannot guarantee it will processed within HTTP timeout limits, hence #02 (POST matrix) must be asynchronous and return only a 202 status with the ID of the matrix to be created. Same pattern as the rest of our APIs that use the asynchronous pattern.

Request #03 is absolutely required, as a GET is idempotent and cacheable by definition and cannot change the state of the system, so we need a separate method to request the recalculation of the matrix. In case of doubt see: https://httpwg.org/specs/rfc9110.html#GET and https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/GET

According to the agreement of the meeting of the 8th Feb 2023, the closing of batches is done through the closing of a matrix that includes said batches, AND not the closing of the batches directly.

See note in updated ref arch miro board with the agreement of that meeting: https://miro.com/app/board/uXjVPpKjqlw=/?moveToWidget=3458764545671714541&cot=14

As discussed, a new method like 04 (POST /matrix/:id/close) will be created to settle (close&settle if target matrix is not closed). With that, the operator has the ability to either close an open matrix, close&settle an open matrix or settle a closed matrix. This can be used to create a closed but not settled matrix which hold in dispute batches.

MichaelJBRichards commented 1 year ago

Don't care whether it's synchronous or not: in my opinion, an operator is entitled to expect a response from the "create a matrix" instruction.

I quote from the URL you refer: "intended to reflect the resource's current state". It is not intended to change the state of the system, merely to reflect its current state. I fail to see the relevance of the second reference.

There may be a hair to be split relating to the difference between closing a batch and closing a single-batch matrix. I don't think we want to require the creation of a matrix in order to close a batch; and, as far as I can see, you omit consideration of the "close a batch which contains..." case.

We definitely need to be able to close an individual batch in case of dispute, in my opinion.

Sorry, I don't seem to be able to sign into the Miro board you link to. Please be more explicit.

I'm afraid I don't understand the last paragraph. Could you expand?

mdebarros commented 1 year ago

Given that the normal flow is that settlements are triggered by the TransferCommittedFulfiledEvt message, that endpoint #1 should used for tests only, and maybe not part of the official API

In general, we have seperated out "test" APIs from the actual main API to the point that the "test" APIs on a service will run on a complete seperate port.

Perhaps that is not necessary with RBAC, but I would still prefer to see that clear seperation.

pedrosousabarreto commented 1 year ago

@MichaelJBRichards The operator will get a response; that the request to create a matrix was accepted. It is up to the UI to redirect the operator to the page where the new matrix can be seen (with a get). Looks like we are agreeing that a GET should not change the state of a matrix, but simply return it as it is persisted, without triggering any recalculation.

The last sentence was about your ask to have matrices (and included batches) that can be closed but not settled, I'll try to clarify better in an additional comment.

Agree @mdebarros, that post transfer request should be removed, it was there for the initial tests, and is no longer needed.

MichaelJBRichards commented 1 year ago

I fear, @pedrosousabarreto , that we are still not quite at the point of agreement. Let me take an example: if I were to issue a GET /transfers, would I expect to see the state of the transfer as it currently is, or the state of the transfer as it was when I issued the POST /transfers? I'm guessing, the latter; if you don't agree, I think we have a fundamental design crisis on our hands. So you're right in the sense that a GET shouldn't change the content of a matrix; but wrong, I think, in that you're designing the matrix update to happen in the wrong place.

Now, I'm as relaxed about CQRS as the next (insert gendered identity here), but I hope that you won't mind if I say that I have a different view on how it should be implemented in this case.

If I've understood your API design correctly, you expect the C part of the CQRS pattern to come from the external API: "please update the matrix". In my view, the C comes internally: when the settlements BC updates the content of a batch, it should properly update the content of all matrices which contain that batch. I can perfectly understand that programming this is simpler if you postpone calculation of the matrix until someone asks to see it, but in my mind at least that is not well aligned with the CQRS pattern.

My reading of the HTML syntax is as follows. The initial POST to the matrix should create a resource, which contains the matrix criteria and the current state of the matrix content. This resource is then updated by the addition of new settlement batches and the updating of existing settlement batches. The GET then requests (as your reference says) "transfer of a current selected representation for the target resource": that is, the current state of the matrix. A GET on a matrix is, in my view, no more obliged to return the same results as it did to previous GETs than is a GET /transfers to return the state in which the transfer was when it was created.

This view of things implies, as you will be aware, that an initial POST will require calculation of the matrix which corresponds to the criteria entered. Fortunately, the pattern that we have proposed means that the overhead of this construction is only dependent on the number of batches, and not on the number of transfers which have been aggregated into each batch. Is this a large overhead? Well, I tried a small experiment. If we assume maximum granularity for the settlement batches, then we shall create a new batch every second. If we align with the Level One principles, then we shall not settle less regularly than daily. So the maximum number of batches in a day would be 86,400 (60x60x24). So I knocked up a little MySql table with two columns (batch name and batch total), and stuck a batch into it for every second of three days (March 1-3) - approximately 250k batches. Then I asked for the aggregate of a matrix which contained every settlement batch for the middle of the three days:

select count(batchname) as `Records`, sum(batchtotal) as `Sum`
    from settlementbatches
    where batchname like 'Default.2023.3.2%';

This executed in 63ms, which doesn't look to me like a major problem. Perhaps we should convert that into an NFR of a form like: should update and return a matrix of 100k batches within 100ms.

So I still want to see matrix content returned when the operator requests the creation of a matrix. As I said yesterday, this isn't a comment about your internal APIs: it's up to you to design them how you want. But at the externally facing API, when I create a resource using a POST I think it's reasonable to see the resource created as the response - which, for me, means content as well as form.

pedrosousabarreto commented 1 year ago

@MichaelJBRichards This discussion is going too deep in the implementation details, maybe we should go back to the requirements and API design, and leave the implementation details aside.

The requirements, as I understood them are, please comment if not correct:

Future nice to have:

A few notes on the current implementation:

Is there a way we can validate those requirements with an operator maybe?

koekiebox commented 1 year ago

Included below (DRAFT) is the sub-list of endpoints for the Admin API. @MichaelJBRichards , valid point with regards to the endpoint bodies. There are bodies defined for settlement, but I think we should follow the same path as before.

  1. @koekiebox would update the bodies for settlement to reflect the latest iteration - Will add here; https://github.com/mojaloop/settlements-bc/blob/main/docs/flows/README.md
  2. We then have a discussion on the content of the bodies for each of the below endpoints for our next Admin API discussion?
  3. We definitely need to unpack item nr 04 - close batch by transaction id
Ref # URL Method Description
01. /matrix POST Create a settlement matrix and return the matrix UUID (newly generated). Batches that match the matrix criteria will be included as part of the settlement matrix. Any newly created batches will not be automatically associated with the settlement matrix. Newly created transfers will still be allocated to batches that are not yet closed.
02. /matrix/:id/close POST Close settlement batche/s. Once a settlement batch is closed, the balances for the batches and accounts will not change. No newly created transfers will be allocated to closed batches.
03. /matrix/:id/settle POST Close settlement batche/s and settlement matrix. Generate a settlement matrix as response, as well as settling the batches. As part of the settle process, the open settlement batches will be closed. See close. An indicator will be present per batch to indicate whether the batch was closed as part of the settle action.
04. /batches/:id/close POST Transaction ID's may be provided as parameters, in order to close batches containing those transfers.
05. /batches/:id GET Retrieve settlement batch by UUID. The batches include the settlement accounts and balances.
06. /transfers GET Retrieve all settlement transfers for a given batch.
07. /matrix/:id GET Retrieve the settlement matrix by UUID. If the settlement matrix is not in a closed state, the batch and account balances for the open batches may change due to new transfers.
pedrosousabarreto commented 1 year ago

I don't think 04 should exist, any closing or settling of batches must exist in the context of a matrix.

May be we can consider this new requirement ( I wrote above):

This new matrix, can be closed and marked as "in dispute". As consequence, all included batches are closed and marked as "in dispute". Later we can settle that matrix.

This way don't break the rule that settling always happens in the context of a matrix.

koekiebox commented 1 year ago

@pedrosousabarreto Thank you. That is correct. We would hoever be able to achieve item 04 closing a batch by requesting a new matrix that only has the single batch as part of this newly created matrix. I would just question, in what scenarios would we want to be able to do that @MichaelJBRichards ?

@MichaelJBRichards based on our earlier discussion with disputes. Would it not make sense to rather dispute batches, or is the two actions not related?

Please keep in mind that the above is just a draft, for us to get a picture on how the Admin API will look and behave, lets keep challenging.

MichaelJBRichards commented 1 year ago

The problem here is that we've said that we want to define matrices by criteria, not by specifying individual batches. I don't see anything wrong with specifying batches by ID and closing them. They could still belong to the matrix they did before, except that settling that matrix while the batch is still in dispute would exclude the batch (I assume, by the way, that we have no reason not to allow a batch to belong to more than one matrix.) The matrix could then be settled again once the batch has been returned to a settleable state. I think that it's quite unlikely that we will want to close a set of batches because of a dispute, since this will generally be a consequence of a dispute about one (or perhaps a small number) of transfers. However, I wouldn't be against adding a resource to the /matrix endpoint to mark all batches in the matrix as being in dispute.

Presumably we shall also need an endpoint which enables an administrator to mark one or more batches as no longer being in dispute?

In response to your question, @koekiebox: we're expecting that a query might be raised about an individual transfer (or, as I say, a small number of transfers), and we would want to identify the batch to which it belongs and mark that batch as being in dispute, leaving subsequent transfers that would have been in that batch to go into a new sequence. That's why I've been asking consistently for an endpoint that enables an administrator to ask for the batch that contains a particular transfer (specified by transaction ID) to be marked as in dispute, which I think 4 does. So thanks for that.

On 6, presumably we would include the batch ID in the endpoint? So something like GET /transfers/:batchid?

I'm also thinking that we will need some kind of report on matrices. In particular, we'd like to be able to understand things like:

With regard to validation, @pedrosousabarreto, Paul Makin and I are the product owners for this workstream. I'd be happy to get Paul involved if you think it useful.

pedrosousabarreto commented 1 year ago

@MichaelJBRichards As product owner, can you create a high level requirements doc for settlements? One which focuses on the use cases from the operator perspective and not technical implementation details.

That would be great!

MichaelJBRichards commented 1 year ago

As requested, @pedrosousabarreto, I've created a requirements document for administrator functions relating to settlement. It's been reviewed by Paul Makin.

I want to reiterate a point which keeps cropping up. There should not be a requirement for the hub operator to take some action to update the content of a matrix. Updating a matrix is something that the system should take care of. This does not, of course, affect the idempotency of a request to see a settlement report, which should always return the current state of the report content. Leaving this responsibility with the hub operator is asking for operational trouble, in my view.

Settlement administrator stories.docx

If anyone has any immediate questions or comments, please reply; otherwise, we can talk about this at our next meeting.

pedrosousabarreto commented 1 year ago

Hi @MichaelJBRichards and Paul, thanks for this. We'll take proper look and provide detailed feedback as soon as possible.

From a first look, one issue comes out as something that was already discussed and explained a few times. We cannot fully re-calculate the matrix or the batches every time a new transfer is detected by the settlements system. As mentioned, this part of the system is on the high performance and high frequency path and would be too taxing on the system. We cannot build it like that.

The way the system works, is that:

Another problem of the automatic matrix recalculation is that more transfers can be included in a matrix between the time the operator clicks the button to settle a matrix and the actual setting - because there are automatic processes happening in the background - this would result in a matrix that is different from the one the operator wanted to settle.

Think about it this way, why recalculate a matrix thousands or millions of times a day when we can do it once or twice without affecting user experience or functionality?

Why are you so adamant about this specific case? I'm sure an operator wouldn't mind pressing a button to recalculate the matrix once or twice prior to settling it. This would make our system much easier on computational resources and cheaper to run.

MichaelJBRichards commented 1 year ago

I'm afraid I'm now confused, @pedrosousabarreto. First you tell me to leave the implementation details aside, and now you tell me that I have to modify my requirements because of the implementation details. Which path should I take?

pedrosousabarreto commented 1 year ago

The one were you don't ignore the fact that the automatic matrix recalculation on every transfer keeps us from delivering on the the non-functional requirements of scalability and performance. This was explained already several times, not sure what the confusion is.

MichaelJBRichards commented 1 year ago

The confusion is that I tried to explain why I don't accept that what you present (not explain) as a fact really is a fact and not a consequence of particular design decisions which could (and, in my opinion, should) be changed. You ignored my explanation and told me not to think about the implementation details. So I stopped. Now you tell me that I have to change my user requirements because of a "fact" for which no supporting evidence has been produced, and where alternative implementations have been proposed which do not appear to have the drawbacks of the current approach. Is that a clear enough statement of where the confusion lies?

pedrosousabarreto commented 1 year ago

I didn't ignore it Michael, would never do that.

You're proposing that we re-calculate batches and matrices on the fly when transfers are received, and not when an operator requests it to be recalculated. I'm saying this is a bad design. It is a performance problem that will require more hardware and keep us from scaling. Given the minute different in UX for an operator I don't understand why this is a problem.

Am I understanding your request incorrectly?

MichaelJBRichards commented 1 year ago

My understanding was that we were already keeping the batch totals up to date, since the batch balance would be adjusted when a transfer was added to the settlement batch. If that isn't true, then I think we have a more extensive problem; but obviously we should confirm. So, if that's the case, this is simply a question about the summation of batches into matrices. My view -supported, I think, by experimental evidence - is that the summation of batches is unlikely to be a problem if we action it in response to a GET; but I'd certainly be interested in any evidence to the contrary.

So what I'm proposing is that we recalculate matrices, not batches, on demand, and that this is a problem small enough to be manageable in the context of a user interface. I repeat my view that, if we rely on users to request matrix updates and divorce those requests from viewing a matrix, people will wind up making mistakes because they will make the wrong assumptions about what they're looking at.

pedrosousabarreto commented 1 year ago

That is the misunderstanding. The current implementation does not calculate, re-calculate or keeps up to date batches or matrices when each transfer is handled by the settlements. I've mentioned this already. It only associates the transfer to a batch, creating one if necessary.

And this is not a problem, it is a conscious design decision to not tax the system with calculations that are not need at that time. Imagine that the system is handling thousands of transfers per second; we don't want settlements to perform any work that is not necessary thousands of times per second, especially when that work can be done only once, when the operator actually requests it.

Assumption problems of a UI can be dealt with proper UI and UX.

MichaelJBRichards commented 1 year ago

image

So when the reference architecture says (if I've understood the diagram correctly): "on receiving a TransferCommittedFulfilled event, Settlements BC will create a settlement journal entry in the appropriate batch", that's not happening; or, at least, it's not happening in a way that would allow a "Get list of settlement batch account's balance" to trigger a simple "filter and return" action in Transfers BC?

I also don't see where the Settlements BC handles an event that triggers the recalculation of a batch. Did I miss something?

And I'm afraid I'm still not understanding the argument from scale. In my mind, the opposition is not between doing things thousands of times and doing them only once. Each of those transfers will eventually have to be processed exactly once in either mode; the question is whether you do them transfer by transfer, when you've got plenty of time and you can let things smooth out, or postpone them to a big bang when someone's waiting for the results. I had thought that, when Adrian and I modelled the Settlements BC for the reference architecture, we had opted for the first of those.

As to the question of the UI: I'm sort of happy if the Settlements Admin API contains a Get a Matrix resource which triggers calculation of the matrix and then returns it. I continue, though, to believe that separating them as resources at the API level will lead to operational and development mistakes.

pedrosousabarreto commented 1 year ago

No, that's the old version from 2021. There is a new version next to it from 2023. Following the meeting of the 8th I wrote in slack the captured requirements, did the demo at the convening and explained it, and in this issue I've explained several times how it is implemented currently in vNext settlements. As mentioned, the only difference is the moment when the calculation happens and it has no impact on operator.

One last time: The current implementation, for valid reasons of performance, scalability and cost efficiency, does not calculate anything when receiving each transfer, at that time it only allocates the transfer to a batch. All calculations are performed only when an operator creates a matrix or requests its recalculation.

Anyway, I'm going to stop commenting here. I've commented here several times how it works, we're going in circles now.

I think we're delivering on all requirements, If this implementation is not acceptable, let's meet and discuss.

MichaelJBRichards commented 1 year ago

I took it from the Mojaloop site, actually, which I assumed (clearly wrongly) was the agreed version. As I've said, @pedrosousabarreto, you must implement in the way you want. But let me return to where we started: I will not accept the introduction of features into the API which have no user justification, exist solely to cater for technical implementation decisions, and which are, in my judgment, likely to encourage errors and oversights in the operation of the system.

koekiebox commented 1 year ago

@MichaelJBRichards , I have a couple of questions regarding Settlement.administrator.stories.docx. With regards to changing the statuses for batches (2.7 Change the status of a settlement batch): • Change OPEN to CLOSED • Change OPEN to IN_DISPUTE • Change CLOSED to IN_DISPUTE • Change IN_DISPUTE to CLOSED The document currently cover those “routes” :point_up: for "directly" updating a batch status.

Based on previous conversation, it is my understanding that a "refund" would be issued against a transfer that belongs to a batch that is in DISPUTE. This ensures that all settlement batches would eventually end up in a SETTLED status.

So the flow in my mind would be;

  1. Settlement Report A is created
  2. Batch A is put in DISPUTE status due to Transfer A (belonging to Batch A) being in dispute as discovered by inspecting Settlement Report A
  3. Settlement Report A is re-calculated, which would now exclude the settleable total for Batch A (which is still in dispute, but still associated with the settlement report)
  4. Settlement Report A is executed and confirmed (which allowed us to settle all batches for a period that would exclude the Batch A in dispute, including its transfers)
  5. The dispute is taken care of, “out-of-scope” for the Settlements component…, when resolved… a new “refund” Transfer B is created to reverse the flow of funds initially established via Transfer A
  6. At the time of posting the refund to Settlement, due to the nature of Transfer B, the Batch A is updated to be in a CLOSED status
  7. A new settlement report is generated Settlement Report B, which includes Batch A (Now in CLOSED status)
  8. Settlement Report B is then executed and confirmed

So my questions are:

  1. Is the above correct? If so, then:
  2. Would we ever want to dispute a "refund" (I don't see why not)?
  3. We currently persist statuses at a number of places, based on Settlement.administrator.stories.docx is it necessary to persist statuses other than a batch in your mind? Since the batches essentially drives the transfers and matrixes / settlement reports
  4. Do we need to update the document to reflect the above scenario?

Notes:

MichaelJBRichards commented 1 year ago

Please excuse my delay in replying, @koekiebox. It was caused by sampling the cultural delights of the Netherlands. Sorry we didn't make it to Haarlem, but maybe next time...

I think that my replies are as follows:

elnyry-sam-k commented 1 year ago

Following the conversation along, I felt that it'll be useful to have a final state for batches as well

Following what Jason @koekiebox as earlier

• Change OPEN to CLOSED • Change OPEN to IN_DISPUTE • Change CLOSED to IN_DISPUTE • Change IN_DISPUTE to CLOSED

The last one can possibly be (or even after the initial move to 'closed', there maybe an element of timeline, similar to the step I've mentioned below)

• Change IN_DISPUTE to SETTLED (a new term to indicate finality)

and there maybe a mechanism introduced around timing (for example, after a certain time disputes cannot be made for transfers)

• Change OPEN to CLOSED initially and then after the scheme specified time, • Change CLOSED to SETTLED

The reason being that with the current structure because a CLOSED state doesn't mean finality, it leaves open the possibility to move a batch into dispute at any point, in theory.

MichaelJBRichards commented 1 year ago

Hi @elnyry-sam-k. With reference to your comments: a status of SETTLED is already proposed as part of the requirements (see the document further up this thread); but we should not allow batches to be changed to that status manually. Moving to a SETTLED status, implies changes to the position and settlement accounts for the selected settlement model, and should therefore only happen as part of the settlement process. The idea is that, when a batch comes out of the IN_DISPUTE status, it should have a status which allows it to be subsequently settled. It would, of course be open to the builders of an administrative UI to conflate the two processes, and provide functionality which first changed the status of the batch and subsequently settled it. In general, though, I'm happy with the idea that it should be possible to raise a dispute about a transfer at any point before the batch to which the transfer belongs is settled (i.e. while the status of the batch is OPEN or CLOSED.)

I would be against introducing a timing-based mechanism for changing status - at least, internally. We've generally tried to stick to an event-based syntax for the reference architecture, and that seems to me to be the right thing in this case. Again, it's always open to the builders of administrative functions to implement a Cron job to close batches or execute settlements based on the particular characteristics of their implementation, should they wish.

koekiebox commented 1 year ago

Thank you for the response @MichaelJBRichards. I hope that settles all @elnyry-sam-k ? Apologies, been hard at work on Settlements disputes the last couple of days.