openwallet-foundation / acapy

ACA-Py is a foundation for building decentralized identity applications and services running in non-mobile environments.
https://aca-py.org
Apache License 2.0
421 stars 515 forks source link

Serialize the mediator routing request/connection request messages #2073

Open ianco opened 1 year ago

ianco commented 1 year ago

Currently, when making a connection request (with mediation), two messages are sent simultaneously - one to the mediator, to update the mediator's routing records (the mediator can potentially accept or reject), and a connection request to the inviter. There is a race condition, because the inviter can potentially accept the connection request and start sending messages before the mediator has updated their records.

(The same issue exists on the invitation (with mediation) - when the invitation is created a routing request is sent to the mediator. Potentially the invitee can accept the invitation etc and start sending messages before the mediator has updated their records, although this scenario is less likely, as the invitation is sent out-of-band. However this is a scenario the controller should be aware of.)

There is a potential solution to the connection request/mediation request issue, to send these in sequence rather than in parallel, using notifications and the event bus. (The Endorser flows use this pattern - when an endorser response is received, it generates a notification using the event bus. Any protocols that need to respond to this event, or perform follow-up tasks, listed for this event. For example when a credential definition request is endorsed and written to the ledger, a follow up process is triggered to create the revocation registries (if necessary).)

dbluhm commented 1 year ago

I intended to fix this with https://github.com/hyperledger/aries-cloudagent-python/pull/1851; when ACA-Py is the mediation client, to serialize the process, the controller should call the update-keylist endpoint before moving on to accept-invitation or accept-request. I acknowledge that having this be automated somehow might be better than making it a controller concern so I don't necessarily consider this a fully solved problem; however, the issue we kept on running into was complexity for clustered ACA-Py and so we opted to make it a controller concern -- but that may just be passing complexity up the stack.

Ultimately, for our needs (multi-tenant ACA-Py behind mediator), we opted to side-step the complexity altogether and make a plugin that allowed the multi-tenant agent to make direct REST requests to the mediator with keylist updates so they were definitely completed before we went on to the next step.

jleach commented 1 year ago

Going with the Alice (holder) and Faber (issuer) example:

I think it would be best to have the Alice's agent send the mediation request first and once the mediator responds with (ok - @ianco mentions above it can reject) then it can send the connection request to Faber's agent with the appropriate routing keys. I like this way because: 1) Faber is blind to the speed and status of the mediation request so it won't matter if its already setup and 2) its backwards compatible with existing agents (no need to update Faber) only the mediator and Alice's agent need updating. That is to say, agents that depend on mediation need to update ACA-py those that don't care have nothing to do.

EDIT: Come to think of it. If done in this order only agents that require a mediation would need an update. Everything else can remain as is. An accompanying story might read: As a holder, I need my mediation to be in place before I open connections so that I can receive offers or requests.

usingtechnology commented 1 year ago

Just leaving this here for posterity. I was investigating and this is a way I could produce an error (no time now to look at a fix). Not sure if this is the exact error as above, so anyone else that can document steps to replicate their errors locally would be great.

  1. run aries-mediator-service with open-mediation: false
  2. run a multi-tenant acapy (just easier to do invites etc with a single instance) with no mediator start up params.
  3. create a tenant, explicitly receive the mediator invitation url
  4. wait for mediator connection to be active
  5. POST /mediation/request/<mediator connection id> to kick off mediation request process and grabmediation_id` from response
  6. do not wait for mediation status to be granted
  7. create an invitation using the mediation id.
  8. will get errors during keylist update... "ConnRecord record not found", "No DID defined for verkey", "Unknown recipient key received in keylist update response"

We can also do the above with mediator service using open-mediation: true, but the errors we get at step 8 are more reasonable: "Mediation is not granted for mediation identified by..."

So I suppose the first error(s) would be mediation is requested but no response from mediator. The second is: response from mediator but status is not granted. I think in the second case, the exception / error is quite clear and perhaps nothing is needed?

It would be appreciated if more context (ie steps to reproduce) was provided.