hyperledger / aries-cloudagent-python

Hyperledger Aries Cloud Agent Python (ACA-Py) is a foundation for building decentralized identity applications and services running in non-mobile environments.
https://wiki.hyperledger.org/display/aries
Apache License 2.0

Cannot queue message for delivery, no supported transport #1708

Closed sheraliinamdar closed 1 month ago

sheraliinamdar commented 2 years ago

Hi Team, we are frequently facing the issue below in our mediator service, and we see the following entries in the mediator agent logs. Please note that this issue happens inconsistently. We are using ACA-Py version 0.7.3 as the mediator service, and on the mobile side we are using the Aries JavaScript Framework (AFJ) 0.1.0. We have also posted this query earlier in the Discord channel.

    2022-03-30 13:47:54,315 aries_cloudagent.messaging.base_handler INFO Received forward for: 9MFPSXYzXpqksmUEPU2WY9XWJzM63QdUqWM4h5t1FFJP
    2022-03-30 13:47:54,316 aries_cloudagent.messaging.base_handler INFO Forwarding message to connection: 51b5015f-0bd2-41fb-88e3-915074360ff6
    2022-03-30 13:47:54,316 aries_cloudagent.core.conductor WARNING Cannot queue message for delivery, no supported transport
    2022-03-30 13:47:54,316 aries_cloudagent.core.event_bus DEBUG Notifying subscribers: <Event topic=acapy::forward::received, payload={'connection_id': '51b5015f-0bd2-41fb-88e3-915074360ff6', 'status': 'undeliverable', 'recipient_key': '4HdFrsNNbDJhPSXWR9gaG3PYdxBTWKUoGdYgG9JwxnJL'}>

Thank you! Regards, Sherali

jleach commented 2 years ago

@sheraliinamdar When run as a mediator agent (and probably in other circumstances), the ACA-Py agent uses https:// to establish a connection and then opens a wss:// connection with the wallet (AFJ) for messaging. For AFJ to work properly you need either the ACA-Py plug-in (it's in the toolbox) or a reverse proxy in front of the mediator to redirect the traffic to the proper port. This is because AFJ only has one URL with which to communicate with the mediator, so something else needs to redirect the protocols for ACA-Py.

I've rebuilt the Aries Mediator Service to run as a simplified Docker stack you can test against and see what I mean. I've also documented a little of how mediators work in the README. I suggest testing against this mediator because I know it works well with AFJ.
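For reference, a minimal sketch of what such a reverse proxy can look like as a Caddyfile (the service name and the inbound ports 3000/3001 are assumptions matching the configuration shared later in this thread, not a definitive setup):

    :4000 {
        @websockets {
            header Connection *Upgrade*
            header Upgrade websocket
        }

        # WebSocket upgrade requests are routed to the mediator's ws inbound transport
        handle @websockets {
            reverse_proxy http://mediator-svc:3001
        }

        # All other (http) traffic is routed to the mediator's http inbound transport
        handle {
            reverse_proxy http://mediator-svc:3000
        }
    }

This way AFJ only needs the single proxied URL, and the proxy decides whether to hand the connection to the HTTP or the WebSocket inbound transport.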

TimoGlastra commented 2 years ago

@sheraliinamdar I've also encountered this issue when using ACA-Py as a mediator with AFJ. I haven't been able to determine what the exact issue is, but have been able to consistently reproduce the error.

It will always fail if multiple messages are sent to the mediator at the same time for the same recipient. The first two will succeed, but the third one will always fail with the error you posted above. Looking at the AFJ side, the WS isn't closed, so it seems to me this is an ACA-Py issue. There is some complex stuff happening in the transport layer, and I suspect it has something to do with the session clearing the message after it is handled, which may create race conditions if multiple messages for the same inbound session must be delivered at the same time.

To reproduce (please let me know if you can also reproduce it using this flow), go to https://demo.animo.id, start the demo (make sure your wallet is connected to the BCovrin test network), select the Noah character, and create a connection. After you've created a connection and pressed next, 3 credential offers will be sent to you at the same time, of which I always only receive two. Looking at the ACA-Py mediator logs, all 3 messages are received and the first two are delivered directly over the websocket, but the third one will log the warning Cannot queue message for delivery, no supported transport.

If I close (force kill) and reopen the mobile wallet (which triggers a re-connection to the WS) the third message will arrive instantly.

@sheraliinamdar could you verify if you can reproduce this using the demo? Haven't had time to dig deeper, but would be good if we can consistently reproduce the error.

sheraliinamdar commented 2 years ago

@TimoGlastra after selecting the character Noah, it is unable to generate the QR code. One more clarification: is it queuing the message in the mediator queue, or is the mediator trying to queue it to the AFJ queue? Will this issue get resolved if we replace the in-memory queue with a persistent queue? Correct me if I am wrong. We are badly stuck with this sporadic message delivery issue with AFJ.

swcurran commented 2 years ago

Any updates on this issue? Should we keep it open?

dinbtechit commented 1 year ago

We encountered the identical problem in our NS Aca-py instance too.

We have implemented the Caddy + ACA-Py combination as our mediator, which transitions requests from https:// to wss://. This setup had been functioning properly for a long time. However, after switching from OpenShift templates to Helm charts, these errors began to appear. I have compared the configurations of the OpenShift template and the Helm chart, and they seem to be identical. I am not sure what else I might have overlooked.

Configmap.yml

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{.Values.name}}-configmap
data:
  acapy-mediator-args.yml: |-
    label: 'NS Mediator'

    # Print an admin invite
    connections-invite: true
    invite-label: "Service NovaScotia"
    invite-multi-use: true

    # Mediation
    open-mediation: true
    enable-undelivered-queue: true

    # Mediator does not use a ledger
    no-ledger: true
    endpoint:
      - "wss://{{$httpHost}}"
      - "https://{{$httpHost}}"

    log-level: info
    inbound-transport:
      - [http, 0.0.0.0, 3000]
      - [ws, 0.0.0.0, 3001]
    outbound-transport:
      - "ws"
      - "http"
    emit-new-didcomm-prefix:

    # Admin
    admin: [0.0.0.0, 3002]

    #webhook-url: 'http://0.0.0.0:5000/wh/2'
    genesis-url: 'http://test.bcovrin.vonx.io/genesis'

    # Connections
    debug-connections: true
    auto-accept-invites: true
    auto-accept-requests: true
    auto-ping-connection: true

    # Database Name
    wallet-name: 'vcp_aca_mediator'
    wallet-type: askar
    wallet-storage-type: 'postgres_storage'
    wallet-storage-config: '{"url":"{{ $dbhost }}","max_connections":5}'
    auto-provision: true

caddyfile:

    :4000 {
      @websockets {
          header Connection *Upgrade*
          header Upgrade websocket
      }

      handle @websockets {
          reverse_proxy http://{{.Values.name}}-svc:3001
      }

      handle {
          reverse_proxy http://{{.Values.name}}-svc:3000
      }

      log {
              # errors stdout
              output stdout
              # format single_field common_log
              level DEBUG
      }
    }
    :4002 {

        handle {
              reverse_proxy http://{{.Values.name}}-svc:3002
        }

        log {
              # errors stdout
              output stdout
              # format single_field common_log
              level DEBUG
        }
    }

dinbtechit commented 1 year ago

Found the culprit, or at least in my case: it looks like the order in which you define the endpoints is what causes this issue.

Also, I should mention that in my case the no supported transport error showed up in the issuer agent logs, not the mediator logs. If you see this error within the mediator logs, then it is most likely that the agent attempting to communicate with your mediator is the one that is misconfigured.

For example, in acapy-mediator-args.yml, when you define wss:// first, followed by https:// (as shown below):

...
endpoint:
   - "wss://aca-mediator-trucated.novascotia.ca"
   - "https://aca-mediator-trucated.novascotia.ca"
...

This causes ACA-Py to pick up the first endpoint from the list, which ends up using the wss://... endpoint:

(screenshot: invitation in the mediator logs created with the wss:// endpoint)

I thought Caddy would take care of this, but apparently it doesn't.


Solution

Switching the order of the endpoint configuration, i.e., https:// first and then wss://, resolved the no supported transport error in the issuer agent.

endpoint:
   - "https://aca-mediator-trucated.novascotia.ca"
   - "wss://aca-mediator-trucated.novascotia.ca"

Output: and now the invitation that you see in the mediator logs gets created with https://... (screenshot: invitation in the mediator logs created with the https:// endpoint)

I am assuming it is probably the same when you pass in the endpoints through the startup argument --endpoint.
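A hedged sketch of what that would look like on the command line, assuming the same ordering matters there (the hostname is a placeholder and the flag list is trimmed to the transport-related options):

    # https:// listed first so the advertised endpoint uses a transport the
    # connecting agent supports; wss:// second for the WebSocket return route
    aca-py start \
      --inbound-transport http 0.0.0.0 3000 \
      --inbound-transport ws 0.0.0.0 3001 \
      --outbound-transport http \
      --outbound-transport ws \
      --endpoint https://mediator.example.com wss://mediator.example.com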

Hope this helps!