Closed swcurran closed 1 year ago
@usingtechnology -- note this issue as you see what is happening on the ACA-Py Mediator side.
Assigning this to @usingtechnology and @WadeBarnes after a question from @jleach about the status of this issue. In the research being done into the mediator behaviour, have we done enough to be able to detect on the mediator side when an error in establishing a connection (either to the mediator itself, or to another agent) occurs?
Note that the answer to this might be a “no, not possible”, and we close this accordingly.
Thanks!
The error messages listed above are a common occurrence.
For example:
There are a lot of "Error" messages (noise) around (what seems to be) regular web socket traffic. Therefore, I think the first thing that needs to happen is a review of the logging associated with that traffic to determine what a normal web socket connection lifecycle should look like and ensure the events are logged appropriately. At the same time, we could review the timeout settings and determine what settings would be considered reasonable. The current web socket timing settings are ACAPY_WS_HEARTBEAT_INTERVAL=15 and ACAPY_WS_TIMEOUT_INTERVAL=60 in all environments, based on the recommendations here: https://github.com/hyperledger/aries-cloudagent-python/issues/2157#issuecomment-1468197480
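As a starting point for reviewing those two settings, here is a minimal Python sketch that reads them from the environment and reports how many heartbeat intervals fit inside one timeout window. The helper name `ws_timing`, the assumption that both values are in seconds, and the ratio check itself are illustrative only; none of this is ACA-Py code or a statement of how ACA-Py interprets the values.

```python
import os


def ws_timing(env=None):
    """Read the two websocket timing settings named in this thread.

    Hypothetical helper: defaults mirror the values quoted above (15 and 60),
    and both are assumed to be expressed in seconds.
    """
    env = os.environ if env is None else env
    heartbeat = int(env.get("ACAPY_WS_HEARTBEAT_INTERVAL", "15"))
    timeout = int(env.get("ACAPY_WS_TIMEOUT_INTERVAL", "60"))
    if heartbeat <= 0 or timeout <= 0:
        raise ValueError("intervals must be positive")
    return {
        "heartbeat": heartbeat,
        "timeout": timeout,
        # Whole heartbeat intervals that fit inside one timeout window.
        "heartbeats_per_timeout": timeout // heartbeat,
    }


if __name__ == "__main__":
    print(ws_timing())
```

With the current values, four heartbeat intervals fit in one timeout window. Whether that margin is reasonable for mobile clients is the open question for the review.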
Related issue:
Some thoughts on this ...
@swcurran Do you think this should be transferred into ACA-Py as an action item for @WadeBarnes' comments above (review params and logging)? Once that is done, we could close it. It feels a little amorphous, in that it's hard to tease out what specific changes need to take place beyond this.
From the sounds of it, I think this request should be pushed to the BC Gov deployment repo for the mediator, and we work on the types of solutions @WadeBarnes mentions above that work in the BC Gov context. As we find useful things, updating either or both of the ACA-Py repo and this repo is appropriate, as documentation or code (if that makes sense).
I’m going to close this issue here — feel free to reopen if needed.
We are seeing connection timeouts in Aries mobile wallets with a (more or less) stock Aries Mediator Service mediator. We need a way to be aware of these errors on the mediator side so that we can know when and how often they are occurring, and so that a notification can go to the team that has deployed the monitor. This task is to figure out how to add monitoring to a deployment of the aries-mediator-service.
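To make "when and how often" concrete, one possible shape for the counting side is a sliding-window counter that signals once errors in the window reach a threshold. This is only a sketch: the class name `ErrorRateMonitor`, the window length, and the threshold are all hypothetical, and how error events would actually be extracted from the mediator is not addressed here.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window counter for connection errors."""

    def __init__(self, window_seconds=300.0, threshold=5):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()  # timestamps of errors still inside the window

    def record(self, timestamp):
        """Record one error at `timestamp` (seconds, non-decreasing).

        Returns True when the number of errors inside the window has
        reached the threshold, i.e. when a notification would be due.
        """
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop errors that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) >= self.threshold


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window_seconds=60, threshold=3)
    for t in (0, 10, 20):
        print(t, monitor.record(t))
```

The return value is where a notification to the deploying team would hook in; the delivery mechanism is left open.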
Suggested steps:
load-testing folder of this repo can be used) consistently from non prod (dev or test)
The logging info below is a possibility. We'd have to see what a "normal" websocket closure (including the mobile device turning off) looks like to ensure we aren't looking at false positives.
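One way to think about separating "normal" closures from errors is by WebSocket close code. The sketch below uses the codes defined in RFC 6455 (1000 normal closure, 1001 going away, 1006 closed without a close frame). The function name `classify_close` and the three labels are hypothetical, and it is an assumption that the mediator's logs expose the close code at all.

```python
# Close codes from RFC 6455, section 7.4.1.
NORMAL_CLOSE_CODES = {1000, 1001}  # normal closure, endpoint going away
NO_CLOSE_FRAME = 1006              # connection dropped without a close frame


def classify_close(code):
    """Hypothetical triage of a websocket close code for monitoring."""
    if code in NORMAL_CLOSE_CODES:
        return "normal"
    if code == NO_CLOSE_FRAME:
        # Ambiguous: could be a real failure or a client that simply vanished.
        return "no-close-frame"
    return "error"


if __name__ == "__main__":
    for code in (1000, 1001, 1006, 1011):
        print(code, classify_close(code))
```

A phone that turns off or loses its network without completing the closing handshake would plausibly surface as 1006, so that bucket is kept separate rather than counted as an error outright. That is the false-positive risk described above.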