raiden-network / raiden

Raiden Network
https://developer.raiden.network

Matrix / Synapse presence is flaky #5059

Closed rakanalh closed 4 years ago

rakanalh commented 4 years ago

During testing, we have had multiple instances of the same failure in both the client and the PFS while making a transfer:

While running the BF1 scenario, all 5 nodes were up:

[Screenshot: DeepinScreenshot_select-area_20191009154758]

The initial transfer from 0 to 3 failed because the PFS did not provide any routes.

{"routes": [], "feedback_token": null, "event": "Received route(s) from PFS", "logger": "raiden.routing", "level": "info", "timestamp": "2019-10-09 12:39:11.476518"}
{"errors": "Payment couldn't be completed because: there is no route available", "status_code": 409, "event": "Error processing request", "logger": "raiden.api.rest", "level": "error", "timestamp": "2019-10-09 12:39:11.527605"}

When @palango investigated on the PFS' side, we found out that the PFS saw the participating nodes as offline:

2019-10-09 12:39:11.451128 [warning  ] Error while handling request   [pathfinding_service.api] details={'from_': '0x7ebE1fa14F414873Bc713261c17727ec37bdb89F', 'to': '0x90aafBeEbEb11E9b17Bc50E4141aeDFcfDfAD8b9', 'value': 1000000000000000} error=NoRouteFound(None) message=No route between nodes found.

All logs & DBs for the nodes can be found here:

run_21.zip

Dominik1999 commented 4 years ago

@rakanalh @err508

I suggest the following procedure to reproduce the Matrix Bug:

This is basically the current status, right?

err508 commented 4 years ago

Current status of this issue:

Planned next steps:

rakanalh commented 4 years ago

Path finding rooms on the remote servers:

{
    "aliases": [
        "#raiden_goerli_path_finding:transport01.raiden.network"
    ],
    "canonical_alias": "#raiden_goerli_path_finding:transport01.raiden.network",
    "guest_can_join": false,
    "num_joined_members": 357,
    "room_id": "!zPNQUseHcedZfiQKEg:transport01.raiden.network",
    "world_readable": false
}
{
    "aliases": [
        "#raiden_goerli_path_finding:transport02.raiden.network"
    ],
    "canonical_alias": "#raiden_goerli_path_finding:transport02.raiden.network",
    "guest_can_join": false,
    "num_joined_members": 57,
    "room_id": "!WqXlHrMJeoxnsHdiOQ:transport02.raiden.network",
    "world_readable": false
}
{
    "aliases": [
        "#raiden_goerli_path_finding:transport03.raiden.network"
    ],
    "canonical_alias": "#raiden_goerli_path_finding:transport03.raiden.network",
    "guest_can_join": false,
    "num_joined_members": 44,
    "room_id": "!aowlIeSnJIvsDgpBfD:transport03.raiden.network",
    "world_readable": false
}

This means that we have 3 distinct rooms, one on each of these servers. Whenever we run scenarios which pass --matrix-server with one of the above servers, the node ends up joining the room on that server. As a result, the nodes (including the PFS) in a single scenario run may each have joined a different room.
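One way to confirm this from the directory entries above is to group the aliases by room ID. The following is only a sketch: the `entries` list copies the `canonical_alias` and `room_id` values from the listing, and the helper name `servers_by_room` is hypothetical, not part of Raiden or Synapse.

```python
def servers_by_room(entries):
    """Map each room_id to the set of servers whose alias points at it."""
    rooms = {}
    for entry in entries:
        # The server name is everything after the first colon in the alias.
        server = entry["canonical_alias"].split(":", 1)[1]
        rooms.setdefault(entry["room_id"], set()).add(server)
    return rooms


# Values copied from the room directory listing above.
entries = [
    {
        "canonical_alias": "#raiden_goerli_path_finding:transport01.raiden.network",
        "room_id": "!zPNQUseHcedZfiQKEg:transport01.raiden.network",
    },
    {
        "canonical_alias": "#raiden_goerli_path_finding:transport02.raiden.network",
        "room_id": "!WqXlHrMJeoxnsHdiOQ:transport02.raiden.network",
    },
    {
        "canonical_alias": "#raiden_goerli_path_finding:transport03.raiden.network",
        "room_id": "!aowlIeSnJIvsDgpBfD:transport03.raiden.network",
    },
]

rooms = servers_by_room(entries)
# A single shared room would show up as one room_id with three servers.
# Here every room_id belongs to exactly one server.
print(len(rooms))  # 3
```

If the three aliases pointed at one shared room, the mapping would contain a single room ID with three servers attached to it.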

Synapse presence works in such a way that our nodes receive presence updates iff the lists of rooms the two users have joined intersect. If there's no intersection, we are not allowed to see the other user's presence. For Raiden client nodes this is not a problem, because the participating nodes in a single channel join a room specific to that channel, which allows each of them to see its partner's presence. The exception is the PFS, which relies on the discovery room and the path_finding room to figure out routes. The PFS joins only one discovery room and one path_finding room, assuming that these are the only discovery and path_finding rooms on the servers. According to the list of rooms above, this is not the case.
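The visibility rule described here can be modelled as a set intersection. This is a simplified sketch of the behaviour as described in this comment, not Synapse's actual implementation. The function `can_see_presence`, the membership sets, and the channel room ID are hypothetical. The two path finding room IDs are taken from the listing above.

```python
def can_see_presence(rooms_a, rooms_b):
    """Return True if two users share at least one joined room."""
    return bool(set(rooms_a) & set(rooms_b))


# Hypothetical channel-specific room shared by two channel partners.
CHANNEL_ROOM = "!hypotheticalChannelRoom:transport02.raiden.network"

# Assumed memberships for illustration: the PFS joined the path finding
# room on transport01, while the nodes joined the one on transport02.
pfs_rooms = {"!zPNQUseHcedZfiQKEg:transport01.raiden.network"}
node_a_rooms = {"!WqXlHrMJeoxnsHdiOQ:transport02.raiden.network", CHANNEL_ROOM}
node_b_rooms = {"!WqXlHrMJeoxnsHdiOQ:transport02.raiden.network", CHANNEL_ROOM}

# Channel partners share rooms, so they see each other's presence.
print(can_see_presence(node_a_rooms, node_b_rooms))  # True

# The PFS shares no room with node A, so it treats the node as offline.
print(can_see_presence(pfs_rooms, node_a_rooms))  # False
```

Under these assumed memberships the PFS never receives presence for either node, which matches the NoRouteFound error in the PFS log above.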

rakanalh commented 4 years ago

Closing this issue as the solution was deemed to be in: https://github.com/raiden-network/raiden-services/pull/609