Closed by rakanalh 4 years ago.
@rakanalh @err508
I suggest the following procedure to reproduce the Matrix Bug:
This is basically the current status, right?
Current status of this issue:
The problem was not reproducible locally: with a local Matrix federation and a local PFS, the scenario ran successfully.
Due to some changes in the client, the Raiden nodes now update each other's presence consistently across multiple scenario runs. However, we still see `"errors": "Payment couldn't be completed because: there is no route available"`. This was found to be caused by the PFS not tracking the nodes' presence correctly.
We currently believe that this error was caused when the transport server databases were deleted. After the transport servers were restarted, the nightly scenario player runs started multiple nodes simultaneously on different homeservers. As @rakanalh found out, this led to a misconfiguration of the remote servers: there are now multiple `global_rooms`, one on each server, but the PFS only listens for presence updates in the one on its own homeserver. This explains why the problem was not reproducible locally, where the rooms were created correctly on fresh servers for each run, and also why the scenario worked when all nodes and the PFS were on the same homeserver.
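The race described above can be illustrated with a toy model. This is a hypothetical sketch, not Raiden's actual room-creation code: it assumes that each node, on startup, looks for an existing global room and creates a new one on its own homeserver if it sees none. The helper names (`join_or_create`, `start_sequentially`, `start_concurrently`) and the room ID format are made up for illustration.

```python
# Hypothetical toy model of a check-then-act race when creating a global room.
# This is NOT Raiden's implementation; it only illustrates the failure mode.
from typing import Dict, List


def join_or_create(known_rooms: Dict[str, str], server: str) -> str:
    """Return the global room a node ends up in.

    `known_rooms` is the node's view of already existing global rooms
    at startup (name -> room ID).
    """
    if "global" in known_rooms:
        # Another node already created the room: join it.
        return known_rooms["global"]
    # No room visible yet: create a new one on this node's own homeserver.
    return f"!global:{server}"


def start_sequentially(servers: List[str]) -> List[str]:
    """Nodes start one after another, each seeing rooms created before it."""
    directory: Dict[str, str] = {}
    joined = []
    for server in servers:
        room = join_or_create(directory, server)
        directory.setdefault("global", room)  # publish the room for later nodes
        joined.append(room)
    return joined


def start_concurrently(servers: List[str]) -> List[str]:
    """Nodes start at the same time, so all of them see an empty directory."""
    snapshot: Dict[str, str] = {}  # taken before any node created a room
    return [join_or_create(snapshot, server) for server in servers]


servers = [
    "transport01.raiden.network",
    "transport02.raiden.network",
    "transport03.raiden.network",
]
print(len(set(start_sequentially(servers))))  # 1 shared room
print(len(set(start_concurrently(servers))))  # 3 separate rooms
```

In this model, fresh local servers behave like the sequential case, while simultaneous startups against freshly wiped remote servers behave like the concurrent case.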
Planned next steps:
Path finding rooms on the remote servers:
```json
{
"aliases": [
"#raiden_goerli_path_finding:transport01.raiden.network"
],
"canonical_alias": "#raiden_goerli_path_finding:transport01.raiden.network",
"guest_can_join": false,
"num_joined_members": 357,
"room_id": "!zPNQUseHcedZfiQKEg:transport01.raiden.network",
"world_readable": false
}
{
"aliases": [
"#raiden_goerli_path_finding:transport02.raiden.network"
],
"canonical_alias": "#raiden_goerli_path_finding:transport02.raiden.network",
"guest_can_join": false,
"num_joined_members": 57,
"room_id": "!WqXlHrMJeoxnsHdiOQ:transport02.raiden.network",
"world_readable": false
}
{
"aliases": [
"#raiden_goerli_path_finding:transport03.raiden.network"
],
"canonical_alias": "#raiden_goerli_path_finding:transport03.raiden.network",
"guest_can_join": false,
"num_joined_members": 44,
"room_id": "!aowlIeSnJIvsDgpBfD:transport03.raiden.network",
"world_readable": false
}
```
This basically means that we have 3 distinct rooms on these servers. Whenever we run scenarios that pass `--matrix-server` pointing to one of the above servers, the node ends up joining the room on that server. As a result, the nodes in a single scenario run (including the PFS) can each end up in a different room, depending on which server they were pointed at.
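One way to check the room listing for this split is to group the entries by alias localpart (the part between `#` and `:`) and count distinct `room_id`s. A minimal sketch; the three entries are copied from the listing, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Entries copied from the room listing (only the fields needed here).
ROOMS: List[dict] = [
    {"canonical_alias": "#raiden_goerli_path_finding:transport01.raiden.network",
     "room_id": "!zPNQUseHcedZfiQKEg:transport01.raiden.network",
     "num_joined_members": 357},
    {"canonical_alias": "#raiden_goerli_path_finding:transport02.raiden.network",
     "room_id": "!WqXlHrMJeoxnsHdiOQ:transport02.raiden.network",
     "num_joined_members": 57},
    {"canonical_alias": "#raiden_goerli_path_finding:transport03.raiden.network",
     "room_id": "!aowlIeSnJIvsDgpBfD:transport03.raiden.network",
     "num_joined_members": 44},
]


def alias_localpart(alias: str) -> str:
    """Turn '#name:server' into 'name'."""
    return alias[1:].split(":", 1)[0]


def rooms_by_localpart(rooms: List[dict]) -> Dict[str, Set[str]]:
    """Map each alias localpart to the set of room IDs that use it."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for room in rooms:
        grouped[alias_localpart(room["canonical_alias"])].add(room["room_id"])
    return dict(grouped)


for name, room_ids in rooms_by_localpart(ROOMS).items():
    if len(room_ids) > 1:
        # The same logical room name is backed by several distinct rooms.
        print(f"{name}: {len(room_ids)} distinct rooms")
```

This prints `raiden_goerli_path_finding: 3 distinct rooms`. If the servers shared one room, with an alias on each server pointing to it, one would expect a single room ID here.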
Synapse presence works such that our nodes receive another user's presence updates if and only if they share at least one room with that user. If there is no shared room, we are not allowed to see the other user's presence. For Raiden client nodes this is not a problem, because the participants of a channel join a room specific to that channel, which lets each of them see its partner's presence. The exception is the PFS, which relies on the `discovery` room and the `path_finding` room to figure out routes. The PFS joins only one `discovery` room and one `path_finding` room, assuming that these are the only such rooms on the servers. According to the list of rooms above, this is not the case.
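The visibility rule described above can be modelled as a set intersection over room memberships. This is a simplified sketch of the rule as stated in this issue, not Synapse's actual implementation. The user IDs and the channel room ID are hypothetical; the `path_finding` room IDs are taken from the room listing.

```python
from typing import Dict, Set

PF_01 = "!zPNQUseHcedZfiQKEg:transport01.raiden.network"
PF_02 = "!WqXlHrMJeoxnsHdiOQ:transport02.raiden.network"
PF_03 = "!aowlIeSnJIvsDgpBfD:transport03.raiden.network"
CHANNEL_AB = "!channel_ab:transport02.raiden.network"  # hypothetical channel room

# Hypothetical users, each connected to a different homeserver.
PFS = "@pfs:transport01.raiden.network"
NODE_A = "@node_a:transport02.raiden.network"
NODE_B = "@node_b:transport03.raiden.network"

memberships: Dict[str, Set[str]] = {
    PFS: {PF_01},                  # PFS only joined the room on its own server
    NODE_A: {PF_02, CHANNEL_AB},   # node A: its server's room + channel room
    NODE_B: {PF_03, CHANNEL_AB},   # node B: its server's room + channel room
}


def can_see_presence(memberships: Dict[str, Set[str]],
                     observer: str, target: str) -> bool:
    """Simplified rule: presence is visible iff the users share a room."""
    return bool(memberships[observer] & memberships[target])


print(can_see_presence(memberships, NODE_A, NODE_B))  # True: shared channel room
print(can_see_presence(memberships, PFS, NODE_A))     # False: no common room
```

In this model, the channel partners see each other through their channel room, while the PFS sees neither node. The PFS would only regain visibility by sharing at least one room with each node, for example a single `path_finding` room used by everyone.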
Closing this issue, as the solution is deemed to be https://github.com/raiden-network/raiden-services/pull/609
During testing, we have had multiple instances of the same failure in both the client and the PFS while making a transfer:
While running the BF1 scenario, all 5 nodes were up:
The initial transfer from node 0 to node 3 failed because the PFS did not provide any routes.
When @palango investigated on the PFS side, we found that the PFS saw the participating nodes as offline:
All logs & DBs for the nodes can be found here:
run_21.zip