czepluch commented 4 years ago

Problem definition

I fired up 3 nodes yesterday and closed them down for the night. Then started them again this morning, but using the https://t02.testtransport.raiden.network/ server instead of https://t01.testtransport.raiden.network/, since t01 is down. Now, when I am trying to do a mediated transfer I get the error below. Below error is from the node initiating the transfer. I've also attached full logs of all three nodes involved in the transfer.

2020-04-02 09:38:04.218933 [info     ] 127.0.0.1 - - [2020-04-02 11:38:04] "GET /api/v1/pending_transfers HTTP/1.1" 200 137 0.001118 [raiden.api.rest.pywsgi] 
2020-04-02 09:38:04.512007 [debug    ] Initiating payment             [raiden.api.rest] amount=10000000000000 lock_timeout=None node=0xce936b7FEe88b48267EbD82Ad7897Aeb52803A7F payment_identifier=1585820284344 registry_address=0xFfaf04Ad776AAa3AbD62AFA340f3c55931a68fB1 secret=None secret_hash=None target_address=0x0bae0289AAA26845224F528F9B9DefE69e01606E token_address=0x40aE2748339583694D2BaB5da5Bbcf4079E9ABE5
2020-04-02 09:38:04.512764 [debug    ] Initiating transfer            [raiden.api.python] amount=10000000000000 identifier=1585820284344 initiator=0xce936b7FEe88b48267EbD82Ad7897Aeb52803A7F target=0x0bae0289AAA26845224F528F9B9DefE69e01606E token=0x40aE2748339583694D2BaB5da5Bbcf4079E9ABE5
2020-04-02 09:38:04.513352 [debug    ] Mediated transfer              [raiden.raiden_service] amount=10000000000000 identifier=1585820284344 node=0xce936b7FEe88b48267EbD82Ad7897Aeb52803A7F target=0x0bae0289AAA26845224F528F9B9DefE69e01606E token_network_address=0xccd165775c1b05Fe0261e0a89A14777f89629038
2020-04-02 09:38:04.542877 [debug    ] Healthcheck already enabled    [raiden.network.transport.matrix.transport] current_user=@0xce936b7fee88b48267ebd82ad7897aeb52803a7f:t02.testtransport.raiden.network greenlet_name=Greenlet<fn:_health_check_worker> node=0xce936b7FEe88b48267EbD82Ad7897Aeb52803A7F peer_address=0x0bae0289AAA26845224F528F9B9DefE69e01606E transport_uuid=ab453c7b-d1a8-44ff-931b-1236ff5e76aa
2020-04-02 09:38:04.880692 [info     ] Requesting paths from Pathfinding Service [raiden.network.pathfinding] payload={'from': '0xce936b7FEe88b48267EbD82Ad7897Aeb52803A7F', 'to': '0x0bae0289AAA26845224F528F9B9DefE69e01606E', 'value': 10000000000000, 'max_paths': 3, 'iou': {'sender': '0xce936b7FEe88b48267EbD82Ad7897Aeb52803A7F', 'receiver': '0x915d793e271843B2F6824f688356a0E83E6A1DC9', 'one_to_n_address': '0x50F0714e0D81fc7513cd0189B8b28d661a28eCcb', 'amount': 200, 'expiration_block': 2652550, 'chain_id': 5, 'signature': '0xa5d7c94d673dba7c594be4442f6ccb040bf5383b1b48880da3883d0da2768a31437deaa044db57ef16a70b5cb3cb2ce78f4cf1991c0dd63d5f227d121646df931b'}} token_network_address=0xccd165775c1b05Fe0261e0a89A14777f89629038 url=https://pfs.t02.testtransport.raiden.network
2020-04-02 09:38:05.015051 [debug    ] Request successful             [raiden.api.rest_utils] response=[] status_code=<HTTPStatus.OK: 200>
2020-04-02 09:38:05.016217 [info     ] 127.0.0.1 - - [2020-04-02 11:38:05] "GET /api/v1/pending_transfers HTTP/1.1" 200 137 0.001638 [raiden.api.rest.pywsgi] 
2020-04-02 09:38:05.908702 [warning  ] An error with the path request occurred [raiden.routing] error=Source not online error_code=2201 http_error=404 log_message=PFS: Source not online
2020-04-02 09:38:05.909264 [warning  ] Request to Pathfinding Service was not successful. No routes to the target are found. [raiden.routing] pfs_message=PFS: Source not online
2020-04-02 09:38:05.909696 [error    ] Error processing request       [raiden.api.rest_utils] errors=Payment couldn't be completed because: PFS: Source not online status_code=<HTTPStatus.CONFLICT: 409>
2020-04-02 09:38:05.910621 [info     ] 127.0.0.1 - - [2020-04-02 11:38:05] "POST /api/v1/payments/0x40aE2748339583694D2BaB5da5Bbcf4079E9ABE5/0x0bae0289AAA26845224F528F9B9DefE69e01606E HTTP/1.1" 409 283 1.399679 [raiden.api.rest.pywsgi]

Node1 (initiator): raiden-debug_2020-04-02T09:32:26.412060.log

Node2 (mediator): raiden-debug_2020-04-02T09:31:48.550199.log

Node3 (receiver): raiden-debug_2020-04-02T09:30:48.519095.log

Todos

[x] check if all nodes were run on t02.testtransport.raiden.network
[ ] rerun test

karlb commented 4 years ago

Before the PFS request, I see a You are not allowed to see their presence error in the same log file:

{
  "event": "Could not fetch user presence",
  "logger": "raiden.network.transport.matrix.utils",
  "level": "error",
  "greenlet_name": "Greenlet<fn:_health_check_worker>",
  "timestamp": "2020-04-02 09:36:09.616590",
  "exception": "Traceback (most recent call last):\n  File \"raiden/network/transport/matrix/utils.py\", line 442, in _fetch_user_presence\n  File \"raiden/network/transport/matrix/client.py\", line 535, in get_user_presence\n  File \"raiden/network/transport/matrix/client.py\", line 228, in get_presence\n  File \"raiden/network/transport/matrix/client.py\", line 180, in _send\n  File \"matrix_client/api.py\", line 691, in _send\nmatrix_client.errors.MatrixRequestError: 403: {\"errcode\":\"M_FORBIDDEN\",\"error\":\"You are not allowed to see their presence.\"}"
}

If there are problems with the matrix communication, this might cause the node to seem offline to the matrix server and thus to the PFS.

karlb commented 4 years ago

According to @fredo

Usually you are not allowed to see presence if you do not share a room with the other user

and

this happened due to the switch to a new broadcast room

So this was just a temporary problem and should not happen, again.

czepluch commented 4 years ago

But this still happens today. Even after restarting the nodes. I would not consider this issue as fixed.

fredo commented 4 years ago

Does this still happen with old nodes which have been used before yesterday 12 pm ? @czepluch

czepluch commented 4 years ago

Yes, the nodes were first spun up on April 1st with the new RC. Initially on t01, but then when it was down yesterday I changed to t02.

czepluch commented 4 years ago

I started everything up with fresh nodes using this command to start Raiden:

./raiden-v0.200.0-rc3-linux-x86_64 --keystore-path /path/to/keystore --network-id 5 --eth-rpc-endpoint http://goerli-geth.dappnode:8545 --pathfinding-service-address=https://pfs.t02.testtransport.raiden.network --matrix-server=https://t02.testtransport.raiden.network --log-config "raiden:debug" --api-address localhost:500x --enable-monitoring

However, when trying to perform mediated transfers I still get the same error.

Logs: raiden_debug_logs.zip The file starting from 10:12:12 is the initiator The file starting from 10:10:46 is the mediator The file starting from 10:09:52 is the target

fredo commented 4 years ago

The problem occurred because the PFS did not restart and was still in the old broadcast room. It also missed the capacity updates by the nodes.

Restarting the PFS let it join to the new Broadcast Restarting the nodes send the update capacity to the PFS again

raiden-network / raiden

Payment couldn't be completed because: PFS: Source not online #6039

Problem definition

Todos