raiden-network / raiden

Raiden Network
https://developer.raiden.network
Other
1.84k stars 376 forks source link

Most scenarios fail: No route available message from PFS #5274

Closed agatsoh closed 4 years ago

agatsoh commented 4 years ago

Problem Definition

A common message across most of the scenarios failures for today in both the scenarios and the node logs is {\"errors\": \"Payment couldn't be completed because: there is no route available\"}"}

Usually occurs during a transfer task we will take the example of mfee1 scenario Scenario in a nutshell

node 0 => 0xBd5BD6FA71d704C13092fCC41E9CbBf7d28FbfaA node 1 => 0x7bdE0de73F740b095815C1d0B0a3b2bd3a32D040 node 2 => 0xe4d92E1cBB975d42618d5BeF4E8e961704716707 node 3 => 0xa6b11b3c70E153a9aA6FE07EE5aA0605ED077e02

The exception

{
  "task": "<TransferTask: {'from': 0, 'to': 3, 'amount': 10000, 'expected_http_status': 200}>",
  "event": "Task errored",
  "logger": "scenario_player.tasks.base",
  "level": "error",
  "timestamp": "2019-11-12 03:30:55.786095",
  "exception": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/base.py\", line 74, in __call__\n    return self._run(*args, **kwargs)\n  File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/api_base.py\", line 59, in _run\n    f'HTTP status code \"{resp.status_code}\" while fetching {url}. '\nscenario_player.exceptions.legacy.RESTAPIStatusMismatchError: HTTP status code \"409\" while fetching http://127.0.0.1:49607/api/v1/payments/0x62083c80353Df771426D209eF578619EE68D5C7A/0xa6b11b3c70E153a9aA6FE07EE5aA0605ED077e02. Expected 200: {\"errors\": \"Payment couldn't be completed because: there is no route available\"}"
}

What does the exception say about the failure

  1. Fails during <TransferTask : {'from': 0, 'to': 3, 'amount': 10000, 'expected_http_status': 200}> {from: 0, to: 3, amount: 10_000, expected_http_status: 200} line 57

  2. Payment couldn't be completed because: there is no route available

  3. while fetching http://127.0.0.1:49607/api/v1/payments/0x62083c80353Df771426D209eF578619EE68D5C7A/0xa6b11b3c70E153a9aA6FE07EE5aA0605ED077e02 0x62083c80353Df771426D209eF578619EE68D5C7A : token address 0xa6b11b3c70E153a9aA6FE07EE5aA0605ED077e02 : node 3

Observation from the node logs at many places we would encounter this

{
  "current_user": "@0xbd5bd6fa71d704c13092fcc41e9cbbf7d28fbfaa:transport04.raiden.network",
  "node": "0xBd5BD6FA71d704C13092fCC41E9CbBf7d28FbfaA",
  "transport_uuid": "13607931-430a-4041-b3bd-c2ccc0bd0613",
  "address": "0x7bdE0de73F740b095815C1d0B0a3b2bd3a32D040",
  "prev_state": "<AddressReachability.UNKNOWN: 3>",
  "state": "<AddressReachability.UNREACHABLE: 2>",
  "event": "Changing address reachability state",
  "logger": "raiden.network.transport.matrix.utils",
  "level": "debug",
  "greenlet_name": "AlarmTask._run node:0xBd5BD6FA71d704C13092fCC41E9CbBf7d28FbfaA",
  "timestamp": "2019-11-12 03:27:08.366544"
}

AddressReachability.UNREACHABLE: 2

May be a factor in the reachability of the nodes as well through the matrix

Logs of some scenarios for perusal:

Scenario_player_Logs-mfee1_flat_fee-1-latest-2019-11-12.tar.gz

Scenario_player_Logs-mfee2_proportional_fees-1-latest-2019-11-12.tar.gz

Scenario_player_Logs-bf3_multi_directional_payment-1-latest-2019-11-12.tar.gz

Scenario_player_Logs-bf1_basic_functionality-1-latest-2019-11-12.tar.gz

Scenario_player_Logs-pfs1_get_a_simple_path-1-latest-2019-11-12.tar.gz

Scenario_player_Logs-pfs5_too_low_capacity-1-latest-2019-11-12.tar.gz

Dominik1999 commented 4 years ago

nice thank you for the structured issue @agatsoh

LefterisJP commented 4 years ago

I think the reason is similar to https://github.com/raiden-network/raiden/issues/5261

Taking the bf1 scenario fail for example:

{"source": "0x06Aff5478bcE325f90b574491548cb47E617708A", "target": "0x8C5Ed9DDdffC327E439C99787577850548FF50EC", "amount": 1000000000000000, "target_is_direct_partner": false, "can_use_pfs": true, "event": "Getting route for payment", "logger": "raiden.routing", "level": >
{"url": "https://pfs-goerli-with-fee.services-dev.raiden.network", "token_network_address": "0xDa1fBc048f503635950058953f5c60FC1F564ee6", "payload": {"from": "0x06Aff5478bcE325f90b574491548cb47E617708A", "to": "0x8C5Ed9DDdffC327E439C99787577850548FF50EC", "value": 1000000>
{"response": {"our_address": "0x06Aff5478bcE325f90b574491548cb47E617708A"}, "status_code": 200, "event": "Request successful", "logger": "raiden.api.rest", "level": "debug", "timestamp": "2019-11-12 03:17:24.889455"}
{"error": "ServiceRequestIOURejected('No route between nodes found. (2201)')", "event": "Pathfinding Service rejected IOU", "logger": "raiden.network.pathfinding", "level": "debug", "timestamp": "2019-11-12 03:17:24.982835"}
{"event": "Pathfinding Service can not find a route: No route between nodes found. (2201).", "logger": "raiden.network.pathfinding", "level": "info", "timestamp": "2019-11-12 03:17:24.984047"}

Node3 "0x06Aff5478bcE325f90b574491548cb47E617708A" is asking for a path to Node0 "0x8C5Ed9DDdffC327E439C99787577850548FF50EC" and the PFS says no route could be found.

Looking at the PFS logs of the same time

Nov 12 03:15:57 services-dev 088460a59f68[5820]: 2019-11-12 03:15:57.130713 [debug    ] Sync called                    [raiden.network.transport.matrix.client] current_user=@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport04.raiden.network
Nov 12 03:15:57 services-dev 088460a59f68[5820]: 2019-11-12 03:15:57.134709 [debug    ] Changing user presence state   [raiden.network.transport.matrix.utils] current_user=@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport04.raiden.network node=0x062C12c01D0f17fC9eAa33940D994594d91a0182 prev_state=<UserPresence.ONLINE: 'online'> state=<UserPresence.OFFLINE: 'offline'> user_id=@0x8c5ed9dddffc327e439c99787577850548ff50ec:transport01.raiden.network
Nov 12 03:15:57 services-dev 088460a59f68[5820]: 2019-11-12 03:15:57.147690 [debug    ] Changing address reachability state [raiden.network.transport.matrix.utils] address=0x8C5Ed9DDdffC327E439C99787577850548FF50EC current_user=@0x062c12c01d0f17fc9eaa33940d994594d91a0182:t
ransport04.raiden.network node=0x062C12c01D0f17fC9eAa33940D994594d91a0182 prev_state=<AddressReachability.REACHABLE: 1> state=<AddressReachability.UNREACHABLE: 2>                                                                                                               Nov 12 03:16:02 services-dev 088460a59f68[5820]: 172.19.0.15 - - [2019-11-12 03:16:02] "GET /api/v1/info HTTP/1.1" 200 616 0.001859 

4 seconds before the request is made the PFS syncs with matrix and changes the presence state of "0x8C5Ed9DDdffC327E439C99787577850548FF50EC" to unavailable.

ulope commented 4 years ago

UNAVAILABLE is not OFFLINE and is still REACHABLE

LefterisJP commented 4 years ago

Okay my wording was bad since I am not familiar with the matrix lingo. The logs I quoted above actually say UNREACHABLE

prev_state=<AddressReachability.REACHABLE: 1> state=<AddressReachability.UNREACHABLE: 2>
rakanalh commented 4 years ago

This is probably a duplicate of #5261

LefterisJP commented 4 years ago

I will close this as duplicate since both @rakanalh and I agree on this.