raiden-network / raiden

Raiden Network
https://developer.raiden.network
Other
1.84k stars 376 forks source link

[META] - Payment couldn't be completed because: there is no route available [DO NOT PICK UP] #5433

Closed Dominik1999 closed 4 years ago

Dominik1999 commented 4 years ago

Meta Issue for the Symptom "No route found"

Recently we have seen some scenarios failing with the message Payment couldn't be completed because: there is no route available. This is a Meta Issue to track the Symptom and to be able to close all related issues to that.

It does not make sense to keep those issues open, since we try to reproduce the issues to get fresh logs together with the pfs logs. Without the corresponding pfs logs, we cannot analyze further.

Issue checklist

To help us more easily debug your issue please check the boxes for which requirements for safe usage or Raiden were followed:

Dominik1999 commented 4 years ago

Have seen it again: MFEE3 failed last night with Payment couldn't be completed because: there is no route available

@karlb @palango

karlb commented 4 years ago

The error is NoRouteFound('Source not online')", so it looks like we're having problems with the presence handling.

palango commented 4 years ago

The error is NoRouteFound('Source not online')", so it looks like we're having problems with the presence handling.

Damn, and I just removed the presence logging yesterday: https://github.com/raiden-network/raiden-services/pull/679

Dominik1999 commented 4 years ago

@karlb can you add to the error message Source not online but payment received - stupid source!

karlb commented 4 years ago

The PFS has not received a ChannelOpened event for the potential initiator, so it does not track its presence and assumes that it's offline when it requests a route.

Dominik1999 commented 4 years ago

so that does not sound matrix related, right?

karlb commented 4 years ago

Yes, that's not matrix related. It looks like we're skipping a block whenever a new channel is created. Should be reproducible in a unit test. I've opened https://github.com/raiden-network/raiden/issues/5443 for this problem.

Dominik1999 commented 4 years ago

@Dominik1999 to add logs from friday afternoon

Dominik1999 commented 4 years ago

Logs from Fri 6th of December 15:00h CET @karlb

in line 201 in {"task": "<SerialTask: {'name': 'Make 10 payments from 0 to 3 after restart' we got {\"errors\": \"Payment couldn't be completed because: there is no route available\"}"}

in the second transfer task from Node 0 to Node 3

{"task": "<TransferTask: {'from': 0, 'to': 3, 'amount': 1000000000000000, 'lock_timeout': 30}>"

agatsoh commented 4 years ago

On 7th December 2019 The BF4 scenario fails has a No route available failure

{
  "task": "<TransferTask: {'from': 0, 'to': 3, 'amount': 1000000000000000, 'lock_timeout': 30}>",
  "event": "Task errored",
  "logger": "scenario_player.tasks.base",
  "level": "error",
  "timestamp": "2019-12-07 02:01:41.228116",
  "exception": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/base.py\", line 74, in __call__\n    return self._run(*args, **kwargs)\n  File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/api_base.py\", line 59, in _run\n    f'HTTP status code \"{resp.status_code}\" while fetching {url}. '\nscenario_player.exceptions.legacy.RESTAPIStatusMismatchError: HTTP status code \"409\" while fetching http://127.0.0.1:55899/api/v1/payments/0x62083c80353Df771426D209eF578619EE68D5C7A/0x0B336fB2F7E68A316dDE5E08a00a7c3Fb4C426b3. Expected 2..: {\"errors\": \"Payment couldn't be completed because: there is no route available\"}"
}

what exception say:

  1. Payment couldn't be completed because: there is no route available

  2. while fetching request

http://127.0.0.1:55899/api/v1/payments/0x62083c80353Df771426D209eF578619EE68D5C7A/0x0B336fB2F7E68A316dDE5E08a00a7c3Fb4C426b3

  1. Failure occurs at line 81

- transfer: {from: 0, to: 3, amount: 1_000_000_000_000_000, lock_timeout: 30}

https://drive.google.com/drive/folders/13DV21NX3X_vKUOpRIobhQUA0w8sx1vs6?usp=sharing

karlb commented 4 years ago

Both of the two cases directly above are caused by NoRouteFound('Source does not have a channel with sufficient capacity'). This message is surprising, considering that the previously processed CapacityUpdates look good (output from the PFS log for @agatsoh's BF4 error):

{
  "message": {
    "signature": "0x022c0503436d6d9a636c1272421bef60ed092abba32699c70cb021b5e1af848f758b1cef7751e2b8be3d18dae14c943da962df27993e837f8161f785155276711b",                                                                                                     
    "canonical_identifier": {
      "chain_identifier": 5,
      "token_network_address": "0xDa1fBc048f503635950058953f5c60FC1F564ee6",
      "channel_identifier": 5090
    },
    "updating_participant": "0x512776974e648921Fa2Ccd829E1f29389560fd28",
    "other_participant": "0xD2298fd940C67Ae209cA504825f0D5C6e8261ADf",
    "updating_nonce": 0,
    "other_nonce": 0,
    "updating_capacity": 1e+18,
    "other_capacity": 1e+18,
    "reveal_timeout": 20,
    "type_name": "PFSCapacityUpdate"
  },
  "event": "Received Capacity Update",
  "level": "debug",
  "logger": "pathfinding_service.service",
  "timestamp": "2019-12-07 01:59:57.076776"
}
{
  "message": {
    "signature": "0x8596a8fa3ee973e298d2e6e9bdeebaa5e179658322bd7a747133363501a41bfa1594f1025a26d6c650e7abe45afcc9e435385554934d54208660c1e9333c60731c",                                                                                                     
    "canonical_identifier": {
      "chain_identifier": 5,
      "token_network_address": "0xDa1fBc048f503635950058953f5c60FC1F564ee6",
      "channel_identifier": 5090
    },
    "updating_participant": "0xD2298fd940C67Ae209cA504825f0D5C6e8261ADf",
    "other_participant": "0x512776974e648921Fa2Ccd829E1f29389560fd28",
    "updating_nonce": 0,
    "other_nonce": 0,
    "updating_capacity": 1e+18,
    "other_capacity": 1e+18,
    "reveal_timeout": 20,
    "type_name": "PFSCapacityUpdate"
  },
  "event": "Received Capacity Update",
  "level": "debug",
  "logger": "pathfinding_service.service",
  "timestamp": "2019-12-07 01:59:57.846347"
}
Dominik1999 commented 4 years ago

We have seen that tonight three times:

Payment couldn't be completed because: there is no route available

palango commented 4 years ago

For BF1 I see:

{
  "event": "Pathfinding Service can not find a route: Target not online (2201).",
  "logger": "raiden.network.pathfinding",
  "level": "info",
  "timestamp": "2019-12-10 23:39:45.594536"
}

Edit:

The target here is 0xD2dE5af1c5Bc99837A70f09B31bd339de24b739D. This address goes offline shortly before the request comes in.

{"current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "user_id": "@0xd2de5af1c5bc99837a70f09b31bd339de24b739d:transport04.raiden.network", "prev_state": "<UserPresence.ONLINE: 'online'>", "state": "<UserPresence.OFFLINE: 'offline'>", "event": "Changing user presence state", "level": "debug", "logger": "raiden.network.transport.matrix.utils", "timestamp": "2019-12-10 23:38:24.759751"}
{"current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "address": "0xD2dE5af1c5Bc99837A70f09B31bd339de24b739D", "prev_state": "<AddressReachability.REACHABLE: 1>", "state": "<AddressReachability.UNREACHABLE: 2>", "event": "Changing address reachability state", "level": "debug", "logger": "raiden.network.transport.matrix.utils", "timestamp": "2019-12-10 23:38:24.760319"}
{"from_block": 1800075, "end_block": 1800075, "event": "Querying new block(s)", "level": "info", "logger": "raiden_libs.blockchain", "timestamp": "2019-12-10 23:38:35.133549"}
{"latest_committed_block": 1800075, "event": "Updating latest_committed_block", "level": "info", "logger": "pathfinding_service.database", "timestamp": "2019-12-10 23:38:35.900344"}
{"from_block": 1800075, "to_block": 1800075, "getting": 0.77, "processing": 0.0, "total_duration": 0.77, "event_counts": {"UpdatedHeadBlockEvent": 1}, "event": "Processed blocks", "level": "info", "logger": "pathfinding_service.service", "timestamp": "2019-12-10 23:38:35.902642"}
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "first_sync": false, "sync_token": "s676746_549755_0_28_447_1_1_40080_1", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Starting handle greenlet", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:38:44.957015"}
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Sync called", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:38:44.957663"}
{"from_block": 1800076, "end_block": 1800076, "event": "Querying new block(s)", "level": "info", "logger": "raiden_libs.blockchain", "timestamp": "2019-12-10 23:38:49.966207"}
{"latest_committed_block": 1800076, "event": "Updating latest_committed_block", "level": "info", "logger": "pathfinding_service.database", "timestamp": "2019-12-10 23:38:50.770848"}
{"from_block": 1800076, "to_block": 1800076, "getting": 0.8, "processing": 0.0, "total_duration": 0.81, "event_counts": {"UpdatedHeadBlockEvent": 1}, "event": "Processed blocks", "level": "info", "logger": "pathfinding_service.service", "timestamp": "2019-12-10 23:38:50.773149"}
{"from_block": 1800077, "end_block": 1800077, "event": "Querying new block(s)", "level": "info", "logger": "raiden_libs.blockchain", "timestamp": "2019-12-10 23:39:04.838924"}
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "first_sync": false, "sync_token": "s676746_549755_0_28_447_1_1_40080_1", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Starting handle greenlet", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:39:05.153201"}
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Sync called", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:39:05.153854"}
{"latest_committed_block": 1800077, "event": "Updating latest_committed_block", "level": "info", "logger": "pathfinding_service.database", "timestamp": "2019-12-10 23:39:05.703277"}
{"from_block": 1800077, "to_block": 1800077, "getting": 0.86, "processing": 0.0, "total_duration": 0.87, "event_counts": {"UpdatedHeadBlockEvent": 1}, "event": "Processed blocks", "level": "info", "logger": "pathfinding_service.service", "timestamp": "2019-12-10 23:39:05.704868"}
{"from_block": 1800078, "end_block": 1800078, "event": "Querying new block(s)", "level": "info", "logger": "raiden_libs.blockchain", "timestamp": "2019-12-10 23:39:21.778351"}
{"latest_committed_block": 1800078, "event": "Updating latest_committed_block", "level": "info", "logger": "pathfinding_service.database", "timestamp": "2019-12-10 23:39:22.533816"}
{"from_block": 1800078, "to_block": 1800078, "getting": 0.76, "processing": 0.0, "total_duration": 0.76, "event_counts": {"UpdatedHeadBlockEvent": 1}, "event": "Processed blocks", "level": "info", "logger": "pathfinding_service.service", "timestamp": "2019-12-10 23:39:22.535837"}
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "first_sync": false, "sync_token": "s676746_549762_0_28_447_1_1_40080_1", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Starting handle greenlet", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:39:25.364018"}
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Sync called", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:39:25.364427"}
{"from_block": 1800079, "end_block": 1800079, "event": "Querying new block(s)", "level": "info", "logger": "raiden_libs.blockchain", "timestamp": "2019-12-10 23:39:34.594634"}
{"latest_committed_block": 1800079, "event": "Updating latest_committed_block", "level": "info", "logger": "pathfinding_service.database", "timestamp": "2019-12-10 23:39:35.376223"}
{"from_block": 1800079, "to_block": 1800079, "getting": 0.78, "processing": 0.0, "total_duration": 0.78, "event_counts": {"UpdatedHeadBlockEvent": 1}, "event": "Processed blocks", "level": "info", "logger": "pathfinding_service.service", "timestamp": "2019-12-10 23:39:35.378328"}
pfs-goerli-with-fee_1         | 172.21.0.17 - - [2019-12-10 23:39:45] "GET /api/v1/0xDa1fBc048f503635950058953f5c60FC1F564ee6/payment/iou?sender=0xf2c690389111a499E811C2d4701a0d1bF3f6324E&receiver=0x062C12c01D0f17fC9eAa33940D994594d91a0182&timestamp=2019-12-10T23%3A39%3A45&signature=0xa14fff77e9188adb5642097e51b02432bdc641102aa10042bf436d40e8bb7a432e8899448db8943bca72f959c496922fc018cecc591c4a46e6ab5270773ed6231b HTTP/1.1" 200 654 0.005259
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "first_sync": false, "sync_token": "s676746_549762_0_28_447_1_1_40080_1", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Starting handle greenlet", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:39:45.568061"}
{"node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport02.raiden.network", "event": "Sync called", "level": "debug", "logger": "raiden.network.transport.matrix.client", "timestamp": "2019-12-10 23:39:45.568462"}
{"sender": "0xf2c690389111a499E811C2d4701a0d1bF3f6324E", "expected_amount": 10600, "total_amount": 10600, "added_amount": 10500, "event": "Received service fee", "level": "info", "logger": "pathfinding_service.api", "timestamp": "2019-12-10 23:39:45.583502"}
{"error": "NoRouteFound('Target not online')", "details": {"from_": "0xf2c690389111a499E811C2d4701a0d1bF3f6324E", "to": "0xD2dE5af1c5Bc99837A70f09B31bd339de24b739D", "value": 1000000000000000}, "message": "Target not online", "event": "Error while handling request", "level": "warning", "logger": "pathfinding_service.api", "timestamp": "2019-12-10 23:39:45.587017"}
palango commented 4 years ago

For BF2:

{
  "event": "Pathfinding Service can not find a route: No route between nodes found. (2201).",
  "logger": "raiden.network.pathfinding",
  "level": "info",
  "timestamp": "2019-12-11 00:25:23.166211"
}

Edit:

More infos: Theo nodestwo nodes go:

"node": 0, "address": "0x0Ed0a3F4a3B95b3f1118eCca2E856BcD8cb632c8"
"node": 1, "address": "0x262582E09B7Db064623F99F91d22f2fFF31e3EC8"
"node": 2, "address": "0x0D6a6017f2e98fF455a9ff299550FF4aAAeAFdBf"
"node": 3, "address": "0x8F7cbBA93Ee46adee494e2FfbFDe7b731a99fB4f"
"node": 4, "address": "0xc531aF2895498d16627C954240E59896b486048f"

Here again, two nodes change the presence short before the request:


pfs-goerli_1                  | {"current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport04.raiden.network", "node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "user_id": "@0x0ed0a3f4a3b95b3f1118ecca2e856bcd8cb632c8:transport04.raiden.network", "prev_state": null, "state": "<UserPresence.ONLINE: 'online'>", "event": "Changing user presence state", "level": "debug", "logger": "raiden.network.transport.matrix.utils", "timestamp": "2019-12-10 23:43:20.923612"}
pfs-goerli_1                  | {"current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport04.raiden.network", "node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "user_id": "@0x262582e09b7db064623f99f91d22f2fff31e3ec8:transport04.raiden.network", "prev_state": "<UserPresence.ONLINE: 'online'>", "state": "<UserPresence.OFFLINE: 'offline'>", "event": "Changing user presence state", "level": "debug", "logger": "raiden.network.transport.matrix.utils", "timestamp": "2019-12-11 00:23:39.477808"}
pfs-goerli_1                  | {"current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport04.raiden.network", "node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "user_id": "@0x0d6a6017f2e98ff455a9ff299550ff4aaaeafdbf:transport04.raiden.network", "prev_state": null, "state": "<UserPresence.ONLINE: 'online'>", "event": "Changing user presence state", "level": "debug", "logger": "raiden.network.transport.matrix.utils", "timestamp": "2019-12-10 23:43:20.830678"}
pfs-goerli_1                  | {"current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport04.raiden.network", "node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "user_id": "@0x8f7cbba93ee46adee494e2ffbfde7b731a99fb4f:transport04.raiden.network", "prev_state": null, "state": "<UserPresence.ONLINE: 'online'>", "event": "Changing user presence state", "level": "debug", "logger": "raiden.network.transport.matrix.utils", "timestamp": "2019-12-10 23:43:20.884757"}
pfs-goerli_1                  | {"current_user": "@0x062c12c01d0f17fc9eaa33940d994594d91a0182:transport04.raiden.network", "node": "0x062C12c01D0f17fC9eAa33940D994594d91a0182", "user_id": "@0xc531af2895498d16627c954240e59896b486048f:transport04.raiden.network", "prev_state": null, "state": "<UserPresence.ONLINE: 'online'>", "event": "Changing user presence state", "level": "debug", "logger": "raiden.network.transport.matrix.utils", "timestamp": "2019-12-10 23:53:35.141397"}
palango commented 4 years ago

For BF6:

{"error": "ServiceRequestIOURejected('The provided payment is lower than service fee. (2104)')", "event": "Pathfinding Service rejected IOU", "logger": "raiden.network.pathfinding", "level": "debug", "timestamp": "2019-12-11 00:34:11.054577"}
{"log_message": "The provided payment is lower than service fee. (2104)", "event": "An error with the path request occured", "logger": "raiden.routing", "level": "warning", "timestamp": "2019-12-11 00:34:11.055220"}

All I find here is:


pfs-goerli-with-fee_1         | 172.21.0.17 - - [2019-12-11 00:34:11] "POST /api/v1/0xDa1fBc048f503635950058953f5c60FC1F564ee6/paths HTTP/1.1" 200 486 0.257421
pfs-goerli-with-fee_1         | {"error": "InsufficientServicePayment(None)", "details": {"expected_amount": 400}, "message": "The provided payment is lower than service fee.", "event": "Error while handling request", "level": "warning", "logger": "pathfinding_service.api", "timestamp": "2019-12-11 00:34:11.049139"}
ulope commented 4 years ago

The server-side presence monitoring shows this:

@0xd2de5af1c5bc99837a70f09b31bd339de24b739d:transport04.raiden.network

So this does not look like a return of the presence issue since all servers agree about the online state and see the correct state update within ~2s of each other.

(The log files were truncated on transport02 and 04, I assume because of the disk full issue earlier that week, but the docker compose logs still had the relevant information.)

palango commented 4 years ago

So this does not look like a return of the presence issue since all servers agree about the online state and see the correct state update within ~2s of each other.

With presence issue I meant more that we're seeing online nodes (hopefully) as offline. And nice to know that this is consistent among the servers.

palango commented 4 years ago

What's interesting about BF2:

The last sync call, was at 00:25:14, which is 9secs before the PFS request.

{
  "node": "0x0Ed0a3F4a3B95b3f1118eCca2E856BcD8cb632c8",
  "first_sync": false,
  "sync_token": "s1370554_915524_0_35_973_5_1_32899_1",
  "current_user": "@0x0ed0a3f4a3b95b3f1118ecca2e856bcd8cb632c8:transport04.raiden.network",
  "presence_events_qty": 14,
  "to_device_events_qty": 0,
  "rooms_invites_qty": 0,
  "rooms_leaves_qty": 0,
  "rooms_joined_member_count": 0,
  "rooms_invited_member_count": 0,
  "rooms_join_state_qty": 0,
  "rooms_join_state_events_qty": 0,
  "rooms_join_ephemeral_events_qty": 0,
  "rooms_join_account_data_events_qty": 0,
  "event": "Starting _handle_response",
  "logger": "raiden.network.transport.matrix.client",
  "level": "debug",
  "greenlet_name": "GMatrixClient.listen_forever user_id:@0x0ed0a3f4a3b95b3f1118ecca2e856bcd8cb632c8:transport04.raiden.network",
  "timestamp": "2019-12-11 00:25:14.524498"
}
{
  "node": "0x0Ed0a3F4a3B95b3f1118eCca2E856BcD8cb632c8",
  "current_user": "@0x0ed0a3f4a3b95b3f1118ecca2e856bcd8cb632c8:transport04.raiden.network",
  "event": "Sync called",
  "logger": "raiden.network.transport.matrix.client",
  "level": "debug",
  "greenlet_name": "GMatrixClient.listen_forever user_id:@0x0ed0a3f4a3b95b3f1118ecca2e856bcd8cb632c8:transport04.raiden.network",
  "timestamp": "2019-12-11 00:25:14.525826"
}
Dominik2002 commented 4 years ago

This issue cannot be worked on atm. We first fix #5498 and see if it happens again