raiden-network / raiden

Raiden Network
https://developer.raiden.network
Other
1.84k stars 376 forks source link

BF1: AssertSumTask error #5219

Closed agatsoh closed 4 years ago

agatsoh commented 4 years ago

Problem Definition

{"json": {"errors": "Payment couldn't be completed because: there is no route available"}, "event": "Received response", "logger": "scenario_player.tasks.api_base", "level": "debug", "timestamp": "2019-11-05 04:12:14.505142"}

Also assert sum failures

{"task": "<AssertSumTask: {'from': 0, 'balance_sum': 1105000000000000000}>", "event": "Task errored", "logger": "scenario_player.tasks.base", "level": "error", "timestamp": "2019-11-05 05:42:12.027318", "exception": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/base.py\", line 74, in __call__\n return self._run(*args, **kwargs)\n File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/api_base.py\", line 70, in _run\n return self._process_response(response_dict)\n File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/channels.py\", line 254, in _process_response\n f'Expected sum value \"{assert_value}\" for channel fields \"{field}\". '\nscenario_player.exceptions.legacy.ScenarioAssertionError: Expected sum value \"1105000000000000000\" for channel fields \"balance\". Actual value: \"1106000000000000000\". Channels: [{'token_address': '0x62083c80353Df771426D209eF578619EE68D5C7A', 'partner_address': '0xD0660c58b9B1aaB52b7Ea96156756AA442082f14', 'total_deposit': 1000000000000000000, 'balance': 1106000000000000000, 'state': 'opened', 'channel_identifier': 1801, 'settle_timeout': 40, 'total_withdraw': 0, 'token_network_address': '0xDa1fBc048f503635950058953f5c60FC1F564ee6', 'reveal_timeout': 20}]"}

Scenario_player_Logs-bf1_basic_functionality-1-latest-2019-11-05.tar.gz

agatsoh commented 4 years ago

Closer inspection of the node logs MatrixRequestError For nodes 0, 1 ,2 , 4

{"current_user": "@0xa26f98ef27077066cca77ca7ca601358f55bb739:transport02.raiden.network", "node": "0xa26F98EF27077066Cca77cA7cA601358F55bB739", "transport_uuid": "a1eb9a64-622e-474d-88e8-f8281420c2c9", "room_name": "#raiden_goerli_0x8ce96fd8241e28e075983f75e4c67d6663ed4216_0xa26f98ef27077066cca77ca7ca601358f55bb739:transport02.raiden.network", "error": "MatrixRequestError('404: {\"errcode\":\"M_NOT_FOUND\",\"error\":\"Room alias #raiden_goerli_0x8ce96fd8241e28e075983f75e4c67d6663ed4216_0xa26f98ef27077066cca77ca7ca601358f55bb739:transport02.raiden.network not found\"}')", "event": "No room for peer, trying to create", "logger": "raiden.network.transport.matrix.transport", "level": "debug", "greenlet_name": "RetryQueue node:0xa26F98EF27077066Cca77cA7cA601358F55bB739 recipient:0x8cE96fd8241E28e075983F75E4C67D6663ed4216", "timestamp": "2019-11-05 04:20:20.952394"}

ServiceRequestIOURejected and MatrixRequestError for node 3

{"error": "ServiceRequestIOURejected('No route between nodes found. (2201)')", "event": "Pathfinding Service rejected IOU", "logger": "raiden.network.pathfinding", "level": "debug", "timestamp": "2019-11-05 04:12:14.398986"}
rakanalh commented 4 years ago

@agatsoh:

{"json": {"errors": "Payment couldn't be completed because: there is no route available"}, "event": "Received response", "logger": "scenario_player.tasks.api_base", "level": "debug", "timestamp": "2019-11-05 04:12:14.505142"}

Could be because the first transfer intentionally triggers this. This is obvious to me because you reached the summation assert which means that subsequent transfers passed.

{"current_user": "@0xa26f98ef27077066cca77ca7ca601358f55bb739:transport02.raiden.network", "node": "0xa26F98EF27077066Cca77cA7cA601358F55bB739", "transport_uuid": "a1eb9a64-622e-474d-88e8-f8281420c2c9", "room_name": "#raiden_goerli_0x8ce96fd8241e28e075983f75e4c67d6663ed4216_0xa26f98ef27077066cca77ca7ca601358f55bb739:transport02.raiden.network", "error": "MatrixRequestError('404: {\"errcode\":\"M_NOT_FOUND\",\"error\":\"Room alias #raiden_goerli_0x8ce96fd8241e28e075983f75e4c67d6663ed4216_0xa26f98ef27077066cca77ca7ca601358f55bb739:transport02.raiden.network not found\"}')", "event": "No room for peer, trying to create", "logger": "raiden.network.transport.matrix.transport", "level": "debug", "greenlet_name": "RetryQueue node:0xa26F98EF27077066Cca77cA7cA601358F55bB739 recipient:0x8cE96fd8241E28e075983F75E4C67D6663ed4216", "timestamp": "2019-11-05 04:20:20.952394"}

This is part of what the transport does to join / create a room and create it's alias on a specific server and is not considered a faulty behaviour (yet).

You're posting alot of stuff but none of them are useful unless you state at which point the scenario failed. You shouldn't post every failure you see in the logs because some of them, similar to the above might have been parts of an intentional failed assert.

Could you update the issue description on what exactly the issue is?

agatsoh commented 4 years ago

@rakanalh

Scenario in a nutshell node 0 => 0x8cE96fd8241E28e075983F75E4C67D6663ed4216 node 1 => 0xD0660c58b9B1aaB52b7Ea96156756AA442082f14 node 2 => 0x78690120B063b234b7ed8540Afc3e948dd10B2ec node 3 => 0xB0F7eCB3DBC8474dF29e0501d576A25b71d8ED0e node 4 => 0xa26F98EF27077066Cca77cA7cA601358F55bB739

The Scenario fails at this point

{
  "task": "",
  "event": "Task errored",
  "logger": "scenario_player.tasks.base",
  "level": "error",
  "timestamp": "2019-11-05 05:42:12.027318",
  "exception": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/base.py\", line 74, in __call__\n    return self._run(*args, **kwargs)\n  File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/api_base.py\", line 70, in _run\n    return self._process_response(response_dict)\n  File \"/usr/local/lib/python3.7/site-packages/scenario_player/tasks/channels.py\", line 254, in _process_response\n    f'Expected sum value \"{assert_value}\" for channel fields \"{field}\". '\nscenario_player.exceptions.legacy.ScenarioAssertionError: Expected sum value \"1105000000000000000\" for channel fields \"balance\". Actual value: \"1106000000000000000\". Channels: [{'token_address': '0x62083c80353Df771426D209eF578619EE68D5C7A', 'partner_address': '0xD0660c58b9B1aaB52b7Ea96156756AA442082f14', 'total_deposit': 1000000000000000000, 'balance': 1106000000000000000, 'state': 'opened', 'channel_identifier': 1801, 'settle_timeout': 40, 'total_withdraw': 0, 'token_network_address': '0xDa1fBc048f503635950058953f5c60FC1F564ee6', 'reveal_timeout': 20}]"
}

What does the exception say:

  1. It fails during "task": "<AssertSumTask : {'from': 0, 'balance_sum': 1105000000000000000}>

  2. In the BF1 scenario this is line number 233 https://github.com/raiden-network/raiden/blob/develop/raiden/tests/scenarios/bf1_basic_functionality.yaml#L233

  3. Expected sum value \"1105000000000000000\" for channel fields \"balance\". Actual value: \"1106000000000000000\" which means: 1_105_000_000_000_000_000 : Asserted value 1_106_000_000_000_000_000 : Actual value

It also gives the actual state of the channel


[
  {
    "token_address": "0x62083c80353Df771426D209eF578619EE68D5C7A",
    "partner_address": "0xD0660c58b9B1aaB52b7Ea96156756AA442082f14",
    "total_deposit": 1000000000000000000,
    "balance": 1106000000000000000,
    "state": "opened",
    "channel_identifier": 1801,
    "settle_timeout": 40,
    "total_withdraw": 0,
    "token_network_address": "0xDa1fBc048f503635950058953f5c60FC1F564ee6",
    "reveal_timeout": 20
  }
]

  1. Which commit made the change to sum value 1_105_000_000_000_000_000 ? https://github.com/raiden-network/raiden/commit/265ff6eb4208e7096db1ca64c50db7b6c0654d18#diff-44f346fb37486eb90221b76f3ce31a1e This commit was made on 31st Oct 2019

  2. Status of bf1_basic_functionality runs since the change was made SP runs for BF1 scenario from 31st Oct 31stOct None(All Scenarios did not run) 2ndNov Success 3rdNov Fail 4thNov Success 5thNov Fail

  3. Question to be analysed since the last change took place on 31st of Oct why is the scenario showing flaky behaviour.

rakanalh commented 4 years ago

Thanks @agatsoh. Will look into it.

LefterisJP commented 4 years ago

I wrote a question beforehand but I misread an address mapping so it's invalid, thus I deleted it.

LefterisJP commented 4 years ago

Explanation

All right so I kind of figured it out.

There is nothing wrong with the transfer right before the assertion:

https://github.com/raiden-network/raiden/blob/e42e1175f5089ea0f9410615c5d4dae337c68e35/raiden/tests/scenarios/bf1_basic_functionality.yaml#L224-L228

All of the 100 transfers are made fine.

And also since the assertion before closing the channel passed we know all transfers succeeded:

https://github.com/raiden-network/raiden/blob/e42e1175f5089ea0f9410615c5d4dae337c68e35/raiden/tests/scenarios/bf1_basic_functionality.yaml#L202-L207

What we don't know is which path they took ...

The sum of balances of channel-0-1 before 100 transfers from 3 -> 0: 1105000000000000000-100000000000000000 = 1005000000000000000

So balance of channel-0-4 should have been

1895000000000000000 - 1005000000000000000 = 890000000000000000

But according to the balance checks while looking at the DB channel 0-4 has a balance of 889000000000000000 before closing.

So there is the problem. One of the transfers with node 0 as its target took a different path than expected.

That can be confirmed when looking at the unlocks/secret reveals of the two channels of node 0:

For channel-0-4 we seem to have 9 ReceiveUnlock, so 9 transfers passed from us in the direction 4->0 and 120 ReceiveSecretReveal so 120 transfers passed from us in the direction 0->4

For channel-0-1 we seem to have 111 ReceiveUnlock, so 111 transfers passed from us in the direction 1->0 and 5 ReceiveSecretReveal so 5 transfers passed from us in the direction 0->1

The number 9 does not match the addition of any of the sending tasks, and neither does 111 so some transfer task used two different paths.

Now you may ask why do we care which path any of the transfers took? They succeeded so it should be fine.

The channel-0-4 is closed and settled here:

https://github.com/raiden-network/raiden/blob/e42e1175f5089ea0f9410615c5d4dae337c68e35/raiden/tests/scenarios/bf1_basic_functionality.yaml#L208-L223

Conclusion and TL;DR

Since the scenario closes/settles a channel that channel's balance is taken out of the calculation. From that point on we need to be thinking on channel and not node balances and as such can't afford to not know which path a transfer took

How to address this?

Possible solutions:

Dominik1999 commented 4 years ago

nice analysis @czepluch or @christianbrb can you take over and address Lefteris' findings in the scenario somehow?

czepluch commented 4 years ago

I'd suggest to just move the close and settle of the channel [0, 4] to after the last balance assert. Currently I am just struggling to run scenarios locally, so can't verify that it still works this way.