raiden-network / light-client

The Raiden Light Client
https://raiden.network/
MIT License
33 stars 31 forks source link

Withdraw not working after crashing the task a few times during withdraw #2177

Closed christianbrb closed 3 years ago

christianbrb commented 3 years ago

Thanks for filing a bug report :-)

Steps to Reproduce

  1. Navigate to https://lightclient.raiden.network/staging
  2. Make a withdraw
  3. Kill the browser task during withdraw
  4. Make this a few times
  5. Make a withdraw without killing the browser task

Expected Result

Actual Result

[wait one night] -> Result from #2184 for completness

Additional Infos (Optional)

- Firefox 80.0.1
christianbrb commented 3 years ago

Maybe an issue with the Raiden hub client? @weilbith Could you please upload the logs of the Raiden hub of today? Thanks :)

weilbith commented 3 years ago

Here we go. Don't get confused by the date in the file name. Paul enlightened me how the log rotating works for the Python client. ^^ raiden-debug_2020-09-27T12:00:44.641239.log

christianbrb commented 3 years ago

@weilbith Thanks :) I don't see an error within the logs.

@andrevmatos Could you please check the logs? I have retried and it is still not working.

andrevmatos commented 3 years ago

Ok, I did read the logs from our side, but couldn't get anything on hub's side. Looks like, while the events on the LC end at ~7h11am, the logs on the hub start at 9h05am. @weilbith can you check to get the logs from the proper time range from the hub, please? From our side, what I could see that happened, is expected:

Again, this doesn't look like an issue with persistence, and instead a protocol issue because we're more relaxed than raiden-py on constraints on handling and supporting parallel withdraws. Possible solutions include retrying pending withdraws for which we've received confirmations for, erroring on new withdraw request before succeeding or expiring previous one, or fixing raiden-py.

weilbith commented 3 years ago

@andrevmatos to make this more specific: you need the logs from the 27th of September 2020 including the messages before and after 7am? If so we have a problem: there are no log entries for that time. Doing a recursive grep with the format of the timestamp used in he logs I get no results (i.e. grep -R "2020-09-27 07:" ./). I tried this approach because the naming of the log files is confusing to me.

andrevmatos commented 3 years ago

@weilbith the logs on the LC side goes from 2020-09-28T06:44:14 to 2020-09-28T07:11:21. But anyway, it's very weird not to have logs available. Is there any kind of log rotation available? Isn't the hub running 24/7?

christianbrb commented 3 years ago

@andrevmatos I hope, that I haven't confused you. There two logs of the dApp for the two issues. I have attached the logs of the second issue to the description of this issue now. No update for the Hub from my side.

andrevmatos commented 3 years ago

No confusion, I'm checking the logs on this issue, and there's no time intersection between the logs from the dApp and the hub. Thore said there're no logs on the hub for this time frame, which is very weird, it's harder to debug a raiden-py issue without logs.

weilbith commented 3 years ago

@andrevmatos if you could give me the fingerprint of a SSH key you control, I could give you access to the hub and point you to the logs directory. I think that is more efficient.

palango commented 3 years ago

@andrevmatos Sorry for missing the comment. I'm looking into it and will update you as soon as I have looked into the python client

christianbrb commented 3 years ago

@palango to create an issue on the Raiden python repo to reproduce the issue in an integration test.

palango commented 3 years ago

@christianbrb Do you remember how big the withdraws were in relation to the channel balance and deposits?

palango commented 3 years ago

Issue in Python client: https://github.com/raiden-network/raiden/issues/6599

christianbrb commented 3 years ago

@palango The withdraw was 0.0001 TTT and the prior SetTotalDeposit was the same 0.0001 TTT.

I've analyzed this by looking at the transactions: https://goerli.etherscan.io/address/0xb808877b5ec745183bd60ce8c89aba83011d56f6

palango commented 3 years ago

Mh, that explains it. @andrevmatos

@andrevmatos I don't see a problem in the python client here. How could this be handles differently?

andrevmatos commented 3 years ago

The problem IMO is that it shouldn't deduce pending offchain_total_withdraw from the deposits to calculate the balance: total_withdraw is a monotonic value; from what I can see, PC is including current offchain_total_withdraw in the value which it calculates as potential withdraw for partner; if I had a deposit - transferred of 5T (balance), a offchain_total_withdraw of 3T, I should be able request a new offchain_total_withdraw of 4T, since it's still <= 5, and this new request should supersede the previous of 3T.

andrevmatos commented 3 years ago

For this, balance and totalWithdrawable should be different values (and this is how we calculate it on the LC):

It's a sensible difference, but seems to be the issue here.

palango commented 3 years ago

That makes sense, so the PT should just sign the request again, because it doesn't do any harm. (as long as the partner hasn't lost so much balance that he cannot do it any more, but then PT shouldn't have accepted the payment before)

andrevmatos commented 3 years ago

Yes, exactly. As long as the request doesn't "steal" from PC, and this new pending request is accounted for future transfers, PC should accept and sign this request.

christianbrb commented 3 years ago

@andrevmatos @palango Thanks for the great joint analysis :)

I am closing the issue as it is getting solved within the python client (see raiden-network/raiden#6599)