Open dora-gt opened 4 years ago
memo
thread 'tokio-runtime-worker-11' panicked at 'assertion failed: `(left == right)`
left: `0`,
right: `2`', crates/ilp-node/tests/three_nodes.rs:221:33
stack backtrace:
I'm not sure why the packet said it timed out but STREAM should retry on R00 errors https://github.com/interledger-rs/interledger-rs/issues/467
Err: () ... 😠https://circleci.com/gh/interledger-rs/interledger-rs/2720
---- two_nodes_btp stdout ----
Error executing tests: ()
thread 'two_nodes_btp' panicked at 'called `Result::unwrap()` on an `Err` value: ()', src/libcore/result.rs:1084:5
@dora-gt that's that same R00 error:
[DEBUG interledger_stream::client] Prepare 1 with amount 2000 was rejected with code: R00 (2000 left to send)
Ahhhh I see but it seems strange enough to take about 30secs to send a payment even if it should retry then. I'm wondering why.
[INFO warp::filters::log] 127.0.0.1:39062 "POST /accounts/bob_on_b/payments HTTP/1.1" 500 "-" "reqwest/0.9.22" 27.019914015s
@dora-gt can this be closed now?
Yes, I'm closing.
I'm wondering whether the three_nodes
test is failing because of the same reason. Let's see if it is solved as well. I'll open another issue if that still happens.
I'm reopening this, I'm not sure what is the cause but it is failing again.
The reason was that RedisStore::insert_account
was not fully atomic.
The code to reproduce the bug is here: https://github.com/interledger-rs/interledger-rs/commit/a3917726b12a31648c192d28c703ab5c1dce4d58
If the accounts of alice_on_a
and b_on_a
are being added at the same time and unfortunately, b_on_a
is added when alice_on_a
is being added, alice_on_a
is added with a wrong ILP address.
Imagine that, this line is executed: https://github.com/interledger-rs/interledger-rs/blob/two_nodes_btp-reproduce/crates/interledger-store-redis/src/store.rs#L979
and then, b_on_a
is added at the same time, which means the procedure of b_on_a
updates the ILP address of the node. After that, the procedure of alice_on_a
executes this line:
https://github.com/interledger-rs/interledger-rs/blob/two_nodes_btp-reproduce/crates/interledger-store-redis/src/store.rs#L994
Then the ILP address of alice_on_a
becomes local.host.alice_on_a
, which should be example.parent.a_on_b.alice_on_a
, and the payments from Node B to Node A fail.
I think we have to make RedisStore::insert_account
fully atomic. For example, lock the ILP address first, and execute redis_insert_account
, and after that, unlock the ILP address.
Actually I tried some approaches but it was really difficult because RwLock
cannot be used in multi-thread executor like tokio
(am I right?), and also futures
0.1.29 doesn't provide nice locking library.
Does anyone come up with a nice way to solve this?
@interledger-rs/contributors
Also I think it would be nice to investigate other functions around redis about whether it is fully atomic or not, being careful about deadlocks.
I have no trouble believing there are race conditions in this code (I had previously noted having seen some in https://github.com/interledger-rs/interledger-rs/issues/196#issuecomment-523664862 , and I think I may have left some comments in the redis store remarking at plausible race conditions). At the ones I was looking at it seemed like the difficulty was that we may have needed to move things into Lua scripts in order to achieve atomicity, which is kind of a pain. Going forward my work on the SQLite store will allow us to have a backend with proper transactions.
Thanks Ben, yes, I agree that one of possible countermeasures is moving the variable into stores that could ensure the atomicity.
I think this problem is very profound. I mean, I think we should not update the ILP address of the node during any payment processes. For example, what happens if the ILP address is updated when some Stream payments are being processed? Maybe this is not tested and might cause some undesired behavior. We should start locking the ILP address when we get Prepare
s till we send Fulfill
s at least the read
level... though that might be a very rare case.
After exploring some functions on futures (tokio::sync::lock, futures_locks), experimenting those and looking into the implementation, I finally became to think about:
RwLock
or Mutex
because it is not safe in the Async context (mentioned here).ilp_address
variable in RedisStore and lock the ILP address on any Stores including Redis, SQL because we have to make it scalable and then locking the ILP address on a single node doesn't make sense.
SELECT xx FROM xx LOCK IN SHARE MODE
or something like it.So I think I should revisit this problem later after SQL store is implemented by @bstrie . I'll fix just not to let tests fail though it is not an essential solution.
JFYI: What is interesting regarding futures_locks
is that they provide async RwLock
. tokio::sync::lock
doesn't provide RwLock though there are more useful libraries in the current async environment.
Can this be closed now?
I think we have to essentially fix this problem, later after SQL is integrated. So this should not be closed IMO.
It seems that
two_nodes_btp
test is unstable.When it failed, I just reran. https://circleci.com/gh/interledger-rs/interledger-rs/2641 Then it succeeded. https://circleci.com/gh/interledger-rs/interledger-rs/2642
It might be worth changing `expect("xxxx")' to investigate a bit more.