nervosnetwork / fiber

15 stars 8 forks source link

Fix the repeated shutdown message issue #113

Closed chenyukang closed 2 months ago

chenyukang commented 2 months ago

In the e2e test reestablish, there is an extra shutdown command if auto shutdown happened.

Suppose A send shutdown command, A will send a peer message to B, and if B auto accept shutdown, B will reply a shutdown to A immediately.

Now A is already in state ShuttingDown, if A received another shutdown command, based on the current code, it will continue to send shutdown peer message to B, which will cause state inconsistent and there is an error Musig2VerifyError(BadSignature).

In this fix, A will ignore the extra shutdown command when it's already in ShuttingDown state.

chenyukang commented 2 months ago

Another issue: when error happened for the request tests/bruno/e2e/reestablish/17-remove-tlc-from-NODE1.bru, there is no error message in RPC response, which may hide some errors in code.

contrun commented 2 months ago

The error Musig2VerifyError(BadSignature) here is related to https://github.com/nervosnetwork/cfn-node/issues/111 , where we have the same error when two nodes simultaneously send Shutdown message to each other. Even if node A sends node B a duplicated Shutdown message after B sends A a Shutdown message, we shouldn't have the Musig2VerifyError(BadSignature) error. We should also fix that.

chenyukang commented 2 months ago

The Musig2VerifyError(BadSignature) error happened in this way:

image

The root cause is the two sides don't have the same shutdown scripts, after the second shutdown to peer B:

A holds (close_script_a, source_lock_script_b) B holds (close_script_a, close_script_b)

So they build different shutdown_tx, and the signature verification failed for both sides.

I think we should keep the principle that we only response the first shutdown command (or auto shutdown accept), so the second shutdown_command should be ignored. For the peer message, we already keep this way, at code:

https://github.com/nervosnetwork/cfn-node/blob/d18cd69f8549a031bdc06076bc1d80d229800a7f/src/ckb/channel.rs#L372

With the current fix in this PR, we don't need any more fixes now.

contrun commented 2 months ago

I think we should keep the principle that we only response the first shutdown command (or auto shutdown accept), so the second shutdown_command should be ignored.

I think there are cases when we can't consistently determine whether it is the shutdown command or the auto shutdown accept that happens first.

Imagine, A and B both received ShutdownCommand from the client. Now both of they send a Shutdown message to their peer. But the problem now is that A thought his Shutdown message to B happens before B's Shutdown message to him. In the same vein, B thought his Shutdown message to A happens before A's Shutdown message to him. Thus, A and B have inconsistent view on which shutdown scripts to use.

The problem here is that we need a way to break tie when truly concurrent Shutdown events happen. I suggest that we build closing transaction from the Shutdown message with higher fee rate.