Closed yangby-cryptape closed 3 years ago
Description
Opening a protocol fails randomly, especially when the client has limited CPU resources.
This issue was introduced by PR 288 (fix: fix some msg left on buffer): https://github.com/nervosnetwork/tentacle/blob/0c5c1e43f692f1e7c053d2e5aa10af35a7d40dfd/yamux/src/session.rs#L81-L87
Reason
[1] When the client creates a Session and fetches messages from it, the following code runs: https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/yamux/src/session.rs#L627-L628
[2] When the client wants to open a protocol, the following code runs concurrently with [1]: https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/tentacle/src/session.rs#L321-L331
Stepping into those functions, we find that two frames are sent:
[3] First: https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/yamux/src/stream.rs#L177-L180
[4] Second: https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/tentacle/src/protocol_select/mod.rs#L155 https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/yamux/src/stream.rs#L132
The above two pieces of code run concurrently.
If line 627 of [1] completes before line 178 in [3] executes, and line 115 in [4] completes before line 628 of [1] executes, the client sends the Data frame before the WindowUpdate frame.
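The ordering condition above can be checked with a small model. This is only a sketch: it treats each of the four referenced lines as one atomic step (a simplification, the real code is not atomic at that granularity) and assumes that line 178 in [3] always runs before line 115 in [4], since the text calls them the first and second frame. The names Step, interleavings, is_failing and failing_orders are hypothetical and do not exist in tentacle.

```rust
/// The four steps named in the text, treated as atomic for this model.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    Session627, // line 627 of [1]
    Session628, // line 628 of [1]
    Stream178,  // line 178 in [3], sends WindowUpdate
    Select115,  // line 115 in [4], sends Data
}

/// Every merge of `a` and `b` that keeps each sequence's internal order.
fn interleavings(a: &[Step], b: &[Step]) -> Vec<Vec<Step>> {
    fn go(a: &[Step], b: &[Step], cur: &mut Vec<Step>, out: &mut Vec<Vec<Step>>) {
        if a.is_empty() && b.is_empty() {
            out.push(cur.clone());
            return;
        }
        // Take the next step from the first sequence, if any.
        if let Some((first, rest)) = a.split_first() {
            cur.push(*first);
            go(rest, b, cur, out);
            cur.pop();
        }
        // Take the next step from the second sequence, if any.
        if let Some((first, rest)) = b.split_first() {
            cur.push(*first);
            go(a, rest, cur, out);
            cur.pop();
        }
    }
    let mut out = Vec::new();
    go(a, b, &mut Vec::new(), &mut out);
    out
}

fn position(order: &[Step], step: Step) -> usize {
    order
        .iter()
        .position(|s| *s == step)
        .expect("every step appears once")
}

/// The condition from the text: 627 before 178, and 115 before 628.
fn is_failing(order: &[Step]) -> bool {
    position(order, Step::Session627) < position(order, Step::Stream178)
        && position(order, Step::Select115) < position(order, Step::Session628)
}

/// All schedules of the two concurrent paths that satisfy the failing condition.
fn failing_orders() -> Vec<Vec<Step>> {
    let session = [Step::Session627, Step::Session628];
    let open = [Step::Stream178, Step::Select115];
    interleavings(&session, &open)
        .into_iter()
        .filter(|order| is_failing(order))
        .collect()
}
```

Under these assumptions, calling failing_orders() yields a single schedule out of the six possible ones: both frame-sending steps have to fall between line 627 and line 628 of [1]. That would fit the observation that the failure is random and more likely on a slow client.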
So the server drops the Data frame without doing anything: https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/yamux/src/session.rs#L430-L433
Then the server opens the protocol on the WindowUpdate frame: https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/yamux/src/session.rs#L394
But the server never replies to the client: https://github.com/nervosnetwork/tentacle/blob/56c64b68534e7556236dc3a76381e79529ce8aff/tentacle/src/protocol_select/mod.rs#L157-L163
In the end, a ProtocolSelectError(Elapsed()) is thrown.
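The server-side consequence can be illustrated with a toy model. This is a sketch under stated assumptions, not tentacle's implementation: Frame, Server, Outcome and open_protocol are hypothetical names, frame flags are omitted, and the model simply encodes the behaviour described above (a WindowUpdate opens the stream, a Data frame for an unknown stream is dropped silently, and a reply is only produced for Data that arrives on an open stream).

```rust
use std::collections::HashSet;

/// Simplified frames; real yamux frames also carry flags and lengths.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    WindowUpdate { stream_id: u32 },
    Data { stream_id: u32, payload: Vec<u8> },
}

/// Hypothetical server state: the streams it knows and the replies it sent.
#[derive(Debug, Default)]
struct Server {
    open_streams: HashSet<u32>,
    replies: Vec<Frame>,
}

impl Server {
    fn handle(&mut self, frame: Frame) {
        match frame {
            // The server opens the stream when the WindowUpdate arrives.
            Frame::WindowUpdate { stream_id } => {
                self.open_streams.insert(stream_id);
            }
            // Data for an unknown stream is dropped without any reaction.
            Frame::Data { stream_id, payload } => {
                if self.open_streams.contains(&stream_id) {
                    self.replies.push(Frame::Data { stream_id, payload });
                }
            }
        }
    }
}

/// What the client observes after sending its frames in `wire` order.
#[derive(Debug, PartialEq)]
enum Outcome {
    Selected, // the server replied
    Elapsed,  // no reply, so the client's wait times out
}

fn open_protocol(wire: Vec<Frame>) -> Outcome {
    let mut server = Server::default();
    for frame in wire {
        server.handle(frame);
    }
    if server.replies.is_empty() {
        Outcome::Elapsed
    } else {
        Outcome::Selected
    }
}
```

In this model, passing the WindowUpdate frame followed by the Data frame gives Outcome::Selected, while the reversed order gives Outcome::Elapsed even though the stream ends up open on the server, which mirrors the ProtocolSelectError(Elapsed()) seen by the client.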
Reproduce
Add ::std::thread::sleep(::std::time::Duration::from_millis(300)); after line 627 in [1].
Add ::tokio::time::delay_for(::tokio::time::Duration::from_millis(100)).await; after line 328 in [2].
Run the CKB Integration Tests Spec PoolReconcile.
The following code will panic frequently:
info!("Connect node0 to node1");
node0.connect(node1);
waiting_for_sync(nodes);