xtaci / smux

A Stream Multiplexing Library for golang with least memory usage(TDMA)
MIT License
1.27k stars 189 forks source link

Serious keepalive issue #67

Open netvipe opened 4 years ago

netvipe commented 4 years ago

I've migrated my project from Yamux to Smux a couple of weeks ago. Most reasionly due to the memory alloc behavior of Yamux After facing some issue, I thought everything is smooth and stable. But the included keepalive functions seems to be far from that.

A debugging revealed that the notifyBucket/check dataReady approch is not reliable under heavy load and with higher RTT. The async notification works, but dataReady is sometimes not correctly set for the timeout checks. It is also very important to mention that this only happens when testing under real life conditions. My local test-cases are all fine, but testing between remote devices reveals the mentioned issue.

One mitigation would be removing the s.Close() right after the dataReady checks. Another would be replacing the check by a more reliable approach. There would also be the option to disable keepalives and implement a real Ping() function.

I've chosen the last option.

Could you please check if a patch integration makes sense or alternatively provide a fix for the keepalive.

Thanks!

Here is my ping patch: ping_patch.txt

xtaci commented 4 years ago

394 if !atomic.CompareAndSwapInt32(&s.dataReady, 1, 0) {

so if this line fails, there is no incoming data, OR(for extreme case), the goroutine for recvLoop() hasn't been scheduled to ?

xtaci commented 4 years ago

for one special case:

the caller of smux don't read the incoming data, then , it blocks on : 309 for atomic.LoadInt32(&s.bucket) <= 0 && !s.IsClosed() {

xtaci commented 4 years ago

https://github.com/xtaci/smux/commit/80072d9b5c44fe531e0e5d7d36b029b87f20065c

netvipe commented 4 years ago

Your patch loooks like a good option for approach 2!

Any comments on the Ping()?

xtaci commented 4 years ago

Pong is not required, IMHO, 'cause we don't need RTT.

netvipe commented 4 years ago

You may don't need it, but I'm using it and there have been plenty of other requests. In addition, it's nice to have a connection control apart from the non-reliable IsClosed() to managed adjacent processes. But okay. So I'll have to managed my own branch.