relab / gorums

Gorums simplify fault-tolerant quorum-based protocols
MIT License
138 stars 14 forks source link

A potential goroutine leak in gorums /channel.go #189

Closed xuxiaofan1203 closed 1 month ago

xuxiaofan1203 commented 1 month ago

If the number of nodes is zero, the node.close() will not get executed. https://github.com/relab/gorums/blob/e7ce532e8253015f43bcfe63291cdde404288460/mgr.go#L57-L64 which make the n.cancel() can't be called to awaken the <-c.parentCtx.Done() below. https://github.com/relab/gorums/blob/e7ce532e8253015f43bcfe63291cdde404288460/node.go#L105-L107 https://github.com/relab/gorums/blob/e7ce532e8253015f43bcfe63291cdde404288460/channel.go#L217-L224 So the goroutine will block forever and leak in that case. https://github.com/relab/gorums/blob/e7ce532e8253015f43bcfe63291cdde404288460/channel.go#L81

How to Reproduce: You can use goleak to detect the bug in the test function. The number of nodes is zero, the node.close() will not get executed in the case. https://github.com/relab/gorums/blob/e7ce532e8253015f43bcfe63291cdde404288460/channel_test.go#L42 The output of goleak below:

channel_test.go:65: found unexpected goroutines:
        [Goroutine 21 in state select, with github.com/relab/gorums.(*channel).sender on top of the stack:
        github.com/relab/gorums.(*channel).sender(0xc000224dd0)
            /home/song2048/桌面/goProject/src/github.com/system-pclub/GCatch/GCatch/testdata/gorums/channel.go:220 +0xce
        created by github.com/relab/gorums.newChannel in goroutine 20
            /home/song2048/桌面/goProject/src/github.com/system-pclub/GCatch/GCatch/testdata/gorums/channel.go:81 +0x259
meling commented 1 month ago

Thanks for the bug report @xuxiaofan1203. I've looked into this briefly. It is a bit out of context at the moment, so I may have forgotten something.

My initial thinking is that this should not be a problem that would normally arise when using the generated Gorums manager via the NewManager() and NewConfiguration() calls. That is, calling NewConfiguration should fail if you provide an empty node list.

The reason that TestChannelCreation leaks is that we use a dummy manager without any nodes (for simplicity) since I believe the test is meant to test internal stuff (that the node's channel is created and operational). That said, we should not leak goroutines in tests either. The simple fix is to call node.close() manually at the end of the test. (Since the defer mgr.Close() is ineffective because we never told the mgr about the node.)

xuxiaofan1203 commented 1 month ago

Thanks for the bug report @xuxiaofan1203. I've looked into this briefly. It is a bit out of context at the moment, so I may have forgotten something.

My initial thinking is that this should not be a problem that would normally arise when using the generated Gorums manager via the NewManager() and NewConfiguration() calls. That is, calling NewConfiguration should fail if you provide an empty node list.

The reason that TestChannelCreation leaks is that we use a dummy manager without any nodes (for simplicity) since I believe the test is meant to test internal stuff (that the node's channel is created and operational). That said, we should not leak goroutines in tests either. The simple fix is to call node.close() manually at the end of the test. (Since the defer mgr.Close() is ineffective because we never told the mgr about the node.)

I'm happy to help. Yes, node.close() should be called directly in that case to avoid leaking goroutines. The same situation happens in this function https://github.com/relab/gorums/blob/e7ce532e8253015f43bcfe63291cdde404288460/channel_test.go#L112

meling commented 1 month ago

If you want to fix these bugs and add the goleak checker to our tests, I'm happy to accept a PR.

xuxiaofan1203 commented 1 month ago

If you want to fix these bugs and add the goleak checker to our tests, I'm happy to accept a PR.

It's my pleasure. I have submitted a pull request. I used the simple way that call node.close() instead of mgr.Close() in that cases to fix bugs. And I have tested changes locally. There is no goroutine leak now.