Open jncornett opened 4 years ago
I don't have the context to know whether this makes sense for all use cases, but a simple fix would be to duplicate nl.NetlinkSocket.Receive into a new method, nl.NetlinkSocket.ReceiveWithTimeout. Inside this new method, set a userspace timeout via unix.Select(...) immediately before the call to unix.Recvfrom(...). The *Subscribe* family of functions could then use ReceiveWithTimeout in place of Receive. If the returned error is a timeout (I know, checking error return values is an anti-pattern), the loop would just continue. As far as I know, this is the least invasive way to "cancel" a recvfrom()/recvmsg() call.
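A minimal sketch of that idea, not the library's code: it waits in select(2) and only then calls recvfrom. It uses the standard library's syscall package rather than golang.org/x/sys/unix so that it has no external dependencies, and it runs against a Unix datagram socketpair as an unprivileged stand-in for a netlink socket. The names ErrTimeout, receiveWithTimeout and demo are hypothetical, and the fd-set arithmetic assumes 64-bit Linux.

```go
package main

import (
	"errors"
	"syscall"
	"time"
)

// ErrTimeout is a hypothetical sentinel for this sketch; it is not part of
// the netlink package.
var ErrTimeout = errors.New("receive timed out")

// receiveWithTimeout waits up to timeout for fd to become readable and then
// reads one message. It returns ErrTimeout if nothing arrived in time.
// Assumptions: 64-bit Linux (FdSet.Bits holds int64 words) and fd < 1024.
func receiveWithTimeout(fd int, buf []byte, timeout time.Duration) (int, error) {
	deadline := time.Now().Add(timeout)
	for {
		remaining := time.Until(deadline)
		if remaining <= 0 {
			return 0, ErrTimeout
		}
		tv := syscall.NsecToTimeval(remaining.Nanoseconds())

		// Mark fd in the read set.
		var rfds syscall.FdSet
		rfds.Bits[fd/64] |= 1 << (uint(fd) % 64)

		n, err := syscall.Select(fd+1, &rfds, nil, nil, &tv)
		if err == syscall.EINTR {
			continue // interrupted by a signal: retry with the time left
		}
		if err != nil {
			return 0, err
		}
		if n == 0 {
			return 0, ErrTimeout
		}
		nr, _, err := syscall.Recvfrom(fd, buf, 0)
		return nr, err
	}
}

// demo reads from an empty socket (expected to time out), then sends one
// message and reads it back. It reports whether the first read timed out
// and the payload of the second read.
func demo() (bool, string, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0)
	if err != nil {
		return false, "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	buf := make([]byte, 64)
	_, firstErr := receiveWithTimeout(fds[0], buf, 50*time.Millisecond)

	if _, err := syscall.Write(fds[1], []byte("hello")); err != nil {
		return false, "", err
	}
	n, err := receiveWithTimeout(fds[0], buf, time.Second)
	if err != nil {
		return false, "", err
	}
	return errors.Is(firstErr, ErrTimeout), string(buf[:n]), nil
}
```

A subscription loop built on this would treat ErrTimeout as a cue to check its done channel and continue, and treat any other error as fatal.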
unix.Recvfrom() can be unblocked by calling unix.Shutdown() with unix.SHUT_RD (or unix.SHUT_RDWR if you also want to unblock unix.Sendto()).
I didn't test, but I think this patch would help:
diff --git a/nl/nl_linux.go b/nl/nl_linux.go
index 600b942b1785..72104b93ef2b 100644
--- a/nl/nl_linux.go
+++ b/nl/nl_linux.go
@@ -734,6 +734,7 @@ func SubscribeAt(newNs, curNs netns.NsHandle, protocol int, groups ...uint) (*Ne
 func (s *NetlinkSocket) Close() {
 	fd := int(atomic.SwapInt32(&s.fd, -1))
+	unix.Shutdown(fd, unix.SHUT_RDWR)
 	unix.Close(fd)
 }
In fact, it seems this is not even needed on recent kernels anymore. Just close(done) seems to do the trick.
I noticed the same issue on Fedora 35 with kernel 5.18.6-100.fc35.x86_64. Shouldn't that be recent enough?
With SHUT_RDWR, netlink returns "operation not supported".
Depending on how long it takes to actually start reading from the socket fd, the following code will never return:
This is apparently because of the blocking s.Receive() call.
This isn't a big deal for short-lived programs (as the goroutines will get cleaned up on exit), but I was writing a long-lived daemon that creates and removes subscriptions based on configuration file updates. With the current behavior, it is not possible to write such a program without leaking goroutines via the above mechanism.
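For illustration, here is a rough sketch of a subscription whose goroutine a daemon can stop and wait for. It is not how the library's Subscribe functions are written. It swaps in a kernel-side receive timeout (SO_RCVTIMEO) instead of a Select-based one, so that a blocked Recvfrom wakes up periodically and notices that done was closed. The subscribe and demoSubscribe functions, the 50 ms interval, and the Unix datagram socketpair standing in for a netlink socket are all assumptions made for this example.

```go
package main

import (
	"sync"
	"syscall"
)

// subscribe is a hypothetical stand-in for the Subscribe family. It forwards
// messages read from fd to ch until done is closed, then closes ch.
func subscribe(fd int, ch chan<- []byte, done <-chan struct{}, wg *sync.WaitGroup) error {
	// Receive timeout of 50 ms so Recvfrom cannot block forever.
	tv := syscall.Timeval{Usec: 50000}
	if err := syscall.SetsockoptTimeval(fd, syscall.SOL_SOCKET, syscall.SO_RCVTIMEO, &tv); err != nil {
		return err
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(ch)
		buf := make([]byte, 4096)
		for {
			select {
			case <-done:
				return
			default:
			}
			n, _, err := syscall.Recvfrom(fd, buf, 0)
			if err == syscall.EAGAIN || err == syscall.EINTR {
				continue // timed out or interrupted: re-check done
			}
			if err != nil {
				return
			}
			msg := append([]byte(nil), buf[:n]...) // copy out of the shared buffer
			select {
			case ch <- msg:
			case <-done:
				return
			}
		}
	}()
	return nil
}

// demoSubscribe delivers one message, cancels the subscription, and reports
// the payload and whether the goroutine exited and closed its channel.
func demoSubscribe() (string, bool, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0)
	if err != nil {
		return "", false, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	ch := make(chan []byte)
	done := make(chan struct{})
	var wg sync.WaitGroup
	if err := subscribe(fds[0], ch, done, &wg); err != nil {
		return "", false, err
	}
	if _, err := syscall.Write(fds[1], []byte("event")); err != nil {
		return "", false, err
	}
	msg := <-ch

	close(done)
	wg.Wait() // returns once the goroutine sees done, within about one timeout
	_, open := <-ch
	return string(msg), !open, nil
}
```

The trade-off is latency versus wakeups: cancellation takes up to one timeout interval, and an idle subscription wakes once per interval.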