Open jeikabu opened 4 years ago
I'm assuming you're using actual Windows and not WSL.
IPC on Windows is implemented using named pipes. I wonder if the OS has some limitation here -- I don't think I'm doing anything particularly asynchronously -- but I will look into it.
I suspect that what is happening is that while I'm calling DisconnectNamedPipe -- you might have clients on the far side that are still holding that point in the "namespace" active. This is different from how it works on UNIX or with TCP. If this is actually what is happening, I'm not sure I'll be able to do much about it.
Docs about this are here: https://docs.microsoft.com/en-us/windows/win32/api/namedpipeapi/nf-namedpipeapi-disconnectnamedpipe
Essentially you have to wait for the clients to see the pipe error, and close the pipe.
This does mean that a malicious client can hold the pipe indefinitely, and prevent future server instances from listening. This is a design defect in Named Pipes in Windows, IMO. (But again, Windows wasn't designed for multi-user use, really, and this API reflects that -- mostly processes are assumed to be cooperative and not actively hostile.)
Yes, this is actual Windows, not WSL. Tried looking into this and this seems to be what's going on:
Under normal circumstances, e.g. pair1_pipe_recv_cb
receives an aio error and calls nni_pipe_close.()
. nni_pipe_close()
kicks off nni_reap(&p->p_reap, (nni_cb) pipe_destroy, p)
and when it gets into ipctran_pipe_fini()
, refcnt
drops to 0. Then nni_listener_reap()
-> nni_listener_destroy()
and ipctran_ep_fini()
cleans everything up.
Things go awry if nng_close()
kicks off nni_reap(&l->l_reap, (nni_cb) nni_listener_reap, l)
before pipe_destroy
. In this case when it gets to ipctran_ep_fini
, refcnt != 0
and things aren't cleaned up. Next iteration around the loop CreateNamedPipe()
fails with EADDRINUSE
.
In the above sample, you can prevent this by delaying nng_close()
with nng_msleep(10)
or whatever.
Your last comment is relevant. I'll have to think hard about properly fixing this safely -- it probably means we need to change the way we tear down the listener -- these pipes probably need to add to the reference count on the listener, so that the listener isn't fully discarded until the last pipe closes.
This is quite unlike POSIX, where the service address (the file name) is only used at rendezvous, and the two sockets can happily stay connected without impacting the ability to delete and create the socket in the file system.
I might not get to this for 1.3, but only because I'm trying hard to get a 1.3 out the door soonish.
Closing both sockets should clean this up properly. I'm still trying to figure out what resource is left behind.
Actually I may have found it... the conn_aio should have been closed too.
I've published what I think might be a fix for this (I can see circumstances where possibly this impacts TCP as well btw.)
Please have a look at my fix1175 branch and give it a shot. It's not easy for me to test Windows right this second (because I don't have any decent toolchain for Windows ARM).
tested the changes proposed in #1562 on Windows 10 x64 - the issue persists
NNG & Platform details.
nng v1.2.4 Win10
Expected Behavior
After calling
nng_close()
on a socket listening on"ipc://"
, can immediately open another socket and listen on same pipe.Actual Behavior
Works on Linux. On Windows,
nng_listen()
fails withEADDRINUSE
after 10-500x iterations.Steps to Reproduce