jeffhostetler opened this pull request 1 year ago.
I observed this problem while trying to send data from git.exe to a GOLANG server. The code calling CreateFile() saw the busy error (or some other error) and didn't expect to need to spin.
I did a quick (incomplete) search of the issue backlog and found a few issues that might be related:
I added a test at the bottom of pipe_test.go that runs various geometries and shows the observed OK-vs-busy rate, but I wasn't sure if/how/when we wanted to throw an error. On my laptop, the legacy cases get busy errors about 33% of the time. With a moderate queue size, we don't get busy signals; however, they are still theoretically possible. I just didn't see any in my limited tests, so I hesitated to assert on it.
Suggestions welcomed. Thanks!
@microsoft/containerplat @msscotb @kevpar @helsaawy Hey, just a quick ping. I was wondering if anyone had had a chance to look at my PR and see if this functionality is of interest. (I just noticed that there is now a conflict with a recently merged change. I'll address that shortly.)
Thanks!
Teach pipe.go:ListenPipe() to create multiple instances of the server pipe in the kernel so that client connections are less likely to receive a windows.ERROR_PIPE_BUSY error. This is conceptually similar to the backlog argument of the Unix listen(2) function.

The current listenerRoutine() function works sequentially in response to calls to Accept(), such that at most one unbound server pipe is present in the NPFS at any time. Even if the server application calls Accept() concurrently from a pool of application threads, listenerRoutine() will process them sequentially.

In this model, because there is only one listenerRoutine() instance, there is an interval of time (immediately after a connection is made) during which there are no available unbound/free server pipes. When ConnectNamedPipe() returns, listenerRoutine() sends the new pipe handle over a channel to the caller of Accept(). The application code then has an opportunity to dispatch/process it and then call Accept() again. Only at that point can listenerRoutine() create a new unbound server pipe in the file system and wait for the next connection. Any client application that tries to connect during this interval will get a pipe busy error.

Code in DialPipe() hides this from GOLANG callers because it includes a busy retry loop. However, clients written in other languages without this assistance are likely to see the busy error and be forced to deal with it.

This change introduces an "accept queue" using a buffered channel and splits listenerRoutine() into a pool of listener worker threads. Each worker creates a new unbound pipe in the file system and waits for a client connection. The NPFS and kernel can then deliver the new connection to a random listener worker. The resulting connected pipe is delivered back to the caller of Accept() as before.

A PipeConfig.QueueSize field controls the number of listener worker threads and the maximum number of unbound/free server pipes that will be present at any given time. Note that a listener worker will normally have an unbound/free pipe except during that same delivery interval. Having multiple active workers (and unbound pipes in the file system) gives us extra capacity to handle rapidly arriving connections and minimizes the odds of a client seeing a busy error.

The server application is encouraged to call Accept() from a pool of application workers. The size of the application pool should be the same as or larger than the queue size to take full advantage of the listener queue.

To preserve backwards compatibility, a queue size of 0 or 1 will behave as before.
Also for backwards compatibility, listener workers are required to wait for an Accept() call so that the worker has a return channel on which to send the connected pipe and error code. This implies that the number of unbound pipes will be the smaller of the queue size and the application pool size.

Finally, a Mutex was added to l.Close() to ensure that concurrent threads do not simultaneously try to shut down the pipe.