microsoft / go-winio

Win32 IO-related utilities for Go
MIT License
953 stars 181 forks source link

pipe: add server backlog for concurrent Accept() #291

Open jeffhostetler opened 1 year ago

jeffhostetler commented 1 year ago

Teach pipe.go:ListenPipe() to create multiple instances of the server pipe in the kernel so that client connections are less likely to receive a windows.ERROR_PIPE_BUSY error. This is conceptually similar to the backlog argument of the Unix listen(2) function.

The current listenerRoutine() function works sequentially in response to calls to Accept(), such that there will only be at most one unbound server pipe in the NPFS present at any time. Even if the server application calls Accept() concurrently from a pool of application threads, the listenerRoutine() will process them sequentially.

In this model and because there is only one listenerRoutine() instance, there is an interval of time (immediately after a connection is made) where there are no available unbound/free server pipes. When ConnectNamedPipe() returns, listenerRoutine() sends the new pipe handle over a channel to the caller of Accept(). The application code then has an opportunity to dispatch/process it and then call Accept() again. Only at that point can listenerRoutine() create a new unbound server pipe in the file system and wait for the next connection. Anytime during this interval, a client application trying to connect will get a pipe busy error.

Code in DialPipe() hides this from GOLANG callers because it includes a busy retry loop. However, clients written in other languages without this assistance are likely to see the busy error and be forced to deal with it.

This change introduces an "accept queue" using a buffered channel and splits listenerRoutine() into a pool of listener worker threads. Each worker creates a new unbound pipe in the file system and waits for a client connection. The NPFS and kernel can then deliver the new connection to a random listener worker. The resulting connected pipe is delivered back to the caller Accept() as before.

A PipeConfig.QueueSize variable controls the number of listener worker threads and the maximum number of unbound/free pipes server pipes that will be present at any given time. Note that a listener worker will normally have an unbound/free pipe except during that same delivery interval. Having multiple active workers (and unbound pipes in the file system) gives us extra capacity to handle rapidly arriving connections and minimize the odds of a client seeing a busy error.

The server application is encouraged to call Accept() from a pool of application workers. The size of the application pool should be the same or larger than the queue size to take full advantage of the listener queue.

To preserve backwards compatibility, a queue size of 0 or 1 will behave as before.

Also for backwards compatibility, listener workers are required to wait for an Accept() call so that the worker has a return channel to send the connected pipe and error code. This implies that the number of unbound pipes will be the smaller of the queue size and the application pool size.

Finally, a Mutex was added to l.Close() to ensure that concurrent threads do not simultaneously try to shutdown the pipe.

jeffhostetler commented 1 year ago

I observed this problem while trying to send data from git.exe to a GOLANG server. The code calling CreateFile() saw the busy error (or some other error) and didn't expect to need to spin.

I did a little (incomplete) search of the issue backlog and found a few that it might be related:

jeffhostetler commented 1 year ago

I added a test at the bottom of pipe_test.go that runs various geometries and shows the observed OK-vs-busy rate, but I wasn't sure if/how/when we wanted to throw an error. On my laptop, the legacy cases get busy errors about 33% of the time. With a moderate queue size, we don't get busy signals -- however they are still theoretically possible, I just didn't see any on my limited tests, so I hesitated asserting it.

Suggestions welcomed. Thanks!

jeffhostetler commented 1 year ago

@microsoft/containerplat @msscotb @kevpar @helsaawy Hey, just a quick ping. I was wondering if anyone had had a chance to look at my PR and see if this functionality is of interest. (I just noticed that there is now a conflict with a recently merged change. I'll address that shortly.)

Thanks!