well-typed / grapesy

Native Haskell gRPC client and server based on `http2`

Disable HTTP-level timeouts #124

Closed edsko closed 2 months ago

edsko commented 2 months ago

We can still see TimeoutThread exceptions if a server is terminated before our worker gets a chance to install its own exception handler. This is not a big deal, but it can lead to confusing error messages, for example in the interop tests: if a test fails and the server is terminated very quickly, this exception may be shown (we could hide it by calling setUncaughtExceptionHandler).
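
A minimal sketch of what hiding it could look like. To keep the sketch compilable with only `base`, TimeoutThread is a local stand-in for the real type from System.TimeManager, and `isTimeoutThread` and `hideTimeoutThread` are hypothetical helper names, not part of grapesy.

```haskell
import Control.Exception
import Control.Monad (unless)
import GHC.Conc (setUncaughtExceptionHandler)
import System.IO (hPutStrLn, stderr)

-- Stand-in for System.TimeManager.TimeoutThread (assumption: real code
-- would import the real type instead of defining its own).
data TimeoutThread = TimeoutThread
  deriving Show

instance Exception TimeoutThread

-- True if the exception is the timeout we want to keep out of the output.
isTimeoutThread :: SomeException -> Bool
isTimeoutThread e = case fromException e of
  Just TimeoutThread -> True
  Nothing            -> False

-- Hypothetical helper: report every uncaught exception except TimeoutThread.
hideTimeoutThread :: IO ()
hideTimeoutThread = setUncaughtExceptionHandler $ \e ->
  unless (isTimeoutThread e) $
    hPutStrLn stderr ("uncaught exception: " ++ show e)
```

Calling `hideTimeoutThread` once at startup of the interop test executable would be enough. Note that this replaces the default handler for the whole process, so the output format of other uncaught exceptions changes too.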

edsko commented 2 months ago

My analysis of where the TimeoutThread is coming from.

Control.Reaper

https://hackage.haskell.org/package/auto-update-0.1.6/docs/Control-Reaper.html

General purpose resource manager/cache: maintains a collection of resources, and periodically calls a user-specified "prune" action to see which of those resources should be kept and which should be removed from the collection.

System.TimeManager

https://hackage.haskell.org/package/time-manager-0.0.1/docs/System-TimeManager.html

Instance of the Reaper, where we are managing a list of Handles:

data Handle = Handle !(IORef TimeoutAction) !(IORef State)
type TimeoutAction = IO ()
data State = Active | Inactive | Paused | Canceled

One way to allocate a Handle is by calling registerKillThread; this will kill the thread that it is called from (by throwing TimeoutThread to it) when the Handle is pruned (after calling some user-specified action, which in our case is not used).
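
To illustrate the "kills the thread that it is called from" part, here is a simplified model using only `base`. It is not how time-manager is implemented (there the Reaper prunes Handles periodically; here each registration gets its own timer thread). TimeoutThread is a local stand-in, and `killMeAfter` and `runWithTimeout` are hypothetical names.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, myThreadId, threadDelay)
import Control.Exception (Exception, finally, throwTo, try)

-- Local stand-in for System.TimeManager.TimeoutThread (assumption).
data TimeoutThread = TimeoutThread
  deriving Show

instance Exception TimeoutThread

-- Simplified stand-in for registerKillThread: after `micros` microseconds,
-- throw TimeoutThread to the thread that called this function.
killMeAfter :: Int -> IO ThreadId
killMeAfter micros = do
  me <- myThreadId
  forkIO $ threadDelay micros >> throwTo me TimeoutThread

-- Run a handler under the timeout and report whether it was killed.
-- The timer is cancelled once the handler finishes.
runWithTimeout :: Int -> IO a -> IO (Either TimeoutThread a)
runWithTimeout micros handler = try $ do
  timer <- killMeAfter micros
  handler `finally` killThread timer
```

The point to take away is that the exception is delivered to whichever thread did the registration, which matters below once more threads get involved.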

Network.Run.TCP.Timeout

runTCPServerWithSocket gets a timeout parameter tm which it passes to the timeout argument of withManager. The Manager itself, as well as a Handle, are passed as arguments to the TimeoutServer callback.

Each newly spawned handler for an incoming request is registered with the TimeManager by calling registerKillThread, and the Handle is passed to the handler.

Network.HTTP2.TLS.Server

To run the TLS-enabled server, grapesy calls run.

run :: Settings -> Credentials -> HostName -> PortNumber -> Server -> IO ()
run settings creds host port server =
    runTLS settings creds host port "h2" $ run' settings server

Inlining runTLS and run'

runTCPServerWithSocket .. (settingsTimeout settings) .. $ \mgr th sock -> do
    ..
    E.bracket (contextNew ..) bye $ \ctx -> do
        ..
        iobackend <- timeoutIOBackend th settings <$> tlsIOBackend ctx sock
        E.bracket
            (allocConfigForServer settings mgr send recv mySockAddr peerSockAddr)
            freeConfigForServer
            (\conf -> H2Server.run sconf conf server)

The mgr is part of the config (confTimeoutManager), so it can in principle be accessed through the Config (from http2). The Handle for each request is used to construct the IOBackend (in timeoutIOBackend). Every send/sendMany/recv calls tickle on the Handle, which sets its state to Active. The Handle itself is not exposed in the IOBackend interface.
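
For reference, my understanding of how tickle interacts with pruning, as a small model using only `base`. The State type is the one shown above; `Verdict`, `newHandleState` and `prunePass` are hypothetical names, and the real code in time-manager may differ in details.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, writeIORef)

data State = Active | Inactive | Paused | Canceled
  deriving (Eq, Show)

-- What one reaper pass decides for a single Handle (hypothetical type).
data Verdict = Keep | FireTimeout | Drop
  deriving (Eq, Show)

-- A freshly registered Handle starts out Active.
newHandleState :: IO (IORef State)
newHandleState = newIORef Active

-- Any I/O activity marks the Handle as Active again.
tickle :: IORef State -> IO ()
tickle ref = writeIORef ref Active

-- One reaper pass: an Active Handle is downgraded to Inactive and kept;
-- a Handle that is still Inactive from the previous pass times out.
prunePass :: IORef State -> IO Verdict
prunePass ref = do
  old <- atomicModifyIORef' ref (\s -> (inactivate s, s))
  pure $ case old of
    Inactive -> FireTimeout
    Canceled -> Drop
    _        -> Keep
  where
    inactivate Active = Inactive
    inactivate s      = s
```

Under this model a Handle only times out if no tickle happened between two consecutive passes, so the effective timeout is somewhere between one and two times the configured delay.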

Network.HTTP2.Server.Run

There is another layer of indirection. Above, runTCPServerWithSocket spawns a new server thread for each TCP connection, and then runs Network.HTTP2.Server.Run.run. This sets up a manager of its own, to manage "workers". It then processes all HTTP2 frames (runReceiver / runSender), passing frames to the corresponding worker thread.

This means that the worker thread that grapesy provides in the end is not the thread that the TimeManager throws the TimeoutThread to. Instead, that exception is caught by the main thread from http2 (running runH2), which then closes all streams.

Confusingly, there is a secondary source of these thread timeouts: the workers spawned by http2 (see Network.HTTP2.Server.Worker.worker). These get registered with the same timeout (and there is logic in http2 to tickle the corresponding Handles whenever data is sent or received).

Conclusion

We can change settingsTimeout (in seconds) to be something really high to avoid timeouts happening, but we cannot avoid the timeout being sent when the server is terminated.

"Something really high" unfortunately is a bit awkward; we cannot use maxBound, because runTCPServerWithSocket multiplies the value by 1,000,000 in order to get a value in microseconds. (Ultimately this value becomes the reaperDelay which gets passed as an argument to threadDelay. So a different interpretation of the Int delay is not very easy to do, as this package sits quite far down the dependency stack.)

While we can ignore the TimeoutThread sent to the worker, we cannot prevent the streams from being closed. We therefore really need a way to disable timeouts.
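
On the maxBound point above, the arithmetic can be sketched as follows. The factor of 1,000,000 is the one described above; `microsPerSecond`, `maxSafeTimeoutSeconds` and `toMicros` are hypothetical names used only for illustration.

```haskell
-- Conversion factor applied to the timeout (seconds to microseconds).
microsPerSecond :: Int
microsPerSecond = 1000000

-- Largest timeout in seconds that does not overflow Int when converted
-- to microseconds.
maxSafeTimeoutSeconds :: Int
maxSafeTimeoutSeconds = maxBound `div` microsPerSecond

-- Checked conversion: Nothing for negative values or values that would
-- overflow, Just the number of microseconds otherwise.
toMicros :: Int -> Maybe Int
toMicros secs
  | secs < 0                     = Nothing
  | secs > maxSafeTimeoutSeconds = Nothing
  | otherwise                    = Just (secs * microsPerSecond)
```

With GHC's wrapping Int arithmetic, `maxBound * microsPerSecond` comes out negative, which is why maxBound itself is not usable. Passing `maxSafeTimeoutSeconds` avoids the overflow, though whether threadDelay behaves well with a delay that large on every platform is something I have not verified.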

The non-TLS case

In a way this case is even worse, as now the timeout is not even an argument: we call allocSimpleConfig, which hard-codes the delay to 30 seconds. Fortunately we don't have to call this function, but we do still have the same we-must-specify-a-delay problem.