guzba / mummy

An HTTP and WebSocket server for Nim that returns to the ancient ways of threads.
MIT License
274 stars 10 forks source link

Decoupling server init from the main loop #24

Closed oakes closed 1 year ago

oakes commented 1 year ago

I'm working on a project with mummy and it will involve writing many unit tests that create a thread where the mummy server will be run, doing some requests, and then closing the server. This can lead to a race condition where the requests and subsequent close are called while the server is still initializing.

Normally, when making my own custom server, I resolve this by making the server send a value down a channel after it has initialized and before it enters the loop. That way, the main thread can just block until it receives the value from the channel, and then continue the unit test afterwards.

I can't do this with mummy because serve combines initialization with the loop. For example, if you run this:

proc handler(request: Request) =
  discard
let server = newServer(handler)

var serverThread: Thread[void]

proc serverProc() =
  {.cast(gcsafe).}:
    server.serve(Port(8081))

createThread(serverThread, serverProc)
server.close()

Every once in a while, it will crash:

Traceback (most recent call last)
/Users/sekao/Documents/mummy/tests/test.nim(40) serverProc
/Users/sekao/Documents/mummy/src/mummy.nim(1406) serve
/Users/sekao/Documents/mummy/src/mummy.nim(1110) destroy
/Users/sekao/.choosenim/toolchains/nim-1.6.10/lib/system/orc.nim(494) nimDecRefIsLastCyclicStatic
/Users/sekao/.choosenim/toolchains/nim-1.6.10/lib/system/orc.nim(466) rememberCycle
/Users/sekao/.choosenim/toolchains/nim-1.6.10/lib/system/orc.nim(145) unregisterCycle
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

The way I'd fix this is to just create separate init and loop procs which serve will then call, thus preserving backwards compatibility for existing users:

diff --git a/src/mummy.nim b/src/mummy.nim
index f6900ac..f9e85f8 100644
--- a/src/mummy.nim
+++ b/src/mummy.nim
@@ -1358,17 +1358,11 @@ proc close*(server: Server) {.raises: [], gcsafe.} =
   else:
     server.destroy(true)

-proc serve*(
+proc init*(
   server: Server,
   port: Port,
   address = "localhost"
 ) {.raises: [MummyError].} =
-  ## The server will serve on the address and port. The default address is
-  ## localhost. Use "0.0.0.0" to make the server externally accessible (with
-  ## caution).
-  ## This call does not return unless server.close() is called from another
-  ## thread.
-
   if server.socket.int != 0:
     raise newException(MummyError, "Server already has a socket")

@@ -1406,6 +1400,7 @@ proc serve*(
     server.destroy(true)
     raise currentExceptionAsMummyError()

+proc loop*(server: Server) {.raises: [MummyError].} =
   try:
     server.loopForever()
   except:
@@ -1414,6 +1409,20 @@ proc serve*(
     server.destroy(false)
     raise currentExceptionAsMummyError()

+proc serve*(
+  server: Server,
+  port: Port,
+  address = "localhost"
+) {.raises: [MummyError].} =
+  ## The server will serve on the address and port. The default address is
+  ## localhost. Use "0.0.0.0" to make the server externally accessible (with
+  ## caution).
+  ## This call does not return unless server.close() is called from another
+  ## thread.
+  server.init(port, address)
+  server.loop()

I can make a PR if you're interested. If not then it's all good. Thanks for making this library.

guzba commented 1 year ago

Hello and thanks for the high quality report here. I understand the issue.

My present concern with this approach to a solution is it would enable init to be called on a different thread from loop, or make it possible to call init and then serve. I also currently prefer the thread visibility assumptions I can make with just newServer and serve.

Even so, I do recognize the testing issue. I think the true need is a concrete way of knowing requests sent to the server are not happening before the serve is ready. I currently just sleep for a little bit and its never been an issue but I have been aware of the potential flakiness.

Honestly, another not terrible idea is to just hit make a tiny proc something like checkServer and it just loops making a GET to /test or something and once that returns 200 it enables tests to continue.

In general I strongly value testing but I am not a fan of letting testing implementation details dictate public API if avoidable. And I do see it as avoidable here.

I'll need to experiment and think about this.

oakes commented 1 year ago

Fair points. What about just allowing an optional callback to be passed to serve that runs after init? An argument like initComplete: proc () = nil maybe? That way, the invalid init issue can't occur and the affect on the public API is minimal.

guzba commented 1 year ago

This is a reasonable suggestion. It has cross-thread orchestration needs to make it work (the handler would be called from the serve thread) but is viable and less invasive.

Another aspect to this I think should be considered is ensuring a test fails if something goes wrong instead of just deadlocking indefinitely. Something like "block until I get a signal" is risky since if that signal never arrives the test takes forever to indicate failure. This can be pretty frustrating and I have experienced it in the past. Some capacity for timeout should be considered.

The suggestion of a callback does enable that, as the thread being blocked until it gets a signal could simply only wait so long (say 10 seconds).

guzba commented 1 year ago

This is an idea I had, if you have thoughts or concerns:

proc waitUntilReady*(server: Server, timeout: float) =
  ## This proc blocks until the server is ready to receive requests or the timeout has passed.
  ## The timeout is in floating point seconds.
  ## This is intended for use when writing tests, where you need to know
  ## the server is ready before you begin sending requests.
  ## If the server is already ready this returns immediately.

The thread sending requests would call server.waitUntilReady(10) and then once that proc returns it can start sending requests. If the timeout passes instead, an exception would be raised.

This needs no changes to the existing API, is simpler than a callback and enables reliable testing without depending on sleep intervals that are both flaky and unnecessarily slow.

oakes commented 1 year ago

Yeah I like that a lot. It's also much clearer about its purpose than a generic callback would be.

guzba commented 1 year ago

I put together a quick implementation and switched my own tests to use this instead of sleep here: https://github.com/guzba/mummy/pull/25

I'll add a version and tag this today yet, lmk if you have any feedback on PR.

(The sleep interval inside waitUntilReady implementation may go away, but I wanted to get something working that included timeout which wait and broadcast from std/locks does not include.)

oakes commented 1 year ago

Just tried it in my project and it works perfectly, no issues on my end.

guzba commented 1 year ago

Great to hear. I'll have this tagged as 0.2.4 in a few mins.

oakes commented 1 year ago

Fantastic, thank you!