karlseguin / http.zig

An HTTP/1.1 server for Zig
MIT License
454 stars 31 forks

non-blocking IO + SSE == something locking up #33

zigster64 closed this issue 6 months ago

zigster64 commented 6 months ago

Just getting the zig-zag-zoe SSE demo up to date with the latest 0.12.dev changes.

I noticed that running the SSE code against non-blocking master creates a condition that locks up the app, and requires a restart.

Works fine on latest blocking branch though.

Have not put in any effort to debug this yet, but just a heads up that I will be doing some exploration in this area soonish, and will try to put a patch together if I find anything

karlseguin commented 6 months ago

I know it doesn't practically make a difference, but are you able to tell if the app is actually frozen, or does it just refuse to accept new connections?

For your use-case, the non-blocking master might be more blocking. The application's handler (your events function) is executed on the worker, not in a separate thread. So if you have 2 httpz workers (the default), and they are both looping in events, you won't be able to accept or process other requests.

That might sound dumb, but the main goal was to help provide protection against slow/misbehaving/malicious clients. In the blocking branch, someone can connect and very slowly feed it bytes, blocking that worker from accepting new connections. In master, this isn't possible: the only thing that should be able to "block" is the application code. Not great, but it's more robust to attacks, and control stays with the application developer.

A mix of the two would be best: processing requests/responses in a non-blocking manner, and using a thread pool to execute the application's handler once a complete request has been parsed. But it seemed complicated to implement, largely because the connection has to be handed back to the worker after the response is written (for keepalive).

The websocket integration solved this by (a) launching a new thread to handle the socket and (b) having http.zig disown the connection.

If this is the issue that you're running into, you should be able to do this in your code, something like:

fn events(....) !void {
  // this is already a public method
  res.disown();
  var stream = try res.startEventStream();
  const thread = try std.Thread.spawn(.{}, eventsLoop, .{stream});
  thread.detach();
}
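For completeness, a minimal sketch of what the detached eventsLoop might look like, assuming stream is the std.net.Stream returned by startEventStream and the loop just pushes a heartbeat (the payload and sleep interval are illustrative only):

```zig
// Hypothetical eventsLoop body: not the library's code, just one way
// the detached thread could use the stream it now owns.
fn eventsLoop(stream: std.net.Stream) void {
    // Once disowned, the application is responsible for closing the socket.
    defer stream.close();
    while (true) {
        // Each SSE frame is "data: <payload>\n\n" per the EventSource format.
        stream.writeAll("data: tick\n\n") catch return; // client went away
        std.time.sleep(1 * std.time.ns_per_s);
    }
}
```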

And if that works, and you agree that starting an eventStream in a new thread and removing the socket from the responsibility of http.zig makes sense, we could bake that into the API. Something like:

fn events(....) !void {
  try res.startEventStream(functionToLaunchInANewThread);
}

fn functionToLaunchInANewThread(stream: std.net.Stream) void {
}

Probably with some arbitrary context you could pass.
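As a hedged sketch only, a context-carrying version of that signature could look something like the following (the comptime-generic shape and the internal getStream helper are assumptions for illustration, not the actual API):

```zig
// Hypothetical shape: startEventStream takes an arbitrary context plus a
// handler, disowns the connection, and launches the handler in a detached
// thread so the worker is freed up immediately.
pub fn startEventStream(
    res: *Response,
    ctx: anytype,
    comptime handler: fn (@TypeOf(ctx), std.net.Stream) void,
) !void {
    res.disown();
    const stream = res.conn.stream; // illustrative; however the raw stream is exposed
    const thread = try std.Thread.spawn(.{}, handler, .{ ctx, stream });
    thread.detach();
}
```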

zigster64 commented 6 months ago

That sounds about right, thanks.

Ah, disown() ... that's good. I was wondering how on earth you got websockets working with the non-blocking branch without rewriting the websocket IO, and completely missed the new disowned flag on this side. That makes sense now!

So having applied that logic above (set up the SSE handler, then hand it off to its own thread), that works for me. A quick test shows that the app works fine with the non-blocking branch now.

Might make sense to adjust the startEventStream() API as you mentioned. The thread function definitely needs to take both a context and a Stream to be useful. If the user doesn't need a context, it would be idiomatic to just pass .{}.

i.e. in my case I'm using the "routes are methods of a struct" approach, so passing a *Self as the context does the job:

fn events(self: *Self, req: *httpz.Request, res: *httpz.Response) !void {
    res.disown();
    const stream = try res.startEventStream();
    const thread = try std.Thread.spawn(.{}, eventsLoop, .{ self, stream });
    thread.detach();
}

fn eventsLoop(self: *Self, stream: anytype) !void {
    self.event_mutex.lock();
    defer self.event_mutex.unlock();
    while (true) { ... }
}

I was considering for a bit whether it's worth supporting both approaches, startEventStream() vs startEventStreamThread(fn), but that starts to get really confusing for the user.

If you only provided startEventStream(fn), then in the weird case where the user wants blocking IO and manual thread control, they can still use startEventStream(fn), and it will still work as expected, at the cost of launching one more thread.

i.e.

eventHandler in a manual thread -> startEventStream(fn) spawns and detaches yet another thread -> terminate this eventHandler thread

This still works, and presents a single startEventStream() API for all use cases.

karlseguin commented 6 months ago

This is fixed as we discussed.

I'm going to revisit the thread pool. The documentation is vague (especially for kqueue), but it seems you can modify the notification list from different threads, which would make this easier than I had thought.

karlseguin commented 5 months ago

Master is now using a thread pool. The change is internal only.

I kept the latest changes to startEventStream. That is, it still takes a function and launches it in another thread, outside of the thread pool. I don't think there's a good threading strategy that serves both short-lived and long-lived requests.

I haven't observed a difference in performance either way, but I believe this will bring a performance gain and better latency for the common case where handlers are doing blocking work (e.g. hitting a database).

New top level configuration option:

.thread_pool = .{
    // Number of threads. If your handlers are doing a lot of I/O, a higher
    // number might provide better throughput.
    .count = 4,

    // The maximum number of pending requests that the thread pool will accept.
    // This applies back pressure to the above workers and ensures that, under load,
    // pending requests get precedence over processing new requests.
    .backlog = 500,
},
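For context, here is a sketch of where that option would sit in a full server setup. This is an assumption about the surrounding API shape (the init call and port value are illustrative; check the README for the exact listen API):

```zig
const std = @import("std");
const httpz = @import("httpz");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // .thread_pool is a top-level config field alongside options like .port.
    var server = try httpz.Server().init(allocator, .{
        .port = 8080,
        .thread_pool = .{
            .count = 4,
            .backlog = 500,
        },
    });
    try server.listen();
}
```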