karlseguin / http.zig

An HTTP/1.1 server for zig
MIT License

Question: example with using `Transfer-Encoding: chunked` #38

Closed dephiros closed 5 months ago

dephiros commented 8 months ago

Hi, Thanks again for the library.

I am trying to learn and understand how Transfer-Encoding: chunked would work with this library. Would you be able to provide an example? Any explanation or pointer would be appreciated.

Context: I am trying to understand HTTP at a lower level with this lib, coming from a higher-level runtime like Node.

karlseguin commented 8 months ago

Not sure if there's a specific part that you're looking at, but...

The way master works is that we have "workers" which (a) accept connections (b) read from the socket and (c) parse the request. Once the request is parsed, the connection is passed to a thread pool where the application handler executes. When the application handler returns, control is passed back to the original worker where the response is written asynchronously.

It's done this way because Zig doesn't have async or some other concurrency primitive. In almost all cases, when the "application handler" executes, it's going to execute synchronously...if the application handler has to read from a file, access a database, or hit a 3rd party service, it'll use blocking calls.

Keeping that in mind, if your application handler wanted to stream chunks, you might think something like:

while (someDataSource.next()) |value| {
  try res.chunk(value);
}

But, as-is, the issue with this is that the socket is in non-blocking mode and thus writing to it is asynchronous. Because Zig doesn't have async built-in, there's no event loop that both http.zig and the application handler can use and there's no "await" for the application to use.

The solution that we've used so far is that when the application wants to explicitly write to the socket, we switch to blocking mode. You can see this in the res.write function: https://github.com/karlseguin/http.zig/blob/573e94b64ac611170a9cc9cdc2f62b3f50b546d5/src/response.zig#L124

So we should probably add a chunk function to the Response, something like:

fn chunk(self: *Response, data: []const u8) !void {
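  // neither local is reassigned, so declare them const (newer Zig
  // compilers reject a `var` that is never mutated)
  const conn = self.conn;
  const stream = conn.stream;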
  if (self.chunked == false) {
    // first time calling chunk(), we need to switch to blocking mode and
    // write the header
    try conn.blocking();

    self.chunked = true;

    // TODO: change prepareForWrite to see that chunked == true and emit
    // the correct headers
    try state.prepareForWrite(self);

    // Not write the header
    try stream.writeAll(state.header_buffer.data[0..state.header_len]);
  }

  var buf: [20]u8 = undefined;
  buf[0] = '\r';
  buf[1] = '\n';
  const len = 2 + std.fmt.formatIntBuf(buf[2..], data.len, 16, .upper, .{});
  buf[len] = '\r';
  buf[len+1] = '\n';
  try stream.writeAll(buf[0..len+2]);
  try stream.writeAll(data);
}

It's something I'm happy to add. There's actually support for this in the blocking branch; I must have removed it when switching to non-blocking and never added it back in.
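
For reference, here is a minimal sketch of the framing that Transfer-Encoding: chunked puts on the wire, independent of http.zig. Each chunk is the payload length in hex, CRLF, the payload, CRLF, and the body ends with a zero-length chunk. The names frameChunk and last_chunk are made up for this illustration and are not part of the library.

```zig
const std = @import("std");

/// Hypothetical helper (not an http.zig API): frames `data` as one
/// HTTP/1.1 chunk into `out` and returns the slice that was written.
/// Layout: <length in hex> CRLF <data> CRLF
fn frameChunk(out: []u8, data: []const u8) ![]u8 {
    // Size line: payload length in hexadecimal, followed by CRLF.
    const head = try std.fmt.bufPrint(out, "{x}\r\n", .{data.len});

    const total = head.len + data.len + 2;
    if (total > out.len) return error.NoSpaceLeft;

    // Payload, then the CRLF that terminates the chunk.
    @memcpy(out[head.len..][0..data.len], data);
    out[head.len + data.len] = '\r';
    out[head.len + data.len + 1] = '\n';
    return out[0..total];
}

/// A zero-length chunk followed by an empty trailer section ends the body.
const last_chunk = "0\r\n\r\n";

pub fn main() !void {
    var buf: [64]u8 = undefined;
    const framed = try frameChunk(&buf, "hello");
    std.debug.print("{d} bytes framed, then {d} bytes to finish\n", .{ framed.len, last_chunk.len });
}
```

The chunk() sketch above writes the CRLF before the size line instead of after the data. That appears to yield the same byte stream as long as the header block is written without its final blank line, but that is a reading of the snippet rather than something stated in this thread.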

dephiros commented 7 months ago

Thank you for the explanation. It gives me some pointers to refamiliarize myself with TCP connections/sockets... again, after so many years.

When we call chunk the first time:

Afterward, chunk seems to only write to the socket, with the delimiter.

And when the handler function is done writing, control is passed back to httpz to clean up the connection (unless keepalive, but that seems to be another discussion).

karlseguin commented 7 months ago

Oops, comment should be "Note" not "Not".

So ya, on the first call to chunk we need to write out the header + write the first chunk. Calling prepareForWrite is re-using the existing code to set up our headers. The "workers", which parse the request and write the response, don't deal with *httpz.Request and *httpz.Response; they deal with internal state objects. prepareForWrite takes whatever the application did to *httpz.Response and sets up the response state object.

Probably don't need both the Request (app) and Request State (internal) and the Response (app) and Response State (internal), but I guess it keeps the code a little more focused. The response object can be fully dedicated to what the application needs, and the response state can be dedicated to what the framework needs for writing.
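
To make the split concrete, here is a heavily simplified sketch of the idea. Every type and field below is hypothetical and chosen only for illustration; the real http.zig structs differ and carry far more. The application mutates a small Response, and a prepareForWrite step turns that into header bytes held by a separate state object that the writer owns.

```zig
const std = @import("std");

// Hypothetical, simplified types for illustration only.

/// What the application handler touches.
const Response = struct {
    status: u16 = 200,
    chunked: bool = false,
};

/// What the framework uses when writing to the socket.
const ResponseState = struct {
    header_buffer: [128]u8 = undefined,
    header_len: usize = 0,

    /// Translates the application's choices into header bytes.
    fn prepareForWrite(self: *ResponseState, res: *const Response) !void {
        // Pick the body framing header based on what the handler asked for.
        const framing: []const u8 = if (res.chunked)
            "Transfer-Encoding: chunked\r\n"
        else
            "Content-Length: 0\r\n";

        // Status line with an empty reason phrase (the space after the
        // code is still required), one framing header, then a blank line.
        const written = try std.fmt.bufPrint(
            &self.header_buffer,
            "HTTP/1.1 {d} \r\n{s}\r\n",
            .{ res.status, framing },
        );
        self.header_len = written.len;
    }
};

pub fn main() !void {
    var res = Response{};
    res.chunked = true; // what a handler that streams chunks would flip

    var state = ResponseState{};
    try state.prepareForWrite(&res);
    std.debug.print("{s}", .{state.header_buffer[0..state.header_len]});
}
```

The handler never sees header_buffer, and the writer never needs to know how the handler arrived at its choices, which is the "more focused" separation described above.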