icing / mod_h2

HTTP/2 module for Apache httpd
https://icing.github.io/mod_h2/
Apache License 2.0

C10K deployments, fully event oriented connection processing #91

Open icing opened 8 years ago

icing commented 8 years ago

With release 1.2.8 here (and 2.4.20 in Apache httpd), HTTP/2 connections will make use of the async feature of the event mpm. Basically, the connection is set aside until new data comes in. This frees workers and allows the server to handle more connections in parallel.

In principle, HTTP/2 connections could behave fully event oriented all the time, since request processing is handed to h2 worker threads and therefore does not block the main connection. In practice, the implementations of mod_http2 and mpm_event need to be adjusted. mod_http2 needs to shift all processing into the connection filters, no longer relying on any stack state. mpm_event needs to learn about new connection states that require different handling than the HTTP/1 processing implemented currently.

At least in major parts, this could/should allow back porting to the 2.4.x branch.

elukey commented 5 years ago

Is there any plan to make this happen? It seems to be a really nice improvement, but I am not sure whether the actual complexity of mod_h2 makes this effort not worth it in terms of time spent versus performance gained (I also suppose that such a heavy customization would need to be done for mpm-worker too, or alternatively its support dropped?).

icing commented 5 years ago

This was written when I was younger. Now, I'd say that the mentioned back port to 2.4 for such a development is a fantasy.

Tighter integration of h2 workers into the mpm infrastructure remains a desirable goal for a future release line. But no plan exists for that.

elukey commented 5 years ago

Thanks :)

I am asking this since I'd be interested in working on it, even if the task seems really hard. A better understanding of mod_h2 and mpm-event would really be nice for me, but coming up with a plan from scratch seems a difficult task, which is why I was asking :)

icing commented 5 years ago

I would have to dive deeper into the mpm modules myself in order to answer that. ;-)

Basically, each h2 worker is like a http/1.1 request processing connection. However, it does not send or receive HTTP headers in text format (usually, unless H2SerializeHeaders is on - which should be scrapped anyway in new release lines).

So, instead of a socket to read/write, you have these infamous bucket beams that try to shuffle the data back and forth with as little copying as possible. File handles in responses (e.g. serving a static file) are passed as is. This saves a lot of buffer memory for such use cases.

However, all mpms really work on sockets. One could make a socket for each h2 worker and just write a byte to it to signal readiness. This would be nice because the current code is a bit awkward in handling workers and main connection at the same time (see timed waits).
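The "write a byte to signal readiness" idea could look roughly like the sketch below. It is only an illustration under assumptions: a POSIX pipe stands in for the per-worker socket, and every name (`worker_notifier`, `notifier_signal`, `notifier_poll`, `MAX_WORKERS`) is hypothetical, not part of mod_http2 or any mpm.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAX_WORKERS 16 /* hypothetical limit, keeps the pollfd array on the stack */

/* Hypothetical type: one notifier per h2 worker. */
typedef struct {
    int read_fd;  /* end the main connection's event loop polls */
    int write_fd; /* end the h2 worker writes to */
} worker_notifier;

static int notifier_init(worker_notifier *n)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* Non-blocking on both ends so neither side can stall on the pipe. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1 ||
        fcntl(fds[1], F_SETFL, O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n->read_fd = fds[0];
    n->write_fd = fds[1];
    return 0;
}

/* Called by the worker: one byte means "I have data for the connection". */
static int notifier_signal(worker_notifier *n)
{
    const char byte = 1;
    if (write(n->write_fd, &byte, 1) == 1)
        return 0;
    /* A full pipe means a wakeup is already pending, which is good enough. */
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
}

/* Called by the event loop: sets ready[i] to 1 for each signalled worker.
 * Returns the number of ready workers, 0 on timeout, -1 on error. */
static int notifier_poll(worker_notifier *ns, int count, int timeout_ms,
                         int *ready)
{
    struct pollfd pfds[MAX_WORKERS];
    int i, n;

    if (count < 0 || count > MAX_WORKERS)
        return -1;
    for (i = 0; i < count; i++) {
        pfds[i].fd = ns[i].read_fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    n = poll(pfds, (nfds_t)count, timeout_ms);
    if (n < 0)
        return -1;
    for (i = 0; i < count; i++)
        ready[i] = (pfds[i].revents & POLLIN) != 0;
    return n;
}

/* Consume pending wakeup bytes so the next poll blocks again. */
static void notifier_drain(worker_notifier *n)
{
    char buf[64];
    while (read(n->read_fd, buf, sizeof buf) > 0) {
        /* discard: the bytes carry no payload, only "ready" */
    }
}
```

With something like this, the read ends could be handed to the same poll set as the client socket, so the main connection would not need timed waits to notice worker output.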

One advantage of the current architecture is that h2 workers and mpm threads are separate pools.

This is good, because on many setups, the h2 worker will just look up a file and return it to the main connection where it is sent out as fast as the client can read it. But "as fast as the client can read" is often several orders of magnitude slower than the file lookup itself. So, the actual h2 tasks are way shorter (in duration) than the sending of the data on the connection to the client.

Also, with separate pools, one is not in danger of deadlocking. Imagine one pool of 2 threads for everything. When 2 new connections are opened at the same time, all threads are busy and no connection can get a new thread to process its requests...
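The two-thread example can be reproduced with a toy counting model. This is a deliberately simplified sketch, not how httpd schedules threads: it assumes each open connection holds one thread for its lifetime and needs one more thread to run a request. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of a fixed-size thread pool: only counts slots. */
typedef struct {
    int total; /* threads in the pool */
    int busy;  /* threads currently held */
} thread_pool;

static bool pool_acquire(thread_pool *p)
{
    if (p->busy >= p->total)
        return false; /* nothing free: the caller would have to wait */
    p->busy++;
    return true;
}

/* Opens `connections` connections on conn_pool, then tries to start one
 * request per opened connection on req_pool. Connections keep their thread
 * while waiting, so nothing is released in between. Pass the same pool twice
 * to model "one pool for everything". Returns how many requests got a thread. */
static int requests_started(thread_pool *conn_pool, thread_pool *req_pool,
                            int connections)
{
    int opened = 0, started = 0, i;

    for (i = 0; i < connections; i++)
        if (pool_acquire(conn_pool))
            opened++;
    for (i = 0; i < opened; i++)
        if (pool_acquire(req_pool))
            started++;
    return started;
}
```

In this model, a single pool of 2 threads with 2 connections starts 0 requests: both threads are held by connections that each wait for a thread nobody will release. Two separate pools of 2 threads start both requests.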

So, there is no plan, but there have been some thoughts about it, which I just dumped here... ;-)

elukey commented 5 years ago

So the trick used by mod_h2, IIUC, is that the connection is processed using nghttp2, and then each stream is assigned to an h2 worker that processes it as if it were a regular http/1.1 request (triggering the input/output filters, etc..). Moving this logic to the mpm is hard since we'd need to make it aware of what http/2 is and its states (moving part of the logic from mod_h2 to the mpm itself).

elukey commented 5 years ago

The abstractions should be, IIUC (at very high level):

h2 session -> manages the connection with the client

h2 stream -> manages each stream within the connection

h2 task -> a h2 stream "translated" into a httpd's HTTP/1.1-like request so all the regular core processing can work (request input filters, handlers, etc..)
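To make the three layers concrete, here is a rough sketch in C. The types and fields are hypothetical and are not the real mod_http2 structures. Building a text request line is for illustration only, since, as noted earlier in the thread, headers are usually not serialized to text unless H2SerializeHeaders is on.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: one HTTP/2 stream, carrying the request pseudo-headers. */
typedef struct {
    int id;                /* stream id; client-initiated streams are odd */
    const char *method;    /* from the :method pseudo-header */
    const char *path;      /* from the :path pseudo-header */
    const char *authority; /* from the :authority pseudo-header */
} sketch_stream;

/* Hypothetical: the session owns the client connection and its streams. */
typedef struct {
    int conn_fd;              /* socket of the client connection */
    sketch_stream streams[8]; /* arbitrary small limit for the sketch */
    int stream_count;
} sketch_session;

/* Hypothetical: a stream "translated" into an HTTP/1.1-like request. */
typedef struct {
    int stream_id;          /* where the response has to go back to */
    char request_line[256];
    char host_header[256];
} sketch_task;

/* Build a task from a stream. Returns 0 on success, -1 if a field would
 * not fit into the fixed-size buffers. */
static int sketch_task_from_stream(const sketch_stream *s, sketch_task *t)
{
    int n;

    t->stream_id = s->id;
    n = snprintf(t->request_line, sizeof t->request_line,
                 "%s %s HTTP/1.1", s->method, s->path);
    if (n < 0 || (size_t)n >= sizeof t->request_line)
        return -1;
    n = snprintf(t->host_header, sizeof t->host_header,
                 "Host: %s", s->authority);
    if (n < 0 || (size_t)n >= sizeof t->host_header)
        return -1;
    return 0;
}
```

The point of the split is that only the session and stream layers need to know about HTTP/2 framing, while the task looks enough like a plain request that the regular filters and handlers can run on it.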

Given what you explained above, it might be a huge effort to add h2 handling logic to the mpm, especially if we'll have to do it for every http version iteration (I guess a similar thing would be needed for HTTP/3.0 as well, given udp sockets processing of course). And we are not sure about what benefits it may bring (performance?, code clarity?, integration with status, etc..)