Open kahuang opened 5 years ago
I've actually implemented a fix for this in https://github.com/python-hyper/hyper-h2/pull/1185
I do plan to get to this, but I need to find a bit more time given I don't understand it yet. Thank you for looking into this and the fix though.
For sure, let me know if there's anything I can help to clarify
@pgjones anything that I can do to help push this along? Currently using my fork in production, which is making tens of millions of http2 requests/day, but would prefer switching to use upstream
I'm seeing this too. In this "simple" (cough) test, I'm trying to send 30k requests to a Go server that (by default) allows 250 concurrent streams. I wait a tad at the start to receive the server settings, then start all the requests at the same time, so 250 requests succeed and my "simple" code rejects the rest because the concurrent-stream quota is full. Here's the closeup on the profile:
First let me say this is an excellent library! Thanks for taking the time to maintain/update it.
While writing an http2 client library for tornado (I can hopefully share it soon ;) ), I ran into very poor performance while benchmarking the client with many concurrently open streams (5000+). Here's a cProfile output for starting/finishing 5000 concurrent requests:
As you can see, almost half of the run time is spent in `_open_streams`. Here's the function code for reference:
The function loops through all streams and returns the count of open remote/local streams. Since `self.open_outbound_streams` is consulted on each stream open in `send_headers`, the total cost of this function over the lifetime of the program is quadratic in the number of concurrently open streams.

And to check the numbers: `stream.py`'s `open()` is called once per element in `self.streams`. It was called 11,113,314 times. The benchmark was run with 5000 concurrent streams, and since `open()` is invoked on every stream open, the worst-case total (if all streams remain open) is `1 + 2 + ... + (n - 1) = n * (n - 1) / 2` calls. Plugging n = 5000 into that formula gives 12,497,500, which is within about 10% of the call count in the cProfile output.

Proposal: update `H2Connection` to incrementally maintain the count of open local/remote streams, making the lookup O(1).
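As a quick sanity check on the call-count arithmetic above (pure back-of-the-envelope, using the numbers from the profile):

```python
n = 5000                # concurrent streams in the benchmark
observed = 11_113_314   # open() calls reported by cProfile

# Worst case: opening stream k scans the streams already tracked,
# so the total is 1 + 2 + ... + (n - 1) = n * (n - 1) / 2.
predicted = n * (n - 1) // 2
print(predicted)             # 12497500
print(observed / predicted)  # ~0.89, i.e. within roughly 10%
```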
The logic here being that since we are processing all events, we know at event-processing time whether a stream is becoming open or closed, and can update the counts incrementally on each state change. I'm not super familiar with the code base, but we should also be able to clean up closed streams on state change, or else add some other process for evicting closed streams from `self.streams`.

I'm happy to implement the change, just let me know if this is a path you'd be willing to go down, or if there is anything that makes this infeasible (totally possible! Started using this library two days ago :) )
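For concreteness, the incremental-counting idea might look something like this sketch (hypothetical names, and not necessarily how the linked PR implements it):

```python
class StreamCounter:
    """Maintain O(1) open-stream counts, updated on state changes.

    Sketch of the proposal: instead of scanning self.streams on every
    send_headers, bump or decrement a counter whenever a stream
    transitions between open and closed.
    """

    def __init__(self):
        self._open = {0: 0, 1: 0}  # stream-ID parity -> open count

    def stream_opened(self, stream_id):
        self._open[stream_id % 2] += 1

    def stream_closed(self, stream_id):
        self._open[stream_id % 2] -= 1

    def open_streams(self, remainder):
        # O(1) replacement for the linear _open_streams scan.
        return self._open[int(remainder)]
```

Hooked into the connection's event processing, `stream_opened`/`stream_closed` would fire exactly once per state transition, so the check in `send_headers` costs O(1) per request instead of O(n).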