pallets / werkzeug

The comprehensive WSGI web application library.
https://werkzeug.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
6.65k stars 1.73k forks

Basic benchmark of flask uncovers costly line in urls.py #2406

Closed tonybaloney closed 1 year ago

tonybaloney commented 2 years ago

This benchmark script:

from flask import Flask

def create_app():
    """Create and configure an instance of the Flask application."""
    app = Flask(__name__, instance_relative_config=True)

    @app.route("/hello")
    def index():
        return "Hello, World!"

    # make url_for('index') == url_for('blog.index')
    # in another app, you might define a separate main index here with
    # app.route, while giving the blog blueprint a url_prefix, but for
    # the tutorial the blog will be the main index
    app.add_url_rule("/", endpoint="index")

    return app

app = create_app()

def bench(n=10_000):
    for _ in range(n):
        app.test_client().get("/hello")
        app.test_client().get("/")

bench()

Run with cProfile

python -m cProfile -s time bench.py

returns

        26003364 function calls (25977275 primitive calls) in 29.408 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   340012    2.902    0.000    3.460    0.000 urls.py:547(url_quote)
   140000    1.483    0.000    2.629    0.000 urls.py:457(url_parse)
   140000    0.794    0.000    1.003    0.000 urls.py:387(_unquote_to_bytes)
5147167/5147129    0.771    0.000    0.771    0.000 {built-in method builtins.isinstance}

Running this on a line profiler shows that this line is particularly expensive. Around 6% of the overall benchmark CPU time.

https://github.com/pallets/werkzeug/blob/main/src/werkzeug/urls.py#L572

If this function could be optimized, it would have a knock-on effect on Flask performance.
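For drilling into a cProfile dump beyond the sorted console output, the stdlib pstats module can sort and filter the stats by filename. A minimal sketch (the profiled workload here is a stand-in, not the Flask benchmark; for the real case you'd dump with `python -m cProfile -o bench.prof bench.py` and then filter with `print_stats("urls.py")`):

```python
import cProfile
import pstats

# Profile a stand-in workload and write the stats to a file.
cProfile.run("sum(i * i for i in range(10_000))", "demo.prof")

stats = pstats.Stats("demo.prof")
# Sort by internal time; print_stats() accepts a regex to restrict the
# report, e.g. print_stats("urls.py") to focus on Werkzeug's urls module.
stats.sort_stats("tottime").print_stats(5)
```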

tonybaloney commented 2 years ago

This code seems to date back to a nine-year-old commit for Python 3 support. It might need refactoring:

https://github.com/pallets/werkzeug/commit/aee68cfa9a761e711f80c0011d8dff6bff52b32d

tonybaloney commented 2 years ago

A lot of this overhead is due to url_quote() being called multiple times from iri_to_uri https://github.com/pallets/werkzeug/blob/main/src/werkzeug/urls.py#L812-L816

The call signature of url_quote() accepts either str or bytes, but according to the types in iri_to_uri() it only ever sends str. If it were split into two functions, one for str and another for bytes, all the extra encoding/decoding and isinstance() checks could be avoided.
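To illustrate, a str-only variant could skip the type branching entirely. This is a hedged sketch, not Werkzeug's actual code; quote_str and its safe default are hypothetical, and the safe alphabet is just the RFC 3986 unreserved set:

```python
import string

# Unreserved characters from RFC 3986; Werkzeug's _always_safe is similar.
_ALWAYS_SAFE = frozenset(
    (string.ascii_letters + string.digits + "-._~").encode("ascii")
)

def quote_str(value: str, safe: str = "/:") -> str:
    """Percent-encode a str without any isinstance/bytes branching."""
    allowed = _ALWAYS_SAFE | frozenset(safe.encode("ascii"))
    return "".join(
        chr(byte) if byte in allowed else f"%{byte:02X}"
        for byte in value.encode("utf-8")
    )

print(quote_str("/hello world"))  # /hello%20world
```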

tonybaloney commented 2 years ago

(screenshots of profiler output)

The execution time for these functions comes down to three things:

  1. The API accepting both str and bytes, which forces type checks and encode/decode steps
  2. The continuous encode/decode cycling of the same data between those functions
  3. _make_encode_wrapper inefficiently using a lambda; it's also called 144 times per request

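The third point is easy to sanity-check in isolation: building a fresh closure on every call costs more than a str-only fast path. A rough micro-benchmark sketch (with_wrapper only mimics the shape of _make_encode_wrapper, it is not the real function):

```python
import timeit

def with_wrapper(s):
    # Mimics _make_encode_wrapper: a fresh lambda is built on every call.
    identity = (lambda x: x) if isinstance(s, str) else (lambda x: x.decode("latin1"))
    return identity(s)

def direct(s):
    # str-only fast path: no closure allocation, no isinstance check.
    return s

slow = timeit.timeit(lambda: with_wrapper("abc"), number=200_000)
fast = timeit.timeit(lambda: direct("abc"), number=200_000)
print(f"wrapper: {slow:.3f}s  direct: {fast:.3f}s")
```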
davidism commented 2 years ago

This is related to a longer term goal I have. Much of the urls module is a copy of Python 2's urlparse and Python 3's urllib.parse. Due to compatibility when this first happened, there are tons of redundant parsing/unparsing and encoding/decoding. I would like to remove as much reliance on our own url functions as possible and go back to Python's implementation now that we only support Python 3.

davidism commented 2 years ago

I want to try to make a branch of Werkzeug that replaces all use of our urls with Python's urllib. It would be interesting to see the performance results there, I wonder if urllib has similar issues. At least it would reduce the maintenance surface of Werkzeug, and maybe allow us to focus on reimplementing specific functions.
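For reference, the stdlib already covers the common str cases these functions handle; whether it avoids the same overhead would need measuring, but the API surface maps over directly. A sketch using only urllib.parse:

```python
from urllib.parse import quote, urlsplit, urlunsplit

# quote() is the rough stdlib counterpart of url_quote for str input.
print(quote("/hello world", safe="/"))  # /hello%20world

# urlsplit() returns a SplitResult named tuple, the counterpart of url_parse.
parts = urlsplit("https://example.com/hello?name=world")
print(parts.path, parts.query)  # /hello name=world
print(urlunsplit(parts))        # round-trips the original URL
```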

xmo-odoo commented 2 years ago

This is related to a longer term goal I have. Much of the urls module is a copy of Python 2's urlparse and Python 3's urllib.parse. Due to compatibility when this first happened, there are tons of redundant parsing/unparsing and encoding/decoding. I would like to remove as much reliance on our own url functions as possible and go back to Python's implementation now that we only support Python 3.

That would be frustrating as the interface of the Python module is pretty shit: the *Result objects remain barebones named tuples, which lack all the utility and convenience of werkzeug.urls.BaseURL.

Now I can only speak for myself, but I routinely add werkzeug dependencies to projects just so I can use werkzeug.urls. The only annoyance I can remember with it is that "simple" querystring manipulations on existing urls are not completely straightforward (BaseURL has no dedicated shortcuts so it requires decoding the QS, updating it, re-encoding it, then replacing it). But overall it's an absolute joy to work with in all the ways urllib.parse isn't.
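For what it's worth, the decode/update/re-encode dance is expressible in a few lines of urllib.parse, though it is admittedly clunkier than a dedicated BaseURL shortcut would be. A sketch (set_query_param is a hypothetical helper, not part of either library):

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def set_query_param(url: str, key: str, value: str) -> str:
    """Decode the querystring, update one key, re-encode, and rebuild."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[key] = [value]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

print(set_query_param("https://example.com/p?a=1", "b", "2"))
# https://example.com/p?a=1&b=2
```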

davidism commented 2 years ago

Have you provided that feedback as issues or pull requests on the Python tracker? I'm not sure what I'm supposed to do about that, I don't want to maintain a parallel implementation of something that is basically in the standard library.

runderwood commented 2 years ago

I looked at the line-by-line profile for url_quote and found the following line in urls.py to be the heavy one:

safe = (frozenset(bytearray(safe)) | _always_safe) - frozenset(bytearray(unsafe))

Using the same benchmark script provided above, you can shave 0.6-0.7 seconds off the total by replacing both instances of this construct with a function that doesn't use frozenset objects but rather plain byte sequences, along the lines of the following:

def mksafe(safe, unsafe):
    # Normalize both arguments to bytes up front.
    safe = safe if type(safe) is bytes else bytes(safe)
    unsafe = unsafe if type(unsafe) is bytes else bytes(unsafe)
    safe = bytearray(safe + _always_safe)
    removed = bytearray()
    for b in unsafe:
        if b in removed:
            continue
        removed.append(b)
        # Strip every occurrence of this unsafe byte from the safe sequence.
        bi = safe.rfind(b)
        while bi >= 0:
            safe.pop(bi)
            bi = safe.rfind(b)
    return safe

This is ugly, I guess, and it doesn't check for strings or handle encoding -- but the code in the relevant functions seems to take care of those concerns apart from the given construct. Of course, these byte sequences are not unique (as the sets are). Even so, this appears to be faster than the present implementation.

I mention this only to say that these functions could probably be optimized to speed them up, possibly considerably given these initial results. But if the general sense is that they should be moved over to the urllib family of functions instead of doing any further tinkering with what's there now, then I guess it's moot. I could, of course, also be missing some regression in functionality here, too (and would be happy for somebody to note any).
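As a rough sanity check on where that time goes, the frozenset construct from urls.py can be timed in isolation with timeit. A sketch, with _always_safe redefined locally as a stand-in since this doesn't run inside Werkzeug:

```python
import string
import timeit

# Local stand-in for Werkzeug's _always_safe frozenset.
_always_safe = frozenset(
    (string.ascii_letters + string.digits + "-._~").encode("ascii")
)

def safe_via_frozenset(safe=b"/:", unsafe=b""):
    # The construct from urls.py that dominates url_quote's line profile.
    return (frozenset(bytearray(safe)) | _always_safe) - frozenset(bytearray(unsafe))

# url_quote was called 340,012 times in the benchmark above;
# time a comparable number of constructions here.
elapsed = timeit.timeit(safe_via_frozenset, number=340_000)
print(f"{elapsed:.3f}s for 340k calls")
```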

But if it's the case that it'd be preferred to just make the move to urllib instead, I'm happy to take a shot at it, if it'd be helpful.

davidism commented 2 years ago

Since removing our own implementation would be a pretty big project, you're welcome to add an improvement before that. I don't know when the removal would happen, I could have time tomorrow, or in a year.

runderwood commented 2 years ago

Yes, I agree it would be a considerable amount of work to make the move to urllib.parse functions. And I'd imagine it'd be better for that to be done by or at least with the direct involvement of somebody or some people with more background on how the current code came to be.

I'll likely roll in some (much) smaller changes to see if the performance concerns might be allayed without such massive alterations and let you/others take a look.

davidism commented 1 year ago

See https://github.com/pallets/werkzeug/issues/2600, I've been working on switching to urllib.parse, and have a 35% speedup on routing and responses so far. Next part is to stop allowing bytes/tuples and only work with strings.

runderwood commented 1 year ago

That's fantastic! Let me know if I can help out in any way.

davidism commented 1 year ago

I've been using the following benchmark to run with cProfile and timeit.

from werkzeug import Request, Response, Client
from werkzeug.exceptions import HTTPException
from werkzeug.routing import Map, Rule

@Request.application
def app(request):
    router = url_map.bind_to_environ(request.environ)
    router.match()
    return Response("Hello, World!")

url_map = Map([Rule("/hello", endpoint="index"), Rule("/", endpoint="index")])
client = Client(app)

def bench(n=10_000):
    for _ in range(n):
        client.get("/hello")
        client.get("/")

if __name__ == "__main__":
    bench()
$ python -m cProfile -s time -o bench.prof bench.py
$ pipx run snakeviz bench.prof
$ python -m timeit -s 'from bench import bench' 'bench(1000)'