tonybaloney closed this issue 1 year ago
This code seems to trace back to a nine-year-old commit from the Python 3 port, so it might need refactoring:
https://github.com/pallets/werkzeug/commit/aee68cfa9a761e711f80c0011d8dff6bff52b32d

A lot of this overhead is due to url_quote() being called multiple times from iri_to_uri: https://github.com/pallets/werkzeug/blob/main/src/werkzeug/urls.py#L812-L816

The call signature of url_quote() accepts either str or bytes, but according to the types in iri_to_uri(), it is only ever sent str.
If it were split into two functions, one for str and another for bytes, you would avoid all the extra encoding/decoding and isinstance() checks.
The execution time for these functions comes down to three things; see the per-request call counts in the profiling results below.
This is related to a longer-term goal I have. Much of the urls module is a copy of Python 2's urlparse and Python 3's urllib.parse. Due to compatibility when this first happened, there are tons of redundant parsing/unparsing and encoding/decoding. I would like to remove as much reliance on our own URL functions as possible and go back to Python's implementation, now that we only support Python 3.
I want to try to make a branch of Werkzeug that replaces all use of our urls with Python's urllib. It would be interesting to see the performance results there; I wonder if urllib has similar issues. At least it would reduce the maintenance surface of Werkzeug, and maybe allow us to focus on reimplementing specific functions.
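For reference, the standard-library building blocks such a switch would lean on are urlsplit/quote/urlunsplit. A minimal sketch of an iri_to_uri-style round trip using only urllib.parse (the example URL is made up):

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Parse a URL into components, percent-encode the path, and
# reassemble it -- roughly the shape of work iri_to_uri does.
parts = urlsplit("http://example.com/path with spaces?q=1")
uri = urlunsplit(parts._replace(path=quote(parts.path)))
print(uri)  # http://example.com/path%20with%20spaces?q=1
```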
That would be frustrating, as the interface of the Python module is pretty shit: the *Result objects remain barebones named tuples, which lack all the utility and convenience of werkzeug.urls.BaseURL.

Now I can only speak for myself, but I routinely add a Werkzeug dependency to projects just so I can use werkzeug.urls. The only annoyance I can remember with it is that "simple" query-string manipulations on existing URLs are not completely straightforward (BaseURL has no dedicated shortcuts, so it requires decoding the query string, updating it, re-encoding it, then replacing it). But overall it's an absolute joy to work with in all the ways urllib.parse isn't.
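To illustrate the round trip described above, here is a hypothetical helper (written against urllib.parse, since the corresponding BaseURL calls aren't shown in the thread) that decodes, updates, re-encodes, and replaces a query string:

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def set_query_param(url: str, key: str, value: str) -> str:
    # decode the query string, update it, re-encode, and splice it back in
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    params[key] = [value]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


print(set_query_param("http://example.com/?a=1", "b", "2"))
# http://example.com/?a=1&b=2
```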
Have you provided that feedback as issues or pull requests on the Python tracker? I'm not sure what I'm supposed to do about that, I don't want to maintain a parallel implementation of something that is basically in the standard library.
I looked at the line-by-line profile for url_quote and found the following line in urls.py to be the heavy one:
safe = (frozenset(bytearray(safe)) | _always_safe) - frozenset(bytearray(unsafe))
Using the same benchmark script provided above, you can shave 0.6-0.7 seconds off the total by replacing both instances of this construct with a function that doesn't use frozenset or bytearray objects but rather plain bytes objects, along the lines of the following:
def mksafe(safe, unsafe):
    # normalize both arguments to bytes
    safe = safe if type(safe) is bytes else bytes(safe)
    unsafe = unsafe if type(unsafe) is bytes else bytes(unsafe)
    # note: this assumes _always_safe is bytes here, not a frozenset
    safe = bytearray(safe + _always_safe)
    removed = bytearray()
    for b in unsafe:
        if b in removed:
            continue
        removed.append(b)
        # drop every occurrence of this unsafe byte from safe
        bi = safe.rfind(b)
        while bi >= 0:
            safe.pop(bi)
            bi = safe.rfind(b)
    return safe
This is ugly, I guess, and it doesn't check for strings or handle encoding -- but the relevant functions seem to take care of those concerns apart from the given construct. Of course, these byte sequences are not unique (as they would be with sets), but they are immutable. Even so, it appears to be faster than the present implementation.

I mention this only to say that these functions could probably be optimized to speed them up, possibly considerably given these initial results. But if the general sense is that they should be moved over to the urllib family of functions instead of doing any further tinkering with what's there now, then I guess it's moot. I could, of course, be missing some regression in functionality here (and would be happy for somebody to point any out).
But if it's the case that it'd be preferred to just make the move to urllib instead, I'm happy to take a shot at it, if it'd be helpful.
Since removing our own implementation would be a pretty big project, you're welcome to add an improvement before that. I don't know when the removal would happen, I could have time tomorrow, or in a year.
Yes, I agree it would be a considerable amount of work to make the move to the urllib.parse functions. And I'd imagine it'd be better for that to be done by, or at least with the direct involvement of, somebody with more background on how the current code came to be.
I'll likely roll in some (much) smaller changes to see if the performance concerns might be allayed without such massive alterations and let you/others take a look.
See https://github.com/pallets/werkzeug/issues/2600; I've been working on switching to urllib.parse, and have a 35% speedup on routing and responses so far. The next part is to stop allowing bytes/tuples and only work with strings.
That's fantastic! Let me know if I can help out in any way.
I've been using the following benchmark to run with cProfile and timeit.
from werkzeug import Request, Response, Client
from werkzeug.exceptions import HTTPException
from werkzeug.routing import Map, Rule


@Request.application
def app(request):
    router = url_map.bind_to_environ(request.environ)
    router.match()
    return Response("Hello, World!")


url_map = Map([Rule("/hello", endpoint="index"), Rule("/", endpoint="index")])
client = Client(app)


def bench(n=10_000):
    for _ in range(n):
        client.get("/hello")
        client.get("/")


if __name__ == "__main__":
    bench()
$ python -m cProfile -s time -o bench.prof bench.py
$ pipx run snakeviz bench.prof
$ python -m timeit -s 'from bench import bench' 'bench(1000)'
Running this benchmark script with cProfile (python -m cProfile -s time bench.py) shows:

- url_quote is run 34 times per request
- url_parse is run 14 times per request
- _unquote_to_bytes is run 14 times per request
- isinstance() is run multiple times from each of those functions

Running this under a line profiler shows that this line is particularly expensive, at around 6% of the overall benchmark CPU time:
https://github.com/pallets/werkzeug/blob/main/src/werkzeug/urls.py#L572

If this function could be optimized, it would have a knock-on effect on Flask performance.
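For anyone reproducing per-request call counts like the ones above, a minimal cProfile/pstats snippet looks like the following (a hypothetical harness; a dummy loop stands in for bench() from the script above):

```python
import cProfile
import io
import pstats

pr = cProfile.Profile()
pr.enable()
# stand-in workload; replace with bench() from the benchmark script
sum(i * i for i in range(10_000))
pr.disable()

buf = io.StringIO()
# sort by cumulative time and print the 10 hottest functions,
# including their per-call counts in the ncalls column
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```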