python / cpython

The Python programming language
https://www.python.org
Other
63.67k stars 30.5k forks source link

Make SimpleHTTPRequestHandler load mimetypes lazily #79473

Closed zooba closed 4 years ago

zooba commented 6 years ago
BPO 35292
Nosy @pfmoore, @tjguk, @pmp-p, @zware, @zooba, @demianbrecht, @danishprakash
PRs
  • python/cpython#11036
  • python/cpython#17822
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['easy', 'library', '3.9', 'OS-windows', 'performance'] title = 'Make SimpleHTTPRequestHandler load mimetypes lazily' updated_at = user = 'https://github.com/zooba' ``` bugs.python.org fields: ```python activity = actor = 'steve.dower' assignee = 'none' closed = True closed_date = closer = 'steve.dower' components = ['Library (Lib)', 'Windows'] creation = creator = 'steve.dower' dependencies = [] files = [] hgrepos = [] issue_num = 35292 keywords = ['patch', 'easy'] message_count = 15.0 messages = ['330212', '330292', '330322', '331381', '331386', '331392', '331394', '331395', '331396', '331398', '331399', '331400', '331739', '359621', '359622'] nosy_count = 7.0 nosy_names = ['paul.moore', 'tim.golden', 'pmpp', 'zach.ware', 'steve.dower', 'demian.brecht', 'danishprakash'] pr_nums = ['11036', '17822'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'performance' url = 'https://bugs.python.org/issue35292' versions = ['Python 3.9'] ```

    zooba commented 6 years ago

    Importing http.server triggers mimetypes.init(), which can be fairly expensive on Windows (and potentially other platforms?) due to having to enumerate a lot of registry keys.

    Instead, SimpleHTTPRequestHandler.extensions_map can be a dict with just its default values in lib/http/server.py and the first call to guess_type() can initialize mimetypes if necessary.

    We should check whether people read from SimpleHTTPRequestHandler.extensions_map directly before calling guess_type(), and decide how quickly we can make the change. My gut feeling is we will be okay to make this in the next release but probably shouldn't backport it.

    4a064003-f3f1-41f9-9ba7-a7baf9a6029c commented 6 years ago

    I would like to work on this if you have not already started, Steve.

    We should check whether people read from SimpleHTTPRequestHandler.extensions_map directly before calling guess_type(), and decide how quickly we can make the change.

    Are you implying we make the change based on whether someone use SimpleHTTPRequestHandler.extensions_map before calling guess_type()? Could you please elaborate that a bit, thanks.

    zooba commented 6 years ago

    You're welcome to it - I deliberately left it for someone else to work on, though I'm happy to review and merge.

    The visible change of making this lazy is that if someone reads from the dict, it'll be missing system-specified content types (until we lazily fill it in guess_type() ). Nobody is meant to read from it - it's public for people to *add* custom overrides - so this should be okay, but a little bit of GitHub research to find out if anyone does wouldn't hurt. Then we can at least warn them that it's changing before they run into their own bug.

    1f4f0d85-e63f-4322-8b54-768e60e1d01b commented 5 years ago

    and potentially other platforms? strangely, it does not.

    but adding a blocking read on first requests may ruin profiling data on any platform.

    isn't http.server reserved for instrumentation and testing ?

    zooba commented 5 years ago

    Instrumentation is not the same thing as profiling, and this one is certainly not intended for performance. That said, simply importing it shouldn't impact anyone else's performance either, and since six imports this, fixing it will have a pretty broad impact.

    Thanks for the PR - I'll look when I get a chance, but most of my time is going towards the next 3.7 release right now. Feel free to ping me on here in a week if nobody has had a look.

    1f4f0d85-e63f-4322-8b54-768e60e1d01b commented 5 years ago

    ??? i never compared Instrumentation and profiling

    and what getting delta timing about the script behind the first http request has to do with performance ? \<= that's actually another question

    thefirst was : isn't http.server reserved for instrumentation and testing (because lazy loading seems a production feature to me ) ?

    zooba commented 5 years ago

    If you weren't conflating profiling and instrumentation then your original comment makes no sense.

    zooba commented 5 years ago

    (Sorry, page submitted while I was still typing for some reason.)

    ... Instrumentation for everything other than time is unaffected, so if you didn't mean to conflate them, then no there is no impact on the documented use.

    Yes, moving mimetypes.init to the first request will make the first request slower. However, it makes *import* faster, and there should not be a significant penalty just for importing this module.

    You can trivially front-load the work in your app by calling mimetypes.init yourself, which will reduce the overhead on first request to a dict copy (I assume, haven't looked at the PR yet). Or just don't measure the first request - it's typical to do some warm-up loops when profiling.

    1f4f0d85-e63f-4322-8b54-768e60e1d01b commented 5 years ago

    the PR as it is actually add a blocking read (and maybe exceptions or whatever mimetypes modules decide to do) in processing of the first request to server, which has nothing to do with the code serving that request.

    I agree that makes no sense at all.

    zooba commented 5 years ago

    I agree that makes no sense at all.

    I have to admit I have no idea what you're trying to argue or suggesting as an alternative. Could you try clearly explaining the change you'd rather see to fix the issue?

    1f4f0d85-e63f-4322-8b54-768e60e1d01b commented 5 years ago

    sorry i was on my free time when enumerating profiling/instrumentation and testing. given now you pay attention i'll try to explain more.

    instrumentation : what does that server actually support ? eg writing a test for a specific mimitype support eg https://bugs.python.org/issue35403

    profiling: i just want to time the first request to a service, ie without learning all the art of warming up a *one liner* http server with bare hands.

    testing: i want to test the servicing code, not receive error because mimetypes module may have failed late and server should not have started *at all* since it is a critical component of it.

    how to fix the issue ? as i said earlier, strangely "other platforms" don't have that issue so maybe the problem is in mimetypes module.

    So my position is "keep it that way" and investigate the slowdown at import on concerned platform.

    zooba commented 5 years ago

    So, you're suggesting doing a lazy mimetypes.init() on Windows and not on the others? There's no reason to have such a semantic difference between platforms, and since the whole point of mimetypes.init() is to avoid initializing on import, it makes the most sense to avoid initializing on import.

    In fact, it probably makes the *most* sense for http.server to have its own overrides, but otherwise fall back to the mimetypes function when needed. Bypassing the public API doesn't really add anything other than fragility.

    zooba commented 5 years ago

    I have another idea - what if we checked self.extensions_map for overrides but otherwise just called mimetypes.guess_type(path) in the guess_type method? That would be the same behaviour as initializing it lazily, but we don't have to worry about initializing at all, since guess_type() will do it.

    I also did a bit of a GitHub search (which I generally criticize, but it's a pretty good way to evaluate whether something is in _common_ use). Mostly it looks like code will be unaffected by this change - people set values in extensions_map, but don't expect to be able to read global values out of it.

    zooba commented 4 years ago

    New changeset 5907e61a8d4da6d0f11bf1062d6d17484560a15e by Steve Dower (An Long) in branch 'master': bpo-35292: Avoid calling mimetypes.init when http.server is imported (GH-17822) https://github.com/python/cpython/commit/5907e61a8d4da6d0f11bf1062d6d17484560a15e

    zooba commented 4 years ago

    Thanks for the patch!