python / cpython

The Python programming language
https://www.python.org

show Python mimetypes module some love #50875

Open a08eda07-0dd1-4b77-b369-e6b5e187ca8c opened 15 years ago

a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago
BPO 6626
Nosy @ncoghlan, @merwok, @bitdancer, @jab
Files
  • mimetypes3.diff: patch "version 3"
  • mimetypes2.diff: patch "version 2"
  • apache_mimetypes.py: a list of tuples containing apache's default extension -> type mappings
  • mimetypes4.diff: patch "version 4"
  • mimetypes5.diff: patch "version 5"
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['type-bug', 'library']
    title = 'show Python mimetypes module some love'
    updated_at =
    user = 'https://bugs.python.org/jrus'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'ncoghlan'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'jrus'
    dependencies = []
    files = ['14630', '14631', '14633', '14696', '14731']
    hgrepos = []
    issue_num = 6626
    keywords = ['patch']
    message_count = 13.0
    messages = ['91196', '91200', '91203', '91204', '91205', '91208', '91489', '91583', '91585', '91884', '93829', '128251', '209140']
    nosy_count = 5.0
    nosy_names = ['ncoghlan', 'eric.araujo', 'r.david.murray', 'jab', 'jrus']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6626'
    versions = ['Python 3.5']
    ```

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    See the discussion that started at the end of July at http://mail.python.org/pipermail/python-dev/2009-July/090928.html

    And continued at http://mail.python.org/pipermail/python-dev/2009-August/thread.html

    Basically, the mimetypes module is fragile and very confusing code, built up over years of feature creep without refactoring or careful overall design. I'd like to cut it down to a more manageable code size, fix some bugs, update the included list of mime types, and use some nice Python features of versions 2.2+. Ideally someone reading the module once through would be able to understand what it does.

    Patches to be attached shortly.

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    This diff should leave the semantics of the module essentially unchanged (including lazy-loading of default files), and also leave the particular MIME types used unchanged, even though these are out of date and should be updated; a subsequent suggested version will address that, perhaps after some discussion.

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    Here is a version of the patch which does away with the lazy loading: these are a small handful of easy-to-parse ~40k files; if the import takes an extra eye-blink, it shouldn't be too big a deal.

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    A fixed version of the patch from msg91200, 2009-08-02 20:08

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    This version (#4) expresses the default types as a list of tuples instead of a dict. That lets us include duplicate rows, so that "reverse" type -> extension lookups will behave properly once we start changing the actual content of the defaults.
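
A minimal sketch of why a list of tuples can carry more information than a dict here. The rows, the `build_maps` helper, and the rule that a later row wins for the extension -> type direction are all assumptions made for illustration; they are not taken from the attached patch.

```python
# Hypothetical default rows as (extension, MIME type) pairs; these are
# illustrative values, not the contents of the actual patch.
DEFAULT_ROWS = [
    (".jpg", "image/pjpeg"),   # alternate type, listed early on purpose
    (".jpg", "image/jpeg"),
    (".jpeg", "image/jpeg"),
]


def build_maps(rows):
    """Build extension -> type and type -> extensions maps from rows."""
    ext_to_type = {}
    type_to_exts = {}
    for ext, mime in rows:
        # A later row for the same extension replaces the earlier one.
        ext_to_type[ext] = mime
        # Every row is kept for the reverse direction.
        type_to_exts.setdefault(mime, []).append(ext)
    return ext_to_type, type_to_exts


ext_to_type, type_to_exts = build_maps(DEFAULT_ROWS)
print(ext_to_type[".jpg"])
print(type_to_exts["image/pjpeg"])
```

A plain dict keyed by extension could not hold both `.jpg` rows, so the alternate type would have no way to map back to an extension.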

    The types_map and common_types dictionaries (aliases to the singleton MimeTypes object's types_map property) have been left behaving as before for backwards compatibility.

    The tests still pass.

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    Here is a list I generated of all the current Apache mime.types:

    I would just as soon include this in the Python standard library, either the Apache file as is or these Python object literals (maybe in a file outside of mimetypes.py), and then *not* import from Apache files by default, to cut down on external dependencies. Several alternate MIME types should also be added to this list (in earlier positions, so they are used only in the type -> extension map).
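
For reference, a list of tuples like the one described can be produced from the mime.types text format (one MIME type per line followed by its extensions, with `#` starting a comment). The sketch below is only an illustration: `parse_mime_types` and the sample text are hypothetical, and this is not the script used to generate the attached file.

```python
def parse_mime_types(text):
    """Return (extension, MIME type) tuples from mime.types-style text."""
    rows = []
    for line in text.splitlines():
        # Drop anything after a '#', which starts a comment.
        line = line.split("#", 1)[0]
        fields = line.split()
        if len(fields) < 2:
            # Blank line, comment-only line, or a type with no extensions.
            continue
        mime, extensions = fields[0], fields[1:]
        for ext in extensions:
            rows.append(("." + ext, mime))
    return rows


# Hypothetical sample in the same layout as an Apache mime.types file.
SAMPLE = """\
# MIME type            Extensions
text/html               html htm
image/jpeg              jpeg jpg jpe
# application/x-unused
application/octet-stream
"""

for row in parse_mime_types(SAMPLE):
    print(row)
```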

    The only issue is that some users may have added to their Apache mime.types files for the sake of getting mailman or other python programs to do what they want. So I'm not entirely sure to what extent we should be 100% backwards compatible in such edge cases.

    My personal opinion is that the 'strict' option is unnecessary and should be made a no-op. Users are more likely to want the predictable behavior, where an unorthodox type gives back the proper extension, than the behavior where their code fails unless they pass a flag in. I don't see any reason for a user to want a 'type doesn't exist' message back for non-registered types. This isn't a "test for IANA registration" module.
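
To make the 'strict' behavior concrete, here is a small sketch using the existing `mimetypes.MimeTypes` API. The type name `application/x-example-unregistered`, the `.xmpl` extension, and the `extension_for` helper are made up for illustration.

```python
import mimetypes


def extension_for(mime_type, strict):
    """Look up an extension in a private MimeTypes instance."""
    db = mimetypes.MimeTypes()
    # Register a made-up type in the non-strict table only.
    db.add_type("application/x-example-unregistered", ".xmpl", strict=False)
    return db.guess_extension(mime_type, strict=strict)


# With strict=True the non-standard entry is ignored and None comes back.
print(extension_for("application/x-example-unregistered", strict=True))
# With strict=False the same lookup finds the registered extension.
print(extension_for("application/x-example-unregistered", strict=False))
```

The caller has to know to pass `strict=False` to get the extension back, which is the inconvenience described above.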

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    Plone uses this thing, which has *much* more complexity than necessary for the standard library, but it might be nice to pick up the code for pulling types out of the windows registry, for instance.

    http://svn.plone.org/svn/archetypes/Products.MimetypesRegistry/trunk/Products/MimetypesRegistry/MimeTypesRegistry.py

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    Okay, here's a version of this patch which (a) adds deprecation warnings, and (b) doesn't bother with lazy init. It should still be nearly completely backwards compatible with the previous mimetypes module.

    a08eda07-0dd1-4b77-b369-e6b5e187ca8c commented 15 years ago

    And at Rietveld, patch version 5: http://codereview.appspot.com/107042

    bitdancer commented 15 years ago

    See also bpo-6763.

    ncoghlan commented 14 years ago

    Putting this here for the record rather than leaving it in Rietveld:

    I appreciate the desire for a cleaner API for handling mimetypes, but this isn't the way to get it. Finding projects that have their own mimetypes implementations, asking them why they created their own rather than using the standard one, seeing what features are common to those APIs, etc, are all things that need to be done before making major changes to the standard library API.

    What you see as a critical bug (custom MimeTypes instances inheriting their initial settings from the mimetypes._db instance), you can bet some developers are relying on as a feature. If code is in the standard library, someone, somewhere, is relying on it working just the way it is now. Even bug fixes can sometimes break code that was designed to work around the presence of the bug.

    The concept of having a master copy that new instances are cloned from isn't even particularly objectionable, so long as people clearly understand that is what is going on (e.g. this happens with decimal.DefaultContext being used as the basis for new decimal.Context instances).
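
The decimal behavior mentioned above can be seen with a short sketch. The `precision_of_new_context` helper is hypothetical; it temporarily changes `decimal.DefaultContext` and restores it afterwards so that other code is not affected.

```python
import decimal


def precision_of_new_context(template_prec):
    """Show that Context() picks up its defaults from DefaultContext."""
    saved = decimal.DefaultContext.prec
    decimal.DefaultContext.prec = template_prec
    try:
        # No arguments given, so the new context is cloned from the template.
        return decimal.Context().prec
    finally:
        # Restore the shared template for the rest of the process.
        decimal.DefaultContext.prec = saved


print(precision_of_new_context(7))
```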

    With code this old, 'softly, softly' is the way to go, and the fewer user visible changes in semantics the better.

    merwok commented 13 years ago

    Thanks for working on cleaning up that module. I have to agree with Nick though (see also minor comments on Rietveld): code in the stdlib just can’t move as freely as outside of it.

    I’m updating the version to 3.3, given that this patch adds new features and refactors things (stable branches only get bug fixes).

    ncoghlan commented 10 years ago

    Note that I still believe there are substantial improvements that could be made without a wholesale rewrite of the module that poses significant backwards compatibility risks (just improving the documentation regarding how the list of types is populated could likely help some users, as would updating the default list we use if we can't retrieve one from the environment).
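
As a rough illustration of where the list of types can come from, the sketch below checks which of the paths in `mimetypes.knownfiles` exist on the current system. The `existing_known_files` helper is hypothetical, and the printed summary is a simplification of what the module actually does at initialization.

```python
import mimetypes
import os


def existing_known_files():
    """Return the entries of mimetypes.knownfiles present on this system."""
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


found = existing_known_files()
if found:
    print("mime.types files available to the module:", found)
else:
    print("no mime.types file found; the built-in defaults (and, on "
          "Windows, the registry) are what remain")
```

Running this on two machines can explain why the same `guess_type()` call gives different answers on each.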

    Alternatively, even if we can't get anyone interested in such a refactoring task, it may be feasible to introduce an improved mimetypes handling interface that is easier to maintain and keep up to date, again without risking backwards compatibility issues for users of the current module.

    Some potentially relevant links for anyone wanting to investigate improving the standard library's MIME type support:

    The discussions with Jacob in Rietveld regarding his original approach: https://codereview.appspot.com/107042

    PyPI libraries:

      • https://pypi.python.org/pypi/mimeparse/
      • https://pypi.python.org/pypi/mime
      • https://pypi.python.org/pypi/zope.mimetype
      • https://pypi.python.org/pypi/Products.MimetypesRegistry (Jacob pointed this one out above)

    The various PyPI wrappers around libmagic and the *nix "file" utility are also of potential interest for research purposes (but aren't especially useful on Windows, where those tools are significantly less likely to be available).