Open a08eda07-0dd1-4b77-b369-e6b5e187ca8c opened 15 years ago
See the discussion that started at the very end of July at http://mail.python.org/pipermail/python-dev/2009-July/090928.html
And continued at http://mail.python.org/pipermail/python-dev/2009-August/thread.html
Basically, the mimetypes module is fragile and very confusing code, built up over years of feature creep without refactoring or careful overall design. I'd like to cut it down to a more manageable code size, fix some bugs, update the included list of mime types, and use some nice Python features of versions 2.2+. Ideally someone reading the module once through would be able to understand what it does.
Patches to be attached shortly.
This diff should leave the semantics of the module essentially unchanged (including lazy-loading of default files), and also leave the particular MIME types used unchanged, even though these are out of date and should be updated; a subsequent suggested version will address that, perhaps after some discussion.
Here is a version of the patch which does away with the lazy loading: these are a small handful of easy-to-parse ~40k files; if the import takes an extra eye-blink, it shouldn't be too big a deal.
A fixed version of the patch from msg91200, 2009-08-02 20:08
This version (#4) switches to expressing the default types as a list of tuples instead of as a dict. That lets us include duplicate rows, so that "reverse" type -> extension lookups will behave properly once we start changing the actual content of the defaults.
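To make the ordering idea concrete, here is a minimal sketch. It is not the code from the patch: the names `DEFAULT_ROWS` and `build_maps` and the sample rows are hypothetical, and it assumes that later rows win for extension -> type lookups while every row contributes to the type -> extension map.

```python
# Hypothetical sample rows; a dict keyed by extension could not hold
# both '.jpg' entries at once, but a list of tuples can.
DEFAULT_ROWS = [
    (".jpg", "image/pjpeg"),   # alternate type, listed first
    (".jpg", "image/jpeg"),    # preferred type, listed later
    (".jpeg", "image/jpeg"),
]


def build_maps(rows):
    """Build (extension -> type, type -> [extensions]) from ordered rows."""
    forward = {}
    inverse = {}
    for ext, mime_type in rows:
        # A later row for the same extension replaces the earlier one.
        forward[ext] = mime_type
        # Every row is kept for the reverse lookup.
        inverse.setdefault(mime_type, []).append(ext)
    return forward, inverse


forward, inverse = build_maps(DEFAULT_ROWS)
```

With these rows, `.jpg` resolves to the preferred type, while the alternate type still maps back to `.jpg` in the reverse direction.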
The types_map and common_types dictionaries (aliases to the singleton MimeTypes object's types_map property) have been left behaving as before for backwards compatibility.
The tests still pass.
Here is a list I generated of all the current Apache mime.types:
I would just as soon include this in the Python standard library, either just the Apache file as is, or even these Python object literals (maybe in a file outside of mimetypes.py), and then *not* import from Apache files by default, to cut down on external dependencies. There are several alternate MIME types for various extensions that should be added to this list (in earlier positions, so that they are used only in the type -> extension map).
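For reference, turning a mime.types file into such literals takes only a few lines. The sketch below is an illustration rather than the script used to generate the list; `parse_mime_types` is a hypothetical helper, and it assumes the usual file layout of one MIME type per line followed by whitespace-separated extensions, with `#` starting a comment.

```python
def parse_mime_types(text):
    """Return (extension, type) tuples in file order."""
    rows = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        # Types listed without extensions contribute no rows.
        for ext in extensions:
            rows.append(("." + ext, mime_type))
    return rows


# Made-up sample input in the assumed layout.
sample = (
    "# comment line\n"
    "text/html\thtml htm\n"
    "image/jpeg jpeg jpg jpe # trailing comment\n"
    "application/x-none\n"
)
rows = parse_mime_types(sample)
```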
The only issue is that some users may have added to their Apache mime.types files for the sake of getting mailman or other python programs to do what they want. So I'm not entirely sure to what extent we should be 100% backwards compatible in such edge cases.
My personal opinion is that the 'strict' option is unnecessary and should be set to do nothing, because users are more likely to want the predictable behavior where an unorthodox type gives back the proper extension, than the behavior where their code fails unless they pass a flag in: I don't see any reason for a user to want a 'type doesn't exist' message back for non-registered types. This isn't a "test for IANA registration" module.
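To illustrate the behavior being discussed, here is a small example using the existing `mimetypes.MimeTypes` API. The type `application/x-hypothetical` and the extension `.hypo` are made up for the demonstration and are assumed not to appear in any system mime.types file.

```python
import mimetypes

db = mimetypes.MimeTypes()

# Register a made-up type in the non-strict map only.
db.add_type("application/x-hypothetical", ".hypo", strict=False)

# With the default strict=True, the lookup finds nothing.
strict_guess = db.guess_type("report.hypo")
strict_ext = db.guess_extension("application/x-hypothetical")

# The caller has to pass strict=False to get the entry back.
lenient_guess = db.guess_type("report.hypo", strict=False)
lenient_ext = db.guess_extension("application/x-hypothetical", strict=False)
```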
Plone uses this thing, which has *much* more complexity than necessary for the standard library, but it might be nice to pick up the code for pulling types out of the windows registry, for instance.
http://svn.plone.org/svn/archetypes/Products.MimetypesRegistry/trunk/Products/MimetypesRegistry/MimeTypesRegistry.py
Okay, here's a version of this patch which (a) adds deprecation warnings, and (b) doesn't bother with lazy init. It should still be nearly completely backwards compatible with the previous mimetypes module.
And at Rietveld, patch version 5: http://codereview.appspot.com/107042
See also bpo-6763.
Putting this here for the record rather than leaving it in Rietveld:
I appreciate the desire for a cleaner API for handling mimetypes, but this isn't the way to get it. Finding projects that have their own mimetypes implementations, asking them why they created their own rather than using the standard one, seeing what features are common to those APIs, etc, are all things that need to be done before making major changes to the standard library API.
What you see as a critical bug (custom MimeTypes instances inheriting their initial settings from the mimetypes._db instance), you can bet some developers are relying on as a feature. If code is in the standard library, someone, somewhere, is relying on it working just the way it is now. Even bug fixes can sometimes break code that was designed to work around the presence of the bug.
The concept of having a master copy that new instances are cloned from isn't even particularly objectionable, so long as people clearly understand that is what is going on (e.g. this happens with decimal.DefaultContext being used as the basis for new decimal.Context instances).
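A short illustration of the decimal case, relying on the documented behavior that the `Context` constructor uses `DefaultContext` as its prototype (the precision values are arbitrary, and the original setting is restored at the end):

```python
import decimal

original_prec = decimal.DefaultContext.prec
try:
    # Change the master copy.
    decimal.DefaultContext.prec = 10

    # A new context picks up the master's current settings.
    cloned = decimal.Context()
    cloned_prec = cloned.prec

    # It is a copy: later changes to the master do not affect it.
    decimal.DefaultContext.prec = 12
    prec_after_master_change = cloned.prec
finally:
    # Put the global default back the way it was.
    decimal.DefaultContext.prec = original_prec
```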
With code this old, 'softly, softly' is the way to go, and the fewer user visible changes in semantics the better.
Thanks for working on cleaning up that module. I have to agree with Nick though (see also minor comments on Rietveld): code in the stdlib just can’t move as freely as outside of it.
I’m updating the version to 3.3, given that this patch adds new features and refactors things (stable branches only get bug fixes).
Note that I still believe there are substantial improvements that could be made without a wholesale rewrite of the module that poses significant backwards compatibility risks (just improving the documentation regarding how the list of types is populated could likely help some users, as would updating the default list we use if we can't retrieve one from the environment).
Alternatively, even if we can't get anyone interested in such a refactoring task, it may be feasible to introduce an improved mimetypes handling interface that is easier to maintain and keep up to date, again without risking backwards compatibility issues for users of the current module.
Some potentially relevant links for anyone wanting to investigate improving the standard library's MIME type support:
The discussions with Jacob in Rietveld regarding his original approach: https://codereview.appspot.com/107042
PyPI libraries:
https://pypi.python.org/pypi/mimeparse/
https://pypi.python.org/pypi/mime
https://pypi.python.org/pypi/zope.mimetype
https://pypi.python.org/pypi/Products.MimetypesRegistry (Jacob pointed this one out above)
The various PyPI wrappers around libmagic and the *nix "file" utility are also of potential interest for research purposes (but aren't especially useful on Windows, where those tools are significantly less likely to be available).
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library']
title = 'show Python mimetypes module some love'
updated_at =
user = 'https://bugs.python.org/jrus'
```
bugs.python.org fields:
```python
activity =
actor = 'ncoghlan'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'jrus'
dependencies = []
files = ['14630', '14631', '14633', '14696', '14731']
hgrepos = []
issue_num = 6626
keywords = ['patch']
message_count = 13.0
messages = ['91196', '91200', '91203', '91204', '91205', '91208', '91489', '91583', '91585', '91884', '93829', '128251', '209140']
nosy_count = 5.0
nosy_names = ['ncoghlan', 'eric.araujo', 'r.david.murray', 'jab', 'jrus']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue6626'
versions = ['Python 3.5']
```