python / cpython

The Python programming language
https://www.python.org
Other
63.42k stars 30.37k forks source link

EmailMessage.is_attachment == False if filename is present #65278

Closed 7f693efa-c293-4e08-80e6-4c8268e300a5 closed 10 years ago

7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago
BPO 21079
Nosy @warsaw, @bitdancer, @serhiy-storchaka
Files
  • is_attachment.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['expert-email'] title = 'EmailMessage.is_attachment == False if filename is present' updated_at = user = 'https://bugs.python.org/brandon-rhodes' ``` bugs.python.org fields: ```python activity = actor = 'serhiy.storchaka' assignee = 'none' closed = True closed_date = closer = 'r.david.murray' components = ['email'] creation = creator = 'brandon-rhodes' dependencies = [] files = ['34641'] hgrepos = [] issue_num = 21079 keywords = ['patch'] message_count = 15.0 messages = ['214969', '214970', '214972', '214974', '214978', '214990', '214995', '214997', '215002', '215004', '215014', '215037', '227181', '227182', '227221'] nosy_count = 5.0 nosy_names = ['barry', 'r.david.murray', 'brandon-rhodes', 'python-dev', 'serhiy.storchaka'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue21079' versions = ['Python 3.4', 'Python 3.5'] ```

    7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

    Most attachments (in my inbox, at least) specify a filename, and thus have a Content-Disposition header that looks like:

    Content-Disposition: attachment; filename="attachment.gz"

    In fact, this sample header was generated by the new add_attachment() method in Python itself. Unfortunately, the is_attachment property currently does this test:

    c_d.lower() == 'attachment'

    Which means that it returns False for almost all attachments in my email archive. I believe that the test should instead be:

    c_d.split(';', 1)[0].lower() == 'attachment'

    7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

    Oh - this also, happily, explains why iter_attachments() is ignoring all of the attachments on my email: because it internally relies upon is_attachment to make the decision. So this fix will also make iter_attachments() usable!

    7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

    Okay, having looked at the source a bit more it would probably make more sense to use _splitparam() instead of doing the split manually.

    7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

    Given that methods like get_param() already exist for pulling data out of the right-hand-side of the ';' in a parameterized email header, would it be amiss for EmailMessage to also have a method that either returns everything to the left of the semicolon, or returns something like:

    ('attachment', [('filename', 'example.txt')])

    thus doing all the parsing in one place that everything else can then steadily rely upon, including users that want to pull the parsed values for their own inspection?

    bitdancer commented 10 years ago

    That facility already mostly exists. The bug is that the code in question doesn't use it.

    >>> m['Content-Disposition'].content_disposition
    'attachment'
    >>> m['Content-Disposition'].params
    {'filename': 'attachment.gz'}

    On the other hand, looking at that it is obvious there should be a generic 'value' attribute on all MIME headers so that that could be written:

    m['Content-Disposition'].value == 'attachment'

    where value would be the 'canonical form' of the value for that header when there is one, including normalizing the case. Some headers still want specialized attributes (Content-Type's maintype and subtype, for example), but there is always the value/params split, and that ought to be accessible generically and currently isn't.

    This is why this stuff is still provisional :)

    bitdancer commented 10 years ago

    Here's patch.

    bitdancer commented 10 years ago

    I was going to open new issue for adding 'value', but looking at the parsing code I see why I didn't add one. The way the new parser works it really wants to know the actual structure of the value, it doesn't have a good way to treat it as generic. We could still add 'value' and just set it to the appropriate value, but there isn't currently a way to parse an unknown MIME header.

    I think maybe I'll postpone this 'value' idea until we have a bit more experience with the API.

    7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

    Understood. I wonder where in the documentation the ability to get the content disposition should wind up? I am almost tempted to suggest a get_content_disposition() method that parallels get_content_type(), mostly to avoid having to document the asymmetry between how users should go about getting these two pieces of information. :)

    bitdancer commented 10 years ago

    Well, that's why is_attachment exists. I wouldn't be averse to adding get_content_disposition if nobody objects, though.

    The attributes are on the headers because the data really is attributes of the parsed headers, but the more useful user API is the methods on the message object those headers are part of. Originally I thought the header attributes could replace the message object methods, but the more I actually used that interface the less I liked it :). So now I consider them more of an implementation or lower-level-of-the-model detail.

    7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

    I agree that is_attachment supports the most common use-case of people who need to inspect the content disposition!

    But people implementing heavyweight tools and email clients might additionally need to distinguish between a MIME part whose disposition is explicitly "inline" and a MIME part whose disposition is simply unspecified — since the RFC seems to allow clients quite a bit of freedom in the case where it is entirely unspecified:

    "Content-Disposition is an optional header field. In its absence, the MUA may use whatever presentation method it deems suitable." — RFC 2183

    And a three-possibility 'inline'|'attachment'|None return value from get_content_disposition() would perfectly reflect the three possibilities envisioned in the standard.

    bitdancer commented 10 years ago

    OK. If you would be willing to open a feature request for that, that would be great.

    7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

    Thanks — done! http://bugs.python.org/issue21083

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

    New changeset 0044ed0af96f by R David Murray in branch '3.4': bpo-21079: is_attachment now looks only at the value, ignoring parameters. https://hg.python.org/cpython/rev/0044ed0af96f

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

    New changeset 54392c4a8880 by R David Murray in branch 'default': Merge: bpo-21079: is_attachment now looks only at the value, ignoring parameters. https://hg.python.org/cpython/rev/54392c4a8880

    serhiy-storchaka commented 10 years ago

    Following expression looks more clear to me:

    c_d is not None and c_d.content_disposition == 'attachment'