Closed 7f693efa-c293-4e08-80e6-4c8268e300a5 closed 10 years ago
Most attachments (in my inbox, at least) specify a filename, and thus have a Content-Disposition header that looks like:
Content-Disposition: attachment; filename="attachment.gz"
In fact, this sample header was generated by the new add_attachment() method in Python itself. Unfortunately, the is_attachment property currently does this test:
c_d.lower() == 'attachment'
Which means that it returns False for almost all attachments in my email archive. I believe that the test should instead be:
c_d.split(';', 1)[0].lower() == 'attachment'
Oh - this also, happily, explains why iter_attachments() is ignoring all of the attachments on my email: because it internally relies upon is_attachment to make the decision. So this fix will also make iter_attachments() usable!
Okay, having looked at the source a bit more it would probably make more sense to use _splitparam() instead of doing the split manually.
Given that methods like get_param() already exist for pulling data out of the right-hand-side of the ';' in a parameterized email header, would it be amiss for EmailMessage to also have a method that either returns everything to the left of the semicolon, or returns something like:
('attachment', [('filename', 'example.txt')])
thus doing all the parsing in one place that everything else can then steadily rely upon, including users that want to pull the parsed values for their own inspection?
That facility already mostly exists. The bug is that the code in question doesn't use it.
>>> m['Content-Disposition'].content_disposition
'attachment'
>>> m['Content-Disposition'].params
{'filename': 'attachment.gz'}
On the other hand, looking at that it is obvious there should be a generic 'value' attribute on all MIME headers so that that could be written:
m['Content-Disposition'].value == 'attachment'
where value would be the 'canonical form' of the value for that header when there is one, including normalizing the case. Some headers still want specialized attributes (Content-Type's maintype and subtype, for example), but there is always the value/params split, and that ought to be accessible generically and currently isn't.
This is why this stuff is still provisional :)
Here's patch.
I was going to open new issue for adding 'value', but looking at the parsing code I see why I didn't add one. The way the new parser works it really wants to know the actual structure of the value, it doesn't have a good way to treat it as generic. We could still add 'value' and just set it to the appropriate value, but there isn't currently a way to parse an unknown MIME header.
I think maybe I'll postpone this 'value' idea until we have a bit more experience with the API.
Understood. I wonder where in the documentation the ability to get the content disposition should wind up? I am almost tempted to suggest a get_content_disposition() method that parallels get_content_type(), mostly to avoid having to document the asymmetry between how users should go about getting these two pieces of information. :)
Well, that's why is_attachment exists. I wouldn't be averse to adding get_content_disposition if nobody objects, though.
The attributes are on the headers because the data really is attributes of the parsed headers, but the more useful user API is the methods on the message object those headers are part of. Originally I thought the header attributes could replace the message object methods, but the more I actually used that interface the less I liked it :). So now I consider them more of an implementation or lower-level-of-the-model detail.
I agree that is_attachment supports the most common use-case of people who need to inspect the content disposition!
But people implementing heavyweight tools and email clients might additionally need to distinguish between a MIME part whose disposition is explicitly "inline" and a MIME part whose disposition is simply unspecified — since the RFC seems to allow clients quite a bit of freedom in the case where it is entirely unspecified:
"Content-Disposition is an optional header field. In its absence, the MUA may use whatever presentation method it deems suitable." — RFC 2183
And a three-possibility 'inline'|'attachment'|None return value from get_content_disposition() would perfectly reflect the three possibilities envisioned in the standard.
OK. If you would be willing to open a feature request for that, that would be great.
Thanks — done! http://bugs.python.org/issue21083
New changeset 0044ed0af96f by R David Murray in branch '3.4': bpo-21079: is_attachment now looks only at the value, ignoring parameters. https://hg.python.org/cpython/rev/0044ed0af96f
New changeset 54392c4a8880 by R David Murray in branch 'default': Merge: bpo-21079: is_attachment now looks only at the value, ignoring parameters. https://hg.python.org/cpython/rev/54392c4a8880
Following expression looks more clear to me:
c_d is not None and c_d.content_disposition == 'attachment'
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['expert-email']
title = 'EmailMessage.is_attachment == False if filename is present'
updated_at =
user = 'https://bugs.python.org/brandon-rhodes'
```
bugs.python.org fields:
```python
activity =
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date =
closer = 'r.david.murray'
components = ['email']
creation =
creator = 'brandon-rhodes'
dependencies = []
files = ['34641']
hgrepos = []
issue_num = 21079
keywords = ['patch']
message_count = 15.0
messages = ['214969', '214970', '214972', '214974', '214978', '214990', '214995', '214997', '215002', '215004', '215014', '215037', '227181', '227182', '227221']
nosy_count = 5.0
nosy_names = ['barry', 'r.david.murray', 'brandon-rhodes', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue21079'
versions = ['Python 3.4', 'Python 3.5']
```