Error with Attachment filename

tomastyser commented 9 years ago

I have an email whose attachment filename spans multiple lines, and each line is a separately encoded word. S22.Imap then fails to parse the filename. The relevant part of the email source:

    Content-Type: application/msword; name= "=?iso-8859-2?Q?Veolia_-_zm=ECna_um=EDst=ECn=ED_vodom=ECru=2CHlubo=E8epy.?= =?iso-8859-2?Q?doc?="
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment; filename= "=?iso-8859-2?Q?Veolia-_zm=ECna_um=EDst=ECn=ED_vodom=ECru=2C_Hlubo=E8epy.?= =?iso-8859-2?Q?doc?="

tomastyser commented 9 years ago

I solved the problem with this workaround:

    static Attachment CreateAttachment(Bodypart part, byte[] bytes) {
        ....
        // Workaround: filename from Attachment constructor is ignored with Mono.
        /* my code */
        if (name.Contains("\t")) {
            // Pass each tab-separated encoded-word through the Name setter
            // on its own, then join the values read back.
            string[] names = name.Split('\t');
            name = "";
            foreach (string n in names) {
                attachment.Name = n;
                name += attachment.Name;
            }
        }
        /* my code */
        attachment.Name = name;
        attachment.ContentDisposition.FileName = name;
        return attachment;
    }

TheZxcv commented 2 years ago

Hi All,

I know this project is probably dead, but I recently stumbled upon this issue and figured out the root cause, so I wanted to document it somewhere.

Basically when the Attachment class decodes the filename of the attachment, it tries to extract the encoding in a way that fails when there are multiple encoded-words on the same line. The encoded-words end up in one line because in ParseMailHeader all new lines (CRLF) are stripped off.

Attachment name setter: https://github.com/microsoft/referencesource/blob/master/System/net/System/Net/mail/Attachment.cs#L372

Encoding extraction: https://github.com/microsoft/referencesource/blob/master/System/net/System/Net/mail/MimeBasePart.cs#L94

I suspect this is due to an ambiguity (IMO) in section 2 of RFC 2047, which states (emphasis mine):

An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (*separated by CRLF SPACE*) may be used.

In the rest of the document, however, they are said to be separated by linear-white-space, which RFC 822 defines as one or more repetitions of an optional CRLF followed by a space or horizontal tab.

I think this issue can be resolved by either:

- Keeping all new lines in the headers, so that the encoding can be extracted; although this would still break in other cases that are not handled correctly, as highlighted in https://github.com/smiley22/S22.Imap/issues/55#issuecomment-20238939;
- Decoding the attachment name with Util.DecodeWords before initialising the Attachment class, as it handles multiple encoded-words more gracefully.
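
To illustrate the second option, here is a minimal, self-contained sketch of decoding several encoded-words that have ended up on one line. It is not the library's Util.DecodeWords; the class and method names (EncodedWordSketch, DecodeAdjacentWords) are made up for this example, and it only covers the Q and B encodings from RFC 2047 without any of the surrounding header parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of S22.Imap.
static class EncodedWordSketch
{
    // One RFC 2047 encoded-word: =?charset?Q-or-B?encoded-text?=
    static readonly Regex Word = new Regex(@"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=");

    // Whitespace between two adjacent encoded-words is a separator, not text.
    static readonly Regex Gap = new Regex(@"(?<=\?=)\s+(?==\?)");

    public static string DecodeAdjacentWords(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;
        // Drop the separators first, then decode every encoded-word in place.
        string joined = Gap.Replace(input, "");
        return Word.Replace(joined, m => DecodeWord(m));
    }

    static string DecodeWord(Match m)
    {
        try
        {
            // Assumption: the charset is known to the runtime. On .NET Core and
            // later, legacy code pages may need CodePagesEncodingProvider registered.
            Encoding encoding = Encoding.GetEncoding(m.Groups[1].Value);
            string text = m.Groups[3].Value;
            bool isBase64 = m.Groups[2].Value.ToUpperInvariant() == "B";
            byte[] bytes = isBase64 ? Convert.FromBase64String(text) : DecodeQ(text);
            return encoding.GetString(bytes);
        }
        catch (Exception e) when (e is ArgumentException || e is FormatException
                                  || e is NotSupportedException)
        {
            // Unknown charset or malformed payload: keep the raw token.
            return m.Value;
        }
    }

    // Q encoding: '_' stands for a space, "=XX" for the byte with hex value XX.
    static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length
                     && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

The idea would be to run the raw name through something like this first and pass the already-decoded string to the Attachment, so its own encoding extraction never sees more than one encoded-word. For the iso-8859-2 filename above, the charset has to be available to the runtime for the words to be decoded; otherwise this sketch leaves them untouched.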

Related to issues #28 and #55.

smiley22 / S22.Imap

Error with Attachment filename #105