Closed rboen closed 3 years ago
I'm not sure how to "fix" this...
What were you planning to do as a work-around?
My first thought was this: when the HtmlBody/TextBody properties are null and there are no BodyParts, check the content of MimeMessage.Body.Preamble. If that content starts or ends with a boundary-like string, check whether that boundary matches one of the Content-Type headers. If so, take the preamble content, try to create a multipart/alternative part from it, and use the MultipartAlternative.GetTextBody function. But it felt rather "hacky". Now I see that GetTextBody is public and not internal as I thought, so there might be a way to build this workaround.
A second idea is to change GetContentType in the MimeParser class. Instead of taking the first Content-Type header in the header list, it could search in reverse order. This idea is based on the assumption that post-processing modules such as anti-virus scanners (which are more likely to "fiddle" with the message) append header values. At least in this example, the last Content-Type header is the "right one".
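A minimal sketch of the first idea, assuming the malformed message parses into a Multipart whose whole content ends up in Preamble (as it did for this email). The helper name FindMatchingContentType is hypothetical and not part of MimeKit; it only picks the Content-Type header whose boundary actually occurs in the preamble and leaves re-parsing to the caller.

```csharp
using System;
using System.Linq;
using MimeKit;

static class ContentTypeHeuristics
{
    // Hypothetical helper (not part of MimeKit): among all Content-Type headers
    // on the message body, return the first one whose boundary marker appears
    // in the preamble text, or null if none matches.
    public static ContentType FindMatchingContentType (MimeMessage message)
    {
        // Assumption: the unparsed parts were swallowed into the preamble.
        if (!(message.Body is Multipart multipart) || string.IsNullOrEmpty (multipart.Preamble))
            return null;

        foreach (var header in multipart.Headers.Where (h => h.Id == HeaderId.ContentType)) {
            if (!ContentType.TryParse (header.Value, out var contentType))
                continue;

            // Boundary delimiter lines in the body are written as "--" + boundary.
            var boundary = contentType.Boundary;
            if (!string.IsNullOrEmpty (boundary) && multipart.Preamble.Contains ("--" + boundary))
                return contentType;
        }

        return null;
    }
}
```

Matching on the boundary avoids guessing whether the first or the last header is the valid one, but it has only been reasoned through against the single example in this issue.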
e.g. like
    ContentType GetContentType (ContentType parent)
    {
        for (int i = headers.Count-1; i >= 0; i--) {
            if (!headers[i].Field.Equals ("Content-Type", StringComparison.OrdinalIgnoreCase))
                continue;
            ...
Just a guess - I can't say whether multiple Content-Type headers are allowed or occur often, or whether their order matters...
Checking the Preamble might work as a workaround in your case...
As far as Content-Type headers go, there should only be one. In your case, it was likely the anti-virus software that generated a new boundary and, instead of replacing the old Content-Type header, just appended a new one. Oof.
I'm not sure if using the last Content-Type header is necessarily any more likely to work than the first when there are multiple Content-Type headers (in cases other than yours, I mean). I would need more data.
Unfortunately I cannot provide more data. Here is the workaround for this kind of malformed MIME message.

    public string HtmlBody
    {
        get
        {
            var htmlBody = _decodedMimeMessage.HtmlBody;
            // Workaround for malformed e-mails with multiple Content-Type headers of type multipart/alternative
            // where only the boundary given in the last Content-Type header is valid.
            try
            {
                if (htmlBody == null && _decodedMimeMessage.BodyParts.FirstOrDefault() == null)
                {
                    var lastContentType = _decodedMimeMessage.Body.Headers.LastOrDefault(h =>
                        h.Field.Equals("Content-Type", StringComparison.OrdinalIgnoreCase));
                    if (lastContentType != null && _decodedMimeMessage.Body is MultipartAlternative multipartAlternative)
                    {
                        // The unparsed parts ended up in the preamble because the first boundary never matched.
                        var content = multipartAlternative.Preamble;
                        if (ContentType.TryParse(new ParserOptions(), lastContentType.RawValue, 0,
                                lastContentType.RawValue.Length, out var contentType))
                        {
                            var encoding = contentType.CharsetEncoding ?? Encoding.UTF8;
                            using (var bufferStream =
                                   new MemoryStream(encoding.GetBytes(content)))
                            {
                                // Re-parse the preamble using the last Content-Type header.
                                var mimeEntity = MimeEntity.Load(contentType, bufferStream);
                                htmlBody = (mimeEntity as MultipartAlternative)?.HtmlBody;
                            }
                        }
                    }
                }
            }
            catch
            {
                // ignore
            }
            return htmlBody;
        }
    }
You might find this interesting: https://datatracker.ietf.org/doc/html/rfc7103#section-7.5
Unfortunately, they do not address what to do with multiple Content-Type headers.
I tried searching for multiple Content-Type headers wrt Avast. All I could find so far are these posts:
https://forum.avast.com/index.php?topic=42013.0
https://forum.avast.com/index.php?topic=57720.0
https://forum.avast.com/index.php?topic=64839.0
https://forum.avast.com/index.php?topic=68497.0
They all seem to indicate that Avast emits some sort of error for messages that it finds containing multiple Content-Type headers but nothing about Avast adding a second Content-Type header.
That said, if Avast decided this was Clean (as per the header), then that suggests it probably did add the second Content-Type header? Maybe?
Do you have any control over the Avast settings? Can you turn off any options that tell it to modify the message body?
Sorry, I cannot provide more information. We implemented a kind of collaboration tool in which MimeKit is used to analyze incoming mails from a multitude of different senders/organizations. The email above was part of a support call and was sent from outside our organization. Therefore we have no control over the Avast settings, nor do we have more emails with multiple Content-Type headers. But we will keep our eyes open. For now, the workaround solves the issue for this particular email.
@rboen Okay, thanks. I'll close this for now since you have a work-around that works, but if you find more issues like this, do feel free to reopen this or file a new issue.
Describe the bug: I tried to get the HTML or text body of the email below, but both MimeMessage.TextBody and MimeMessage.HtmlBody return null.
After some digging into the mail structure I noticed that the boundary Boundary_(ID_EFTfpjUnoKOKbUchO46n4w) never appears in the email body, while a second Content-Type: multipart/alternative header specifies a different boundary, and that one is present.
The parser seems to use the first (invalid) Content-Type header and therefore does not parse the internal structure.
I assume this is an incorrect mail format, but it seems to happen in the wild (I suspect the virus protection messes up the mail structure).
A mail client such as Outlook is somewhat forgiving and shows the content nonetheless.
I tried to create a workaround but quickly discovered that some helper classes/methods are declared internal. So I am stuck.
Thank you very much for looking into this.
Expected behavior: MimeMessage.HtmlBody / MimeMessage.TextBody should return the body content of the mail.