jstedfast / MailKit

A cross-platform .NET library for IMAP, POP3, and SMTP.
http://www.mimekit.net
MIT License

HtmlPreviewVisitor - include inline images in "attachments" property instead of embedding in HTML #1122

Closed: rivdiv closed this issue 3 years ago

rivdiv commented 3 years ago

I am using the HtmlPreviewVisitor sample code to get the complete body text of an email (the "convenience" properties TextBody, HtmlBody, and Attachments are not doing the job).

I understand the HtmlPreviewVisitor class sample code was created to return the complete HTML, including inline images.

My goal: I am looking to tweak this so that instead of inline images being saved and displayed in the HTML, they would be added to the public property of "attachments", just like the regular attachments.

I started off by commenting out the HtmlTagCallback = HtmlTagCallback line when initializing the HtmlToHtml class, but I don't know where to go from there.

Any help is appreciated.

jstedfast commented 3 years ago

The HtmlTagCallback wouldn't be the right thing to disable since all it does is rewrite the HTML to point to the attachment content so that a BrowserControl can render them.

What you actually need to do is to modify the MultipartRelated handler to add its children (other than the root) to the list of attachments.

using System.Collections.Generic;

using MimeKit;
using MimeKit.Tnef;

class AttachmentCollectionVisitor : MimeVisitor
{
    List<MimeEntity> attachments = new List<MimeEntity> ();
    TextPart body;

    public AttachmentCollectionVisitor ()
    {
    }

    /// <summary>
    /// The list of attachments that were in the MimeMessage.
    /// </summary>
    public IList<MimeEntity> Attachments {
        get { return attachments; }
    }

    /// <summary>
    /// The TextPart that contains the most faithful representation of the message body.
    /// </summary>
    public TextPart HtmlBody {
        get { return body; }
    }

    protected override void VisitMultipartAlternative (MultipartAlternative alternative)
    {
        // walk the multipart/alternative children backwards from greatest level of faithfulness to the least faithful
        for (int i = alternative.Count - 1; i >= 0 && body == null; i--)
            alternative[i].Accept (this);
    }

    protected override void VisitMultipartRelated (MultipartRelated related)
    {
        var root = related.Root;

        // visit the root document
        root.Accept (this);

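        // visit the remaining children (e.g. inline images) so that they
        // end up in the attachments list instead of being embedded in the HTML
        foreach (var child in related) {
            if (child != root)
                child.Accept (this);
        }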
    }

    protected override void VisitTextPart (TextPart entity)
    {
        if (body != null) {
            // since we've already found the body, treat this as an attachment
            attachments.Add (entity);
            return;
        }

        body = entity;
    }

    protected override void VisitTnefPart (TnefPart entity)
    {
        // extract any attachments in the MS-TNEF part
        attachments.AddRange (entity.ExtractAttachments ());
    }

    protected override void VisitMessagePart (MessagePart entity)
    {
        // treat message/rfc822 parts as attachments
        attachments.Add (entity);
    }

    protected override void VisitMimePart (MimePart entity)
    {
        // realistically, if we've gotten this far, then we can treat this as an attachment
        // even if the IsAttachment property is false.
        attachments.Add (entity);
    }
}
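
As a rough usage sketch (not part of the original sample), the visitor above could be driven like this. It assumes the AttachmentCollectionVisitor class is compiled into the same project; the SaveAll helper, the file paths, and the "attachment-N" naming scheme are hypothetical and only there for illustration.

```csharp
using System.IO;
using MimeKit;

static class AttachmentCollectionExample
{
    // Hypothetical helper: loads a message, collects its attachments
    // (including the inline images of any multipart/related) and saves them.
    public static void SaveAll (string messagePath, string outputDirectory)
    {
        var message = MimeMessage.Load (messagePath);
        var visitor = new AttachmentCollectionVisitor ();

        // walk the MIME tree; the visitor fills Attachments and HtmlBody
        message.Accept (visitor);

        Directory.CreateDirectory (outputDirectory);

        int index = 0;
        foreach (var attachment in visitor.Attachments) {
            index++;

            if (attachment is MessagePart rfc822) {
                // message/rfc822 attachments are written out as whole messages
                var path = Path.Combine (outputDirectory, $"attachment-{index}.eml");
                using (var stream = File.Create (path))
                    rfc822.Message.WriteTo (stream);
            } else if (attachment is MimePart part) {
                // prefix with the index so parts sharing a file name do not overwrite each other
                var name = Path.GetFileName (part.FileName ?? "unnamed.dat");
                var path = Path.Combine (outputDirectory, $"attachment-{index}-{name}");
                using (var stream = File.Create (path))
                    part.Content.DecodeTo (stream);
            }
        }
    }
}
```

DecodeTo writes the decoded content (for example, the raw image bytes rather than the base64 text), which is usually what you want when saving an attachment to disk.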

rivdiv commented 3 years ago

Thanks for the quick response. I can confirm the inline attachments are included now.

However, the HtmlBody property does not include the full text. I am finding a TextPart within the attachments list. Would simply concatenating the TextPart.Text of these attachments to the HtmlBody property give the accurate body text?

jstedfast commented 3 years ago

That all depends on the mail client that constructed the message.

Rendering messages is a "complicated mess" :-\

My FAQ describes the most common message structures that mail clients use, but it's possible for some mail clients to construct messages such that the message text is split among multiple text parts (but that's extremely rare I think).
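
To make the "split among multiple text parts" case concrete, here is a minimal sketch of one way such a message might look and one way its plain text could be stitched back together. The SplitBodyExample class, both method names, and the sample content are made up for illustration; real messages from a given mail client may be structured differently.

```csharp
using System.Linq;
using System.Text;
using MimeKit;

static class SplitBodyExample
{
    // Hypothetical helper: joins the text of every text/plain part
    // that is not explicitly flagged as an attachment.
    public static string JoinPlainText (MimeMessage message)
    {
        var text = new StringBuilder ();

        foreach (var part in message.BodyParts.OfType<TextPart> ()) {
            if (part.IsAttachment || !part.IsPlain)
                continue;

            text.AppendLine (part.Text);
        }

        return text.ToString ();
    }

    // Builds a multipart/mixed message whose body text is split
    // around an inline image (made-up sample data, image content omitted).
    public static MimeMessage BuildSplitMessage ()
    {
        var mixed = new Multipart ("mixed");

        mixed.Add (new TextPart ("plain") { Text = "Text before the image." });
        mixed.Add (new MimePart ("image", "png") {
            ContentDisposition = new ContentDisposition (ContentDisposition.Inline),
            FileName = "photo.png"
        });
        mixed.Add (new TextPart ("plain") { Text = "Text after the image." });

        return new MimeMessage { Body = mixed };
    }
}
```

Whether joining the parts like this gives the "right" body still depends on the client that built the message, so treat it as a heuristic rather than a rule.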

rivdiv commented 3 years ago

Understood. I think I am dealing with one such rare email message then.

I ended up adding 2 string properties to the class for BodyText and BodyHTML, and modified the VisitTextPart method as follows:

    protected override void VisitTextPart(TextPart entity)
    {
        if (body != null)
        {
            // since we've already found the body, also record this part as an attachment
            attachments.Add(entity);
        }
        else
        {
            body = entity;
        }

        // accumulate the text of every text part that has no Content-Disposition
        // header, whether or not it was also recorded as an attachment above
        if (entity.ContentDisposition == null)
        {
            if (entity.IsHtml)
            {
                bodyHTML += entity.Text;
            }
            else
            {
                bodyText += entity.Text;
            }
        }
    }
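
For anyone who only needs the accumulated text and not the attachments, the same idea can be sketched as a small standalone visitor. This is an illustrative variant, not the code I actually run: the BodyTextCollector name is hypothetical, it uses StringBuilder instead of string concatenation, and it relies on MimeVisitor's default traversal of multipart children, so for a multipart/alternative both the plain and the HTML version get collected.

```csharp
using System.Text;
using MimeKit;

// Hypothetical visitor (not part of MimeKit): collects the text of every
// text part that has no Content-Disposition header.
class BodyTextCollector : MimeVisitor
{
    readonly StringBuilder text = new StringBuilder ();
    readonly StringBuilder html = new StringBuilder ();

    public string BodyText {
        get { return text.ToString (); }
    }

    public string BodyHTML {
        get { return html.ToString (); }
    }

    protected override void VisitTextPart (TextPart entity)
    {
        // parts with an explicit disposition are left alone
        if (entity.ContentDisposition != null)
            return;

        if (entity.IsHtml)
            html.Append (entity.Text);
        else
            text.Append (entity.Text);
    }

    protected override void VisitMessagePart (MessagePart entity)
    {
        // do not descend into attached message/rfc822 parts
    }
}
```

Usage would be `var collector = new BodyTextCollector (); message.Accept (collector);`. Note that concatenating several HTML parts may not produce a single well-formed HTML document.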

Thanks again for the consistently quick and detailed responses here and on other forums / threads. All the info you put out there makes the library easy to work with.