jstedfast / MimeKit

A .NET MIME creation and parser library with support for S/MIME, PGP, DKIM, TNEF and Unix mbox spools.
http://www.mimekit.net
MIT License

Parsing Attached mail in rtf format #344

Closed Astol closed 7 years ago

Astol commented 7 years ago

I am writing a program that collects all PDF files attached to emails, parsing the emails with MimeKit, and I have run into a problem with RTF-formatted mails. I can get the different parts contained in the ms-tnef part fine, but when there is an email attachment inside, I can't find a way to handle it. In my case the attached email is also in RTF format. It is also hard to identify that the attachment is, in fact, an email. Here is how the object looks at runtime:

    {Content-Type: application/octet-stream; name*=iso-8859-1''H%E4r%20%E4r%20pdf%20nr%202%20att%20skicka%20vidare
    Content-Disposition: attachment; filename="Untitled Attachment"; modification-date="Thu, 28 Sep 2017 10:35:47 +0200"; size=25289
    Content-Transfer-Encoding: base64

    eJ8+Ii8IAQaQCAAEAAAAAAABAAEAAQ...+g8BAAAA
    EAAAABJ4DEdIZ7NAn6T9kixhoFEDAP4PBwAAAAw0AAA=}

    MimeKit.MimePart: {Content-Type: application/octet-stream; name*=iso-8859-1''H%E4r%20%E4r%20pdf%20nr%202%20att%20skicka%20vidare
    Content-Disposition: attachment; filename="Untitled Attachment"; modification-date="Thu, 28 Sep 2017 10:35:47 +0200"; size=25289
    Content-Transfer-Encoding: base64

    eJ8+Ii8IAQaQCAAEAAAAAAABAAEAAQeQBgAIAAAA5AQAAAAAAADo...+g8BAAAA
    EAAAABJ4DEdIZ7NAn6T9kixhoFEDAP4PBwAAAAw0AAA=}

Sorry if I'm missing something!

jstedfast commented 7 years ago

Are you saying that the content of this "Untitled Attachment" is a message? I don't really understand what you are asking or trying to do.

Astol commented 7 years ago

Yes, the untitled attachment is a message. The original message I am trying to parse is an email that has another email as an attachment. Because both are in RTF format, it gets a bit messy.

  1. I parse the original message, and see that it has an application/ms-tnef part
  2. I use ExtractAttachments on the TnefPart and continue to parse the attachments
  3. I find the attached email, the "Untitled Attachment" message, which displays as above.

I can't find a suitable way to continue parsing the email attachment. Does that make it clearer?

Astol commented 7 years ago

Something like this, but I can't get mimekit to handle the rtf formatted mail attachment.

If mime_part.ContentType.MediaSubtype = "ms-tnef" And mime_part.ContentType.Name = "winmail.dat" Then
            Dim multi As Tnef.TnefPart = DirectCast(mime_part, MimeKit.Tnef.TnefPart)
            Dim parts As System.Collections.IEnumerable = multi.ExtractAttachments()

            For Each part As MimeEntity In parts
                parse_mail(part, filepath)
            Next

        ElseIf mime_part.ContentType.MediaSubtype = "octet-stream" Then
            Try
                Dim part As MimeKit.MessagePart = DirectCast(mime_part, MimeKit.MessagePart)
                parse_mail(part.Message.Body, filepath)
            Catch ex As Exception
                Debug.Print(ex.Message)
            End Try
        End If

    End If

    If mime_part.ContentType.MediaType = "message" Then
        Dim part As MimeKit.MessagePart = DirectCast(mime_part, MimeKit.MessagePart)
        parse_mail(del.Message.Body, filepath)
    End If

jstedfast commented 7 years ago

So where you are going wrong is that you cannot cast a MimePart to a MessagePart. MessagePart does not inherit from MimePart, it inherits from MimeEntity.

What you need to do is decode the content of the MimePart and then use MimeKit.Tnef.TnefReader to parse the decoded content manually.

As a cheat, you could do this (and bear with me because I don't know VB.NET):

var tnef = new TnefPart ();
tnef.ContentObject = mime_part.ContentObject;

// now you can use tnef.ExtractAttachments()

As for the rest of your code, it is not written very safely.

It would be better to check whether mime_part is a TnefPart instead of checking the ContentType properties. Also, a TNEF part might not be named "winmail.dat".

In C#, you would write it like this:

if (mime_part is TnefPart) {
    var multi = (TnefPart) mime_part;
    ...
}

if (mime_part is MessagePart) {
    var part = (MessagePart) mime_part;
}

Astol commented 7 years ago

Thanks, that works a lot better! However, I'm getting an exception from MimeKit when I run my code. Am I handling ExtractAttachments() wrong? It also reads the attachment twice. Exception: A first chance exception of type 'System.IO.EndOfStreamException' occurred in MimeKit.dll

 If TypeOf mime_part Is Tnef.TnefPart Then
                    Dim tnef As Tnef.TnefPart = mime_part
                    Dim parts As IEnumerable(Of MimeEntity) = tnef.ExtractAttachments()

                    For Each node As MimeEntity In parts
                        parse_mime(node, filepath)
                    Next

                Else
                    ' Because message attachments are sometimes hard to tell apart
                    Try
                        Dim tnef As Tnef.TnefPart = New Tnef.TnefPart()
                        tnef.ContentObject = del.ContentObject

                        Dim parts As IEnumerable(Of MimeEntity) = tnef.ExtractAttachments()
                        For Each node As MimeEntity In parts
                             parse_mime(node, filepath)
                        Next
                    Catch ex As Exception
                    End Try
                End If

Astol commented 7 years ago

I forgot to say that the exception doesn't actually break the program; it just shows up in the Immediate Window output.

Astol commented 7 years ago

testmail.txt: this is the test case I am using.

Astol commented 7 years ago

The object returned by ExtractAttachments() has the type "MimeKit.Tnef.TnefPart+d__7", which feels odd.

jstedfast commented 7 years ago

What is del? Why shouldn't it be mime_part?

 If TypeOf mime_part Is Tnef.TnefPart Then
                    Dim tnef As Tnef.TnefPart = mime_part
                    Dim parts As IEnumerable(Of MimeEntity) = tnef.ExtractAttachments()

                    For Each node As MimeEntity In parts
                        parse_mime(node, filepath)
                    Next

                Else
                    ' Because message attachments are sometimes hard to tell apart
                    Try
                        Dim tnef As Tnef.TnefPart = New Tnef.TnefPart()
                        tnef.ContentObject = mime_part.ContentObject

                        Dim parts As IEnumerable(Of MimeEntity) = tnef.ExtractAttachments()
                        For Each node As MimeEntity In parts
                             parse_mime(node, filepath)
                        Next
                    Catch ex As Exception
                    End Try
                End If
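
As an alternative to wrapping every unknown part in a TnefPart and swallowing the exception, the decoded bytes could be checked for the TNEF signature first. The following is only a sketch: TnefSniffer and LooksLikeTnef are hypothetical names that are not part of MimeKit, and it assumes the decoded content is available as a seekable stream. TNEF data begins with the 32-bit signature 0x223E9F78, stored little-endian.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of MimeKit.
static class TnefSniffer
{
    // The TNEF signature 0x223E9F78 as it appears on disk (little-endian).
    static readonly byte[] Signature = { 0x78, 0x9F, 0x3E, 0x22 };

    // Returns true if the decoded content starts with the TNEF signature.
    // Assumes a seekable stream; the position is restored before returning.
    public static bool LooksLikeTnef (Stream decoded)
    {
        long start = decoded.Position;
        var header = new byte[Signature.Length];
        int total = 0, n;

        // Read may return fewer bytes than requested, so loop until full or EOF.
        while (total < header.Length &&
               (n = decoded.Read (header, total, header.Length - total)) > 0)
            total += n;

        decoded.Position = start;

        if (total < header.Length)
            return false;

        for (int i = 0; i < Signature.Length; i++) {
            if (header[i] != Signature[i])
                return false;
        }

        return true;
    }
}
```

The base64 content quoted earlier in this thread begins with eJ8+Ii8I, which decodes to the bytes 78 9F 3E 22 2F 08, so this check would accept the "Untitled Attachment" part. With MimeKit, one way to get a seekable stream is to decode the part's content into a MemoryStream with ContentObject.DecodeTo() and rewind it before calling LooksLikeTnef().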

Astol commented 7 years ago

Yes, it is mime_part in the actual code. The code was not originally written in English, so I translated it before pasting it here so that it wouldn't look like gibberish, and I missed renaming that one. Sorry!

Astol commented 7 years ago

The exception inside MimeKit always seems to happen on the first iteration of the loop over tnef.ExtractAttachments().

jstedfast commented 7 years ago

I wrote a simple test program to print out the inner-most text/* parts from your sample message:

using System;
using System.Linq;

using MimeKit;
using MimeKit.Tnef;

namespace TnefTest
{
    class Program
    {
        public static void Main (string[] args)
        {
            var message = MimeMessage.Load ("testmail.txt");
            var tnef = message.BodyParts.OfType<TnefPart> ().FirstOrDefault ();

            foreach (var attachment in tnef.ExtractAttachments ()) {
                if (attachment is MimePart) {
                    var tnef2 = new TnefPart ();
                    tnef2.ContentObject = ((MimePart) attachment).ContentObject;
                    foreach (var attachment2 in tnef2.ExtractAttachments ()) {
                        var text = attachment2 as TextPart;

                        if (text != null) {
                            Console.WriteLine ("Content-Type: {0}", text.ContentType.MimeType); 
                            Console.WriteLine (text.Text);
                        }
                    }
                }
            }
        }
    }
}

Here are the results (I did not get any exceptions):

Content-Type: text/plain

Content-Type: text/plain
Testing rtf 

Content-Type: text/rtf
{\rtf1\ansi\ansicpg1252\fromtext \fbidis \deff0{\fonttbl
{\f0\fswiss Arial;}
{\f1\fmodern Courier New;}
{\f2\fnil\fcharset2 Symbol;}
{\f3\fmodern\fcharset0 Courier New;}}
{\colortbl\red0\green0\blue0;\red0\green0\blue255;}
\uc1\pard\plain\deftab360 \f0\fs20 Testing rtf\objattph  \par
}
Content-Type: text/plain
Testing rtf 

Content-Type: text/rtf
{\rtf1\ansi\ansicpg1252\fromtext \fbidis \deff0{\fonttbl
{\f0\fswiss Arial;}
{\f1\fmodern Courier New;}
{\f2\fnil\fcharset2 Symbol;}
{\f3\fmodern\fcharset0 Courier New;}}
{\colortbl\red0\green0\blue0;\red0\green0\blue255;}
\uc1\pard\plain\deftab360 \f0\fs20 Testing rtf\objattph  \par
}

firat-plutoflume commented 1 month ago

Hello @jstedfast,

I was trying to parse tnef body parts and came across the example above (testmail.txt). I see there is an email attached inline to the tnef body part, and I am trying to recurse down and parse it as a message. I believe I can use tnefPart.ConvertToMessage() for this. However, this method doesn't fail for TnefParts that aren't emails. What is the best way to determine if this part is actually an email and only then do this conversion?

Thanks in advance

jstedfast commented 1 month ago

The ConvertToMessage() method is meant to work for all TNEF data and MimeMessage is the closest data structure that MimeKit has that can represent most of the TNEF data that is available.

> I see there is an email attached inline to the tnef body part, and I am trying to recurse down and parse it as a message ... What is the best way to determine if this part is actually an email and only then do this conversion?

I guess that depends on what you would consider to be "actually an email". TNEF attachments are never actually an email.

I would recommend taking a look at the ConvertToMessage() implementation in TnefPart.cs, deciding which TNEF attributes you'd consider to indicate an "email", and then checking for those.

firat-plutoflume commented 1 month ago

Thanks. As I understand it, these attributes will only be available once I do the conversion. Is there a way to check beforehand? (I'm trying to optimize my logic as much as possible.)

jstedfast commented 1 month ago

If you use the TnefReader directly, you could check as it parses as opposed to calling ConvertToMessage().
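
A rough sketch of that idea follows. It assumes that "actually an email" can be approximated by the MAPI message class (for example, a value starting with "IPM.Note"); that criterion is an assumption of this sketch and not something MimeKit defines. TnefProbe and ReadMessageClass are hypothetical names, and the stream passed in is expected to be the decoded content of the part.

```csharp
using System.IO;
using MimeKit.Tnef;

// Hypothetical helper, not part of MimeKit.
static class TnefProbe
{
    // Returns the message-class MAPI property (e.g. "IPM.Note") if the
    // message-level properties carry one, otherwise null.
    public static string ReadMessageClass (Stream decodedTnef)
    {
        using (var reader = new TnefReader (decodedTnef)) {
            while (reader.ReadNextAttribute ()) {
                // Only message-level attributes are inspected; stop at the
                // first attachment-level attribute.
                if (reader.AttributeLevel != TnefAttributeLevel.Message)
                    break;

                if (reader.AttributeTag != TnefAttributeTag.MapiProperties)
                    continue;

                var prop = reader.TnefPropertyReader;

                while (prop.ReadNextProperty ()) {
                    if (prop.PropertyTag.Id == TnefPropertyId.MessageClass)
                        return prop.ReadValueAsString ();
                }
            }
        }

        return null;
    }
}
```

A caller could then treat the part as an email only when the returned value is non-null and starts with "IPM.Note", and call ConvertToMessage() in that case. TNEF data may also carry the class in the legacy message-class attribute (TnefAttributeTag.MessageClass) rather than in the MAPI properties; the sketch does not handle that case.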