koodaamo / tnefparse

a TNEF decoding library written in python, without external dependencies
GNU Lesser General Public License v3.0

Extract MAPI_ATTACH_METHOD == 5 parts? #74

Closed albrechtd closed 3 years ago

albrechtd commented 3 years ago

I received a few (malicious) messages with winmail.dat attachments which tnefparse cannot extract. The JSON dump it produces contains, inter alia:

    "attachments": [
        {
            "MAPI_ATTACH_METHOD": 5,
            "MAPI_CREATION_TIME": "2020-10-10 14:57:44.056806",
            "MAPI_LANGUAGE": "EnUs",
            "MAPI_LAST_MODIFICATION_TIME": "2020-10-10 14:57:44.056806",
            "MAPI_RENDERING_POSITION": -1,
            "data_len": 1,
            "filename": "",
            "long_filename": ""
        }
    ],

However, the part is ~170 kBytes, and running unzip on it extracts at least parts of a (malware) M$ Office document starting at offset ~32 kBytes. I found a note that MAPI attach method 5 indicates a “message”, whatever this means in this context. Is it generally impossible to extract such items (neither tnef nor ytnef-tools produces usable output either), or would it be possible to add support for them to tnefparse?

Thanks in advance, Albrecht.

petri commented 3 years ago

Could you provide a link to the note you found please? Also, please clarify; you say tnefparse cannot extract the part - yet you indicate you have been somehow able to extract it since you know its size and were able to unzip it? How did you extract it?

albrechtd commented 3 years ago

Could you provide a link to the note you found please?

I found this information: http://www.mimekit.net/docs/html/T_MimeKit_Tnef_TnefAttachMethod.htm – unfortunately, there are no references to where these constants are actually specified. However, as the author has a microsoft.com email address, this might be an indication that he has more information than is available to the public…

Also, please clarify; you say tnefparse cannot extract the part - yet you indicate you have been somehow able to extract it since you know its size and were able to unzip it? How did you extract it?

Umm, sorry, my description was somewhat short…

The file I am trying to open is a winmail.dat attachment of an RFC 5322 message. Saving it and running

tnef --number-backups --save-body=tnef-body --body-pref=ALL --ignore-checksum \
    --ignore-encode --ignore-cruft -C $(pwd)/out_tnef winmail.dat

produces a single output file, tnef-tmp, which the file command classifies as “data” (application/octet-stream).

Apparently, because no attachment file names are present (see the output in my original post), running tnefparse to extract the attachments exits with an error:

albrecht@deneb:~/Work$ /home/albrecht/.local/bin/tnefparse -a -p $(pwd)/out_tnefparse winmail.dat
Traceback (most recent call last):
  File "/home/albrecht/.local/bin/tnefparse", line 10, in <module>
    sys.exit(tnefparse())
  File "/home/albrecht/.local/lib/python2.7/site-packages/tnefparse/cmdline.py", line 95, in tnefparse
    with open(pth + a.long_filename(), "wb") as afp:
IOError: [Errno 21] Is a directory: u'/home/albrecht/Work/out_tnefparse/'

Running unzip on winmail.dat apparently finds the M$ OOXML malware:

albrecht@deneb:~/Work$ unzip -l winmail.dat
Archive:  winmail.dat
warning [winmail.dat]:  33833 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  Length      Date    Time    Name
---------  ---------- -----   ----
     2010  1980-01-01 00:00   [Content_Types].xml
[…]

Running unzip on tnef-tmp produced by calling tnef also finds the document, just with a different offset.
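
The offset that unzip reports can be cross-checked by hand. The following is only a minimal sketch: it scans a byte string for the standard ZIP local-file-header signature and says nothing about TNEF structure. The helper name `find_zip_offsets` is made up for this illustration.

```python
from typing import List

# Every ZIP local file entry starts with these four bytes.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def find_zip_offsets(blob: bytes) -> List[int]:
    """Return every offset in blob where a ZIP local file header starts.

    Hypothetical helper for illustration. A hit only means the signature
    bytes occur there; compressed payloads can contain them by chance.
    """
    offsets = []
    pos = blob.find(ZIP_LOCAL_HEADER)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(ZIP_LOCAL_HEADER, pos + 1)
    return offsets
```

Reading winmail.dat in binary mode and passing its contents to this function should put the first hit near the number of "extra bytes at beginning" that unzip prints, assuming the first entry of the archive is intact.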

jrideout commented 3 years ago

@albrechtd can you post the winmail.dat file somewhere so we can investigate?

RossPatterson commented 3 years ago

I have a similar case, but instead of malicious code, what was sent was an Outlook distribution list (an IPM.DistList). tnefparse -d filename shows the attachment:

{
    "attachments": [
        {
            "MAPI_ATTACHMENT_CONTACT_PHOTO": false,
            "MAPI_ATTACHMENT_FLAGS": 0,
            "MAPI_ATTACHMENT_HIDDEN": false,
            "MAPI_ATTACHMENT_LINK_ID": 0,
            "MAPI_ATTACH_ENCODING": "",
            "MAPI_ATTACH_FLAGS": 0,
            "MAPI_ATTACH_METHOD": 5,
            "MAPI_ATTACH_NUM": 1005509,
            "MAPI_ATTACH_SIZE": 24611,
            "MAPI_DISPLAY_NAME": "FOSLnews",
            "MAPI_EXCEPTION_END_TIME": "4501-01-01 00:00:00",
            "MAPI_EXCEPTION_START_TIME": "4501-01-01 00:00:00",
            "MAPI_MAPPING_SIGNATURE": ";\ufffd2a\ufffd\ufffd\fB\ufffdf\ufffd\u0000\ufffdP\ufffd>",
            "MAPI_OBJECT_TYPE": 7,
            "MAPI_RENDERING_POSITION": -1,
            "MAPI_STORE_RECORD_KEY": ";\ufffd2a\ufffd\ufffd\fB\ufffdf\ufffd\u0000\ufffdP\ufffd>",
            "MAPI_STORE_SUPPORT_MASK": 245710845,
            "MAPI_STORE_UNICODE_MASK": 245710845,
            "data_len": 1,
            "filename": "Untitled Attachment",
            "long_filename": "Untitled Attachment"
        }
    ],
    "attributes": {
        "Message Class": "IPM.Microsoft Mail.Note",
        "Response Requested": false
    },
    "extended_attributes": {
        "0x85d8": "IPM.DistList",
        "MAPI_AGING_DONT_AGE_ME": false,
        "MAPI_ALTERNATE_RECIPIENT_ALLOWED": true,
        "MAPI_CONVERSATION_INDEX": "\u0001\ufffd.\ufffd+\ufffd\ufffd-\ufffd \ufffd\u0013L\ufffd\ufffd\rxz\ufffdhl{\ufffdh\ufffd\ufffd0",
        "MAPI_CONVERSATION_TOPIC": ### sensitive data elided for posting
        "MAPI_DELETE_AFTER_SUBMIT": false,
        "MAPI_DEPARTMENT": false,
        "MAPI_EMAIL1ORIGINAL_ENTRY_ID": 0,
        "MAPI_HAS_PICTURE": 0,
        "MAPI_INTERNET_CODEPAGE": 20127,
        "MAPI_INTERNET_MAIL_OVERRIDE_FORMAT": 1441792,
        "MAPI_MAPPING_SIGNATURE": ";\ufffd2a\ufffd\ufffd\fB\ufffdf\ufffd\u0000\ufffdP\ufffd>",
        "MAPI_MDB_PROVIDER": "NITA\ufffd\ufffd\ufffd\u0001\u0000\ufffd\u00007\ufffdn",
        "MAPI_MESSAGE_EDITOR_FORMAT": 2,
        "MAPI_MESSAGE_LOCALE_ID": 1033,
        "MAPI_NEXT_SEND_ACCT": ### sensitive data elided for posting
        "MAPI_OBJECT_TYPE": 5,
        "MAPI_ORIGINAL_AUTHOR_NAME": "",
        "MAPI_ORIGINAL_SENSITIVITY": 0,
        "MAPI_PRIMARY_SEND_ACCOUNT": ### sensitive data elided for posting
        "MAPI_PRIORITY": 0,
        "MAPI_PRIVATE": false,
        "MAPI_READ_RECEIPT_REQUESTED": false,
        "MAPI_RECIPIENT_REASSIGNMENT_PROHIBITED": false,
        "MAPI_REPLY_REQUESTED": false,
        "MAPI_RTF_IN_SYNC": true,
        "MAPI_RTF_SYNC_BODY_COUNT": 113,
        "MAPI_RTF_SYNC_BODY_CRC": -518663970,
        "MAPI_RTF_SYNC_BODY_TAG": ### sensitive data elided for posting
        "MAPI_RTF_SYNC_PREFIX_COUNT": 0,
        "MAPI_RTF_SYNC_TRAILING_COUNT": 0,
        "MAPI_SENTMAIL_ENTRYID": "\u0000\u0000\u0000\u0000;\ufffd2a\ufffd\ufffd\fB\ufffdf\ufffd\u0000\ufffdP\ufffd>\ufffd\ufffd",
        "MAPI_STORE_RECORD_KEY": ";\ufffd2a\ufffd\ufffd\fB\ufffdf\ufffd\u0000\ufffdP\ufffd>",
        "MAPI_STORE_SUPPORT_MASK": 245710845,
        "MAPI_STORE_UNICODE_MASK": 245710845,
        "MAPI_SUBMIT_FLAGS": 1,
        "MAPI_TASK_MODE": 0,
        "MAPI_TNEF_CORRELATION_KEY": "000000003BCC3261A79C0C42A866E2008E50C03E049E6A00",
        "MAPI_USE_TNEF": true
    }
}

but tnefparse -a filename gets an odd error:

Traceback (most recent call last):
  File "<pyshell#36>", line 1, in <module>
    tnefparse.cmdline.tnefparse(['-a', 'ross_file'])
  File "C:\Ross\Source\tnefparse\tnefparse\cmdline.py", line 101, in tnefparse
    afp.write(a.data)
TypeError: a bytes-like object is required, not 'list'

a.data has somehow become a single-element list. The actual attachment appears to be in a.data[0].
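
As a stopgap on the caller's side (not the library fix), the payload could be flattened before writing. This is a rough sketch under the assumption that the list elements are themselves bytes-like, which I have not verified; `payload_bytes` is a made-up helper, not part of tnefparse.

```python
def payload_bytes(data):
    """Return attachment data as bytes, tolerating a list wrapper.

    Hypothetical helper: assumes any list or tuple holds bytes-like parts
    (possibly nested) and raises TypeError for anything else.
    """
    if isinstance(data, (bytes, bytearray, memoryview)):
        return bytes(data)
    if isinstance(data, (list, tuple)):
        # Recurse so that unexpected element types fail loudly.
        return b"".join(payload_bytes(part) for part in data)
    raise TypeError("unsupported attachment data type: %r" % type(data))
```

With that, `afp.write(payload_bytes(a.data))` would no longer raise the TypeError above, provided the assumption about the element type holds.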

The original file is safe to share (it's not malicious), so I can post it somewhere (dunno where to put a binary file, though).

jrideout commented 3 years ago

@RossPatterson thanks, can you create a PR adding the file to https://github.com/koodaamo/tnefparse/tree/master/tests/examples ?

RossPatterson commented 3 years ago

Done. Please merge.

Ross

jrideout commented 3 years ago

We'll track the fix in https://github.com/koodaamo/tnefparse/pull/101

jrideout commented 3 years ago

@albrechtd Can you please test your file with the code from this branch #101 ? The json dump should give you the details of the embedded message.

RossPatterson commented 3 years ago

The #101 branch code successfully dumps and extracts the un-redacted distribution list from my case. Attempting to dump the extracted IPM.DistList attachment (via tnefparse -d "Untitled Attachment") gets "Wrong TNEF signature: 0x00020307", but that might be the correct behavior - I have no idea if that file is TNEF, or some other format.

Ross

jrideout commented 3 years ago

The first 16 bytes are extra metadata defining the "interface" of the file. If they match the magic number for a "message interface", the remaining data in the file can be presumed to be valid TNEF.

What do you suggest the behavior be for extracting the attachment? Should we just strip the first 16 bytes? This is what I'm doing for the json dump case. The tradeoff is that you are losing information, but I suspect it is better to just get the most parsable file possible.
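
To make the trade-off concrete, here is a minimal sketch of the strip-and-check idea. It is not the code in #101. Two values are assumptions on my part: that the 16-byte prefix is a little-endian GUID equal to IID_IMessage (`00020307-0000-0000-C000-000000000046`), and the helper name `split_embedded`. The TNEF signature `0x223E9F78` is the standard one.

```python
import struct
import uuid

TNEF_SIGNATURE = 0x223E9F78  # standard TNEF magic, stored little-endian
PREFIX_LEN = 16              # size of the interface identifier prefix

# Assumption: the "message interface" magic is the IID_IMessage GUID.
IID_IMESSAGE = uuid.UUID("00020307-0000-0000-C000-000000000046")


def split_embedded(data: bytes):
    """Split attach-method-5 data into (guid, payload, looks_like_tnef).

    Hypothetical helper: returns the prefix so callers can keep it
    instead of silently discarding it.
    """
    if len(data) < PREFIX_LEN + 4:
        raise ValueError("data too short for prefix plus TNEF signature")
    guid = uuid.UUID(bytes_le=data[:PREFIX_LEN])
    payload = data[PREFIX_LEN:]
    (signature,) = struct.unpack_from("<I", payload)
    return guid, payload, signature == TNEF_SIGNATURE
```

Under that GUID assumption, the first four prefix bytes read as a little-endian integer are 0x00020307, which matches the "Wrong TNEF signature" value reported above. Returning the GUID alongside the payload keeps the information available even if the extracted file itself is written without the prefix.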

albrechtd commented 3 years ago

Hi, first, I think I should apologise for not posting my sample – an inspection of the included malware doc revealed that it was actually an Emotet file. This campaign is known for re-using stolen messages for “fake” replies. Therefore, I asked the recipient for permission to share the attachment, and she declined as she feared that private information would become publicly available on the internet. Of course, I have to respect that!

@albrechtd Can you please test your file with the code from this branch #101 ? The json dump should give you the details of the embedded message.

Actually, your solution almost does the job. The only issue is that the long_filename() element of the attachment (identified as embedded_message when running tnefparse -d) is empty, leading to a crash. Using the following patch (slightly paranoid, just in case more than one attachment without a file name exists)

--- tnefparse-master.orig/tnefparse/cmdline.py  2021-01-22 16:11:22.000000000 +0100
+++ tnefparse-master/tnefparse/cmdline.py       2021-01-22 18:19:42.181741987 +0100
@@ -96,8 +96,14 @@

         elif args.attachments:
             pth = args.path.rstrip(os.sep) + os.sep if args.path else ''
+            idx = 1
             for a in t.attachments:
-                with open(pth + a.long_filename(), "wb") as afp:
+                if a.long_filename():
+                    outname = pth + a.long_filename()
+                else:
+                    outname = pth + '__unknown_' + str(idx) + '__'
+                    idx += 1
+                with open(outname, "wb") as afp:
                     afp.write(a.data)
             sys.stderr.write(f"Successfully wrote {len(t.attachments)} files\n")
             sys.exit()

a file __unknown_1__ is dumped which again is a tnef, apparently a complete message. Running tnefparse again on this dump properly extracts the Emotet doc file. Good work! Thanks a lot, Albrecht.
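
For reference, the fallback naming from the patch above can be isolated into a small function. This is only a sketch of the same idea, not a proposed change to cmdline.py; `output_names` is a made-up name, and it uses `os.path.join` instead of the string concatenation in the patch.

```python
import os


def output_names(long_filenames, directory=""):
    """Map attachment names to output paths, numbering the unnamed ones.

    Hypothetical helper mirroring the patch: empty names become
    __unknown_1__, __unknown_2__, ... in order of appearance.
    """
    names = []
    unknown = 0
    for name in long_filenames:
        if not name:
            unknown += 1
            name = "__unknown_%d__" % unknown
        names.append(os.path.join(directory, name))
    return names
```

For example, `output_names(["a.doc", "", ""])` yields `a.doc`, `__unknown_1__` and `__unknown_2__`, so no two nameless attachments overwrite each other.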