mgeeky / decode-spam-headers

A script that helps you understand why your E-Mail ended up in Spam
MIT License
559 stars 79 forks

"Received - Mail Servers Flow" failed: Invalid version: 'F07EBDD638' #12

Open jdghub opened 1 year ago

jdghub commented 1 year ago

After working through a long series of headers in an email manually today, I came across this project and ran it on them (attached) to see what it would find. However, I ran into an error and a couple of issues:

Sample-Headers.txt

  1. It reported an error for an internal hostname along the route:
    "host": "89b23fcea35b",
    "host2": "ddclpnotapi03",
    "ip": "10.53.192.180",
    "timestamp": "2023-02-02 23:08:32+00:00",
    "ver": "F07EBDD638",
    "with": "ESMTP",
    "extra": [
        "Postfix",
        "Hostname exposed: 89b23fcea35b"
    ],
    "num": 1,
    "parsed": {
        "from": "89b23fcea35b (ddclpnotapi03 [10.53.192.180])",
        "by": "notification.payments.interac.ca (Postfix)",
        "with": "ESMTP",
        "id": "F07EBDD638",
        "for": "<Recipient@RecipientDomain.com>"
    },
    "_raw": "from 89b23fcea35b (ddclpnotapi03 [10.53.192.180]) by notification.payments.interac.ca (Postfix) with ESMTP id F07EBDD638 for <Recipient@RecipientDomain.com>; Thu,  2 Feb 2023 18:08:32 -0500 (EST)",
    "by": "notification.payments.interac.ca",
    "id": "F07EBDD638"
    } -->
    [ERROR] Test 1: "Received - Mail Servers Flow" failed: Invalid version: 'F07EBDD638' . Use --debug to show entire stack trace.

I don't know if anything was excluded from the output report due to this.

  2. Not fatal, but it identified items in the headers as domains that aren't:

    - Found Domain:   15.20.6064.24
    - Found Domain:   6.0.562
    - Found Domain:   2.0.219
    - Found Domain:   _Part_16536292_807372936.1675379312980
    - Found Domain:   36.9663
    - Found Domain:   8.12
    - Found Domain:   15.1.2507.17
    - Found Domain:   17.11.122
    - Found Domain:   15.01.2507.017
    - Found Domain:   00.3940871
    - Found Domain:   15.20.6064.27
    - Found Domain:   15.20.6064.25
    - Found Domain:   18.0.930
  3. Not a big deal, but it inserted an unbalanced </font> tag into other found domains:

    - Found Domain:   MN2PR15CA0012.outlook.office365</font>.com
    - Found Domain:   microsoft</font>.com
    - Found Domain:   acxsys.onmicrosoft</font>.com
    - Found Domain:   YT2PR01CA0021.outlook.office365</font>.com
    - Found Domain:   mx.microsoft</font>.com

But even with these issues the analysis of the spam headers completed and was useful. Thanks.
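
For issues 2 and 3 above, a minimal sketch of a post-filter for domain candidates might look like the following. It is not taken from decode-spam-headers; the function name `clean_domain_candidate` and both regular expressions are assumptions made for illustration. The idea is to strip stray markup first, then reject anything whose labels are not valid hostname labels or whose last label contains no letter (which covers version strings and boundary IDs like the ones listed).

```python
import re
from typing import Optional

# Hypothetical helper, not part of decode-spam-headers.
# Matches any markup tag, e.g. a stray </font>.
_TAG_RE = re.compile(r"<[^>]+>")
# One hostname label: ASCII letters, digits, hyphens; no leading/trailing hyphen.
_LABEL_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def clean_domain_candidate(candidate: str) -> Optional[str]:
    """Return a normalized domain, or None if the candidate is implausible."""
    text = _TAG_RE.sub("", candidate).strip().rstrip(".")
    labels = text.split(".")
    if len(labels) < 2:
        return None
    if not all(_LABEL_RE.match(label) for label in labels):
        return None  # underscores, empty labels, other invalid characters
    # Assumption: a top-level label without any letter is not a real TLD.
    # This rejects version strings such as 15.20.6064.24 and bare IPv4 addresses.
    if not re.search(r"[A-Za-z]", labels[-1]):
        return None
    return text.lower()
```

With this filter, `15.20.6064.24` and `_Part_16536292_807372936.1675379312980` are dropped, while `microsoft</font>.com` is reported as `microsoft.com`.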

mattrobns commented 1 year ago

Having this same problem.

gnanet commented 11 months ago

These are Postfix queue IDs, which are misinterpreted as the version ID of a Microsoft product.

I am not experienced in Python, which makes it harder for me to find the place where I could add an exception for the Postfix line.

This appears on a single line of a Postfix Received header:

(Postfix) with ESMTPSA id 0CDFFC0EC0

The Microsoft servers have these lines, and the difference is obvious:

with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6977.21 via Frontend Transport
with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6977.19 via Frontend Transport
with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6954.28
with HTTPS

gnanet commented 11 months ago

I was able to hard-code a condition that matches Postfix in the Received "by" part and removes the queue ID from the version checks. It goes just above the part that copies 'id' to obj['ver']:

        if 'by' in obj['parsed'].keys():
            self.logger.dbg('Parsed Received-By: ' + str(obj['parsed']['by']))
            if "Postfix" in str(obj['parsed']['by']):
                self.logger.info('Found Postfix in received by...')
                del parsed['id']

This is by no means a "solution", because it would really need a parser that can identify at least Postfix, sendmail, Exim and qmail. My short research showed that this is not an easy task: sendmail often places its version only in the parentheses in 'by', while Exim uses the 'with' part.
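
A rough sketch of the kind of heuristic described above, assuming the `parsed` dictionary has the `by` and `with` keys shown in the first comment. The function names `guess_mta` and `version_candidate` are hypothetical, the patterns are guesses based on the header fragments quoted in this thread, and qmail is not covered at all. It also assumes that the "Microsoft SMTP Server" text ends up in the `with` field, which has not been checked against the script.

```python
import re
from typing import Optional

# Heuristic only: sendmail commonly reports its version in parentheses
# after the "by" host, e.g. "(8.14.4/8.14.4)".
_SENDMAIL_VER = re.compile(r"\(\d+\.\d+\.\d+(?:/\d+\.\d+\.\d+)?\)")


def guess_mta(parsed: dict) -> Optional[str]:
    """Guess which MTA wrote a Received line from its 'by' and 'with' parts."""
    by = str(parsed.get("by", ""))
    with_ = str(parsed.get("with", ""))
    if "Postfix" in by:
        return "postfix"
    if "Microsoft SMTP Server" in with_:
        return "exchange"
    if "Exim" in with_:
        return "exim"
    if _SENDMAIL_VER.search(by):
        return "sendmail"
    return None  # unknown; do not guess further


def version_candidate(parsed: dict) -> Optional[str]:
    """Return 'id' as a version candidate only for Exchange-looking hops."""
    if guess_mta(parsed) == "exchange":
        return parsed.get("id")
    return None
```

For the Postfix hop from the original report, `version_candidate` returns `None`, so the queue ID `F07EBDD638` would never reach the version lookup.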

@mgeeky please note, this issue is similar to #1

mjf commented 6 months ago

I temporarily fixed this by commenting out line 2104:

@@ -2104,7 +2104,7 @@ class SMTPHeadersAnalysis:
             if ver.version == lookup:
                 return ver

-        lookupparsed = packaging.version.parse(lookup)
+#        lookupparsed = packaging.version.parse(lookup)

         # Go with version-wise comparison to fuzzily find proper version name
         sortedversions = sorted(SMTPHeadersAnalysis.Exchange_Versions)

Is this repo alive? There are some PRs pending, etc. :disappointed:
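
As an alternative to commenting the line out, the lookup could tolerate values that are not versions at all. The sketch below uses only the standard library rather than `packaging`, and its names (`parse_dotted`, `closest_known`) are hypothetical. The fuzzy rule it applies, "largest known version not exceeding the lookup", is an assumption and not necessarily what the script does. The sample version strings are simply IDs quoted earlier in this thread.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def parse_dotted(value: str) -> Optional[Tuple[int, ...]]:
    """Parse '15.20.6064.24' into (15, 20, 6064, 24); return None otherwise."""
    parts = value.strip().split(".")
    if len(parts) < 2 or not all(p.isascii() and p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def closest_known(lookup: str, known: List[str]) -> Optional[str]:
    """Return the largest known version that is <= lookup, or None."""
    target = parse_dotted(lookup)
    if target is None:
        return None  # e.g. a Postfix queue ID such as F07EBDD638
    table = sorted(
        (parsed, raw)
        for raw in known
        for parsed in [parse_dotted(raw)]
        if parsed is not None
    )
    keys = [parsed for parsed, _ in table]
    idx = bisect_right(keys, target)
    return table[idx - 1][1] if idx > 0 else None


# Sample data only: IDs that appear in the headers quoted in this thread.
KNOWN = ["15.20.6064.24", "15.1.2507.17", "15.20.6977.21"]
```

Here `closest_known("F07EBDD638", KNOWN)` returns `None` instead of raising, so the remaining tests can continue.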

alexminza commented 3 months ago

Proposed patch to skip IDs that do not resemble an Exchange version string:

[Two screenshots of the proposed patch: Screenshot 2024-07-29 at 15 51 17, Screenshot 2024-07-29 at 15 50 30]
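
The patch itself is only available as screenshots, so what follows is an independent sketch of the same idea, not a transcription of it. It assumes an Exchange version string has exactly four dot-separated numeric components, as in the `15.20.6977.21` examples earlier in the thread. The names `resembles_exchange_version` and `exchange_version_ids` are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: four dot-separated groups of ASCII digits, e.g. 15.20.6977.21.
_EXCHANGE_VERSION_RE = re.compile(r"\d+(?:\.\d+){3}", re.ASCII)


def resembles_exchange_version(value: str) -> bool:
    """True if value looks like an Exchange build number, not a queue ID."""
    return bool(_EXCHANGE_VERSION_RE.fullmatch(value.strip()))


def exchange_version_ids(hops: Iterable[dict]) -> List[str]:
    """Collect only those 'id' values that resemble Exchange versions."""
    return [
        hop["id"]
        for hop in hops
        if "id" in hop and resembles_exchange_version(str(hop["id"]))
    ]
```

Postfix queue IDs such as `F07EBDD638` or `0CDFFC0EC0` fail the check and are skipped before any version parsing is attempted.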