sisimai / p5-sisimai

Mail Analyzing Interface for email bounce: A Perl module to parse RFC5322 bounce mails and generating structured data as JSON from parsed results. Formerly known as bounceHammer 4: an error mail analyzer.
https://libsisimai.org
BSD 2-Clause "Simplified" License

ReturnPathFBL parsing missing rhosts #415

Closed genericcx closed 3 years ago

genericcx commented 3 years ago

It seems like the ARF/feedback-loop parser does not detect all variants of the Return Path (https://fbl.returnpath.net/) reports.

Example (redacted)

This is a Rackspace Abuse Report for an email message received from domain =
example.com, IP 10.0.0.1, on Wed, 14 Oct 2020 14:00:29 +0000.

--061ac93f6aad9631ee4cf05d779c8f0f9b12bc5ce29b0755e76670ef7737
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: message/feedback-report

Version: 1
Arrival-Date: Wed, 14 Oct 2020 14:00:29 +0000
Feedback-Type: abuse
Original-Rcpt-To: e25a7fe465a61a78xxxxxx@example.net
Original-Rcpt-To: e25a7fe465a61a78xxxxxx@example.net
Original-Mail-From: casper@example.com
Reported-Domain: example.com
Source-Ip: 10.0.0.1
Source: Rackspace
Abuse-Type: complaint
Subscription-Link: https://fbl.returnpath.net/manage/subscriptions/xxxxxx
User-Agent: ReturnPathFBL/2.0

$ perl -MSisimai -e 'print Sisimai->dump("1.eml");' | jq .
[
  {
    "replycode": "",
    "recipient": "e25a7fe465a61a78xxxxxx@example.net",
    "subject": "REDACTED",
    "origin": "1.eml",
    "rhost": "",
    "addresser": "casper@example.com",
    "messageid": "3744f1f6REDTACTED2@example.com",
    "feedbacktype": "",
    "diagnostictype": "",
    "deliverystatus": "",
    "timezoneoffset": "+0000",
    "listid": "",
    "action": "",
    "smtpcommand": "",
    "senderdomain": "example.com",
    "softbounce": -1,
    "lhost": "",
    "smtpagent": "Feedback-Loop",
    "catch": null,
    "token": "REDTACTED",
    "destination": "example.net",
    "alias": "",
    "diagnosticcode": "",
    "timestamp": 1602697652,
    "reason": "feedback"
  }
]

I attempted to add some further options in ARF.pm (e.g. accepting the camel-cased Source-Ip field, which seems wrong on their side, and extracting the IP from the domain text). However, after a clean make the change didn't take effect, so I likely did something wrong.

azumakuniyuki commented 3 years ago

@cucx Thanks for the report. We'll inspect and try to fix this issue within a few days :-)

azumakuniyuki commented 3 years ago

@cucx Sorry for the late response. I've read the issue and the sample email text, and I think there is a better solution for this issue: the value of the Source-Ip field can be obtained with the callback feature described at https://libsisimai.org/en/usage/#callback

By the way, the value of rhost is only used to call a module in the Sisimai::Rhost class.
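A minimal sketch of what such a callback could look like, written so it runs without Sisimai installed. The argument layout (a hash reference whose `message` key holds the message body) and the `'hook'` option in the trailing comment are assumptions taken from the callback documentation linked above, not verified against v4.25.9; `$source_ip_hook` and the `source-ip` key are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical callback: pull the Source-IP / Source-Ip field out of the
# raw message text and return it in a hash reference.
my $source_ip_hook = sub {
    my $emdata = shift;
    my $caught = { 'source-ip' => '' };
    my $body   = $emdata->{'message'} // '';

    if( $body =~ /^Source-I[Pp]:[ ]*(\S+)/m ) {
        $caught->{'source-ip'} = $1;
    }
    return $caught;
};

# Stand-alone check with a hand-made argument (no Sisimai needed here).
my $sample = {
    'headers' => {},
    'message' => "Feedback-Type: abuse\nSource-Ip: 10.0.0.1\nSource: Rackspace\n",
};
my $result = $source_ip_hook->($sample);
print $result->{'source-ip'}, "\n";

# With Sisimai 4.x installed, the code reference would be passed roughly as
# (assumption, see the documentation):
#   my $data = Sisimai->make('1.eml', 'hook' => $source_ip_hook);
```

If the assumption about the interface holds, the returned hash would appear under the `catch` key of each parsed result, which is `null` in the dump above because no callback was given.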

Best regards,

genericcx commented 3 years ago

@azumakuniyuki Thanks! Would this mean I need to write a separate parser, though? I would think that I could simply change https://github.com/sisimai/p5-sisimai/blob/f12f0e8ef1dc7159d7cbc1648695825986586f93/lib/Sisimai/ARF.pm#L234 to allow camel casing and then rebuild (so that it picks up both the correctly cased FBLs and these). However, if I edit that file and run make-clean, make-local, the changes do not seem to take effect. Should I be doing something else?

azumakuniyuki commented 3 years ago

@cucx I'm so sorry. Code for getting the value of the Source-IP: field is already implemented in Sisimai::ARF. The following diff will perhaps resolve the issue :-)

diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..cf8a2600 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -231,7 +231,7 @@ sub make {
                 # Reporting-MTA: dns; mx.example.jp
                 $commondata->{'rhost'} = $1;

-            } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
                 # The header is optional and MUST NOT appear more than once.
                 # Source-IP: 192.0.2.45
                 $arfheaders->{'rhost'} = $1;
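The effect of the one-character change can be checked in isolation. The sketch below copies the two patterns from the diff and applies them to both spellings of the field; the helper name `source_ip` is hypothetical.

```perl
use strict;
use warnings;

my $before = qr/\ASource-IP:[ ]*(.+)\z/;      # pattern before the diff
my $after  = qr/\ASource-I[Pp]:[ ]*(.+)\z/;   # pattern after the diff

# Return the captured value, or undef when the line does not match.
sub source_ip {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $1 : undef;
}

for my $line ( 'Source-IP: 192.0.2.45', 'Source-Ip: 10.0.0.1' ) {
    my $old = source_ip($before, $line) // '(no match)';
    my $new = source_ip($after,  $line) // '(no match)';
    printf("%-24s before=%-12s after=%s\n", $line, $old, $new);
}
```

Header field names are case-insensitive under RFC 5322, so a `/i` match on the field name would be a broader alternative; the diff restricts itself to the two spellings seen so far.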

genericcx commented 3 years ago

@azumakuniyuki Thanks! Although when I do that, it still doesn't seem to pick it up.

edit the file:

$ cat p5-sisimai/lib/Sisimai/ARF.pm | grep "\[Pp\]"
            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {

build:

Configuring Sisimai-v4.25.9 ... OK
Building and testing Sisimai-v4.25.9 ... OK
Successfully installed Sisimai-v4.25.9
1 distribution installed

dump

$ perl -MSisimai -e 'print Sisimai->dump("/home/example/Maildir/new/1604169178.H599200P12611.example.com");' | jq 
[
  {
    "timezoneoffset": "+0000",
    "subject": "Message from website",
    "reason": "feedback",
    "diagnostictype": "",
    "senderdomain": "example.com",
    "softbounce": -1,
    "token": "3dcc4a8580d837297c93d3ce3b0045d575ecd81c",
    "catch": null,
    "listid": "",
    "alias": "",
    "deliverystatus": "",
    "smtpcommand": "",
    "destination": "example.com",
    "rhost": "",
    "lhost": "",
    "recipient": "xxx@example.com",
    "messageid": "xxx@example.com",
    "diagnosticcode": "",
    "feedbacktype": "",
    "origin": "/home/example/Maildir/new/1604169178.H599200P12611.example.com",
    "action": "",
    "replycode": "",
    "addresser": "no-reply@example.com",
    "timestamp": 1604172773,
    "smtpagent": "Feedback-Loop"
  }
]

example

$ cat /home/example/Maildir/new/1604169178.H599200P12611.example.com | grep -A2 -B2 "Source-Ip:"
Content-Type: message/feedback-report

Source-Ip: 1.2.3.4
User-Agent: ReturnPathFBL/2.0
Original-Rcpt-To: xxxx@example.com

Am I missing a step here? I can also send you a copy of one of their FBLs if needed, just let me know.

Thanks!

azumakuniyuki commented 3 years ago

@cucx Would you post the entire ARF email (including all headers) as a sample to this issue? We'll try to parse the email with the fixed code.

Best regards,

genericcx commented 3 years ago

Added! I had to redact a lot, but it should be fine, redactedfbl.txt

azumakuniyuki commented 3 years ago

@cucx Thanks for the quick response :-) We will try to fix/implement code to resolve this issue.

azumakuniyuki commented 3 years ago

@cucx The following diff will resolve the issue, perhaps.

diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..55adb218 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -57,7 +57,7 @@ sub make {
     state $startingof = { 'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'] };
     state $markingsof = {
         'message' => qr{\A(?>
-             [Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report
+             [Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport
             |[Tt]his[ ]is[ ]an[ ]email[ ]abuse[ ]report
             |[Tt]his[ ]is[ ](?:
                  a[ ][^ ]+[ ]authentication[ -]failure[ ]report
@@ -231,7 +231,7 @@ sub make {
                 # Reporting-MTA: dns; mx.example.jp
                 $commondata->{'rhost'} = $1;

-            } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
                 # The header is optional and MUST NOT appear more than once.
                 # Source-IP: 192.0.2.45
                 $arfheaders->{'rhost'} = $1;

The patch above returns the following result.

[
  {
    "alias": "",
    "reason": "feedback",
    "feedbacktype": "abuse",
    "destination": "example.com",
    "catch": {
      "sender": "",
      "parsedat": "2020-11-05 20:38:39",
      "queue-id": "",
      "x-mailer": "",
      "mailsize": 2471
    },
    "softbounce": -1,
    "messageid": "",
    "action": "",
    "addresser": "alice@example.com",
    "smtpcommand": "",
    "rhost": "10.0.0.1",
    "lhost": "",
    "smtpagent": "Feedback-Loop",
    "deliverystatus": "",
    "timestamp": 1604199777,
    "diagnostictype": "",
    "replycode": "",
    "listid": "",
    "diagnosticcode": "",
    "recipient": "hashed@example.com",
    "token": "6050a32a445e642594a0931751dc0822d5583597",
    "origin": "issue-415.eml",
    "subject": "",
    "timezoneoffset": "+0000",
    "senderdomain": "example.com"
  }
]
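The first hunk of the patch matters as much as the Source-Ip change. Judging from the `if( $e =~ $markingsof->{'message'} )` block that a later diff in this thread removes, the unpatched parser appears to start reading the report fields only after the introductory sentence has matched, which would explain why relaxing the Source-IP pattern alone had no visible effect. The sketch below compares only the first alternative of the old and new pattern, written on one line instead of the multi-line form used in ARF.pm; the helper name `matches` is hypothetical.

```perl
use strict;
use warnings;

# First alternative of $markingsof->{'message'}, before and after the patch.
my $old = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report/;
my $new = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport/;

# 1 when the line matches the pattern, 0 otherwise.
sub matches {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? 1 : 0;
}

my @lines = (
    'This is a Rackspace Abuse Report for an email message received from domain =',
    'This is a Example email abuse report for an email message received from IP 192.0.2.222',
);
for my $e ( @lines ) {
    printf("old=%d new=%d  %s\n", matches($old, $e), matches($new, $e), substr($e, 0, 45));
}
```

The Rackspace sentence has no "email" before "Abuse Report" and capitalizes both words, so only the new pattern accepts it, while the existing sample wording is accepted by both.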

genericcx commented 3 years ago

Yes perfect thanks! Works for all their reports

Thanks again!

azumakuniyuki commented 3 years ago

@cucx Thanks :-) By the way, would you permit me to add redactedfbl.txt to the repository as arf-25.eml? We want to use the file for make test on the branch (which will be merged into master).

Best regards,

genericcx commented 3 years ago

Sure, please do redact anything extra that you think needs redacting. Small note: it's not only "Rackspace". Return Path has many third-party providers using their "Universal Feedback Loop", so the Source: field will not always show Rackspace. https://help.returnpath.com/hc/en-us/articles/220221448-List-of-all-available-complaint-feedback-loops-FBLs-

e.g.

Source: BAE Systems
Source: Comcast
Source: Fastmail
Source: Italia Online (Libero and Virgilio)
Source: La Poste
Source: Liberty Global (Chello, UPC, Unity Media)
Source: Locaweb
Source: Mail.Ru
Source: OpenSRS

azumakuniyuki commented 3 years ago

@cucx Thank you for your consent :-)

And thanks for the information about the Source: values of other email service providers. Tomorrow we'll look for other patterns like the "This is a Rackspace Abuse Report for an email message received from domain" line.

 % grep -h 'This is a' set-of-emails/maildir/bsd/arf-* | grep -v MIME
This is an email abuse report for an email message with the message-id of X-000000000000000000000000000000000@YZ received from IP address 192.0.2.89 on Thu, 29 Apr 2009 00:00:00 -0000 (GMT)
This is an email abuse report for an email message received from mx8.example.com  on Thu, 29 Apr 2013 23:45:00 PST
This is an email abuse report for an email message received from IP 192.0.2.2 on Thu, 9 Apr 2006 23:34:45 JST.
This is an opt-out report for an email message received from IP
This is an email abuse report for an email message from amazonses.com on Thu, 29 Apr 2017 23:34:45 +0000
This is a Example email abuse report for an email message received from IP 192.0.2.222 on Thu, 29 Apr 2015 23:34:45 +0000
This is a Example email abuse report for an email message received from IP 192.0.2.1 on Thu, 29 Apr 2015 23:34:45 +0000
This is an email abuse report for an email message received from IP 192.0.2.222 on Thu, 29 Apr 2015 23:34:45 +0000.
This is a spf/dkim authentication-failure report for an email message received from IP 192.0.2.127 on Thu, 29 Apr 2015 23:34:45 +0900.
This is an authentication failure report for an email message received from IP
This is a Example email abuse report for an email message received from IP 198.51.100.224 on Thu, 29 Apr 2015 23:34:45 +0000

% grep '^Source:' set-of-emails/maildir/bsd/arf-*
set-of-emails/maildir/bsd/arf-25.eml:Source: Rackspace
%

azumakuniyuki commented 3 years ago

@cucx The following diff should make it possible to parse ARF messages with other patterns:

diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..4b74b3c0 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -8,8 +8,7 @@ use Sisimai::RFC5322;
 sub description { return 'Abuse Feedback Reporting Format' }
 sub is_arf {
     # Email is a Feedback-Loop message or not
-    # @param    [Hash] heads    Email header including "Content-Type", "From",
-    #                           and "Subject" field
+    # @param    [Hash] heads    Email header including "Content-Type", "From" and "Subject" field
     # @return   [Integer]       1: Feedback Loop
     #                           0: is not Feedback loop
     my $class = shift;
@@ -53,11 +52,14 @@ sub make {
     #
     # Netease DMARC uses:    This is a spf/dkim authentication-failure report for an email message received from IP
     # OpenDMARC 1.3.0 uses:  This is an authentication failure report for an email message received from IP
-    # Abusix ARF uses        this is an autogenerated email abuse complaint regarding your network.
-    state $startingof = { 'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'] };
+    # Abusix ARF uses:       this is an autogenerated email abuse complaint regarding your network.
+    state $startingof = {
+        'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'],
+        'report' => ['Content-Type: message/feedback-report'],
+    };
     state $markingsof = {
         'message' => qr{\A(?>
-             [Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report
+             [Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport
             |[Tt]his[ ]is[ ]an[ ]email[ ]abuse[ ]report
             |[Tt]his[ ]is[ ](?:
                  a[ ][^ ]+[ ]authentication[ -]failure[ ]report
@@ -114,12 +116,16 @@ sub make {
     #
     for my $e ( split("\n", $$mbody) ) {
         # Read each line between the start of the message and the start of rfc822 part.
+
+        # This is an email abuse report for an email message with the
+        #   message-id of 0000-000000000000000000000000000000000@mx
+        #   received from IP address 192.0.2.1 on
+        #   Thu, 29 Apr 2010 00:00:00 +0900 (JST)
+        $commondata->{'diagnosis'} ||= $e if $e =~ $markingsof->{'message'};
+
         unless( $readcursor ) {
             # Beginning of the bounce message or message/delivery-status part
-            if( $e =~ $markingsof->{'message'} ) {
-                $readcursor |= $indicators->{'deliverystatus'};
-                next;
-            }
+            $readcursor |= $indicators->{'deliverystatus'} if index($e, $startingof->{'report'}->[0]) == 0;
         }

         unless( $readcursor & $indicators->{'message-rfc822'} ) {
@@ -137,6 +143,7 @@ sub make {
                 # Microsoft ARF: original recipient.
                 $dscontents->[-1]->{'recipient'} = Sisimai::Address->s3s4($1);
                 $recipients++;
+
                 # The "X-HmXmrOriginalRecipient" header appears only once so
                 # we take this opportunity to hard-code ARF headers missing in
                 # Microsoft's implementation.
@@ -174,7 +181,7 @@ sub make {
                 $rcptintext  = $rhs if $lhs eq 'to';
             }
         } else {
-            # message/delivery-status part
+            # message/feedback-report part
             next unless $readcursor & $indicators->{'deliverystatus'};
             next unless length $e;

@@ -231,7 +238,7 @@ sub make {
                 # Reporting-MTA: dns; mx.example.jp
                 $commondata->{'rhost'} = $1;

-            } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+            } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
                 # The header is optional and MUST NOT appear more than once.
                 # Source-IP: 192.0.2.45
                 $arfheaders->{'rhost'} = $1;
@@ -240,13 +247,6 @@ sub make {
                 # the header is optional and MUST NOT appear more than once.
                 # Original-Mail-From: <somespammer@example.net>
                 $commondata->{'from'} ||= Sisimai::Address->s3s4($1);
-
-            } elsif( $e =~ $markingsof->{'message'} ) {
-                # This is an email abuse report for an email message with the
-                #   message-id of 0000-000000000000000000000000000000000@mx
-                #   received from IP address 192.0.2.1 on
-                #   Thu, 29 Apr 2010 00:00:00 +0900 (JST)
-                $commondata->{'diagnosis'} = $e;
             }
         } # End of if: rfc822
     }
@@ -292,6 +292,7 @@ sub make {

         $e->{'softbounce'}  = -1;
         $e->{'diagnosis'} ||= $commondata->{'diagnosis'};
+        $e->{'diagnosis'}   = Sisimai::String->sweep($e->{'diagnosis'});
         $e->{'date'}      ||= $mhead->{'date'};
         $e->{'reason'}  = 'feedback';
         $e->{'command'} = '';
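
The main idea of this last diff is that the start of the report is now detected from the `Content-Type: message/feedback-report` line instead of the provider-specific introductory sentence. The following is a simplified sketch of that loop, not the actual Sisimai::ARF code: the `scan` function and the `$inreport` flag are hypothetical stand-ins for `make` and `$readcursor`, and the "BAE Systems" wording of the sample sentence is invented for illustration.

```perl
use strict;
use warnings;

my $startingof = { 'report' => ['Content-Type: message/feedback-report'] };
my $markingsof = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport/;

# Simplified stand-in for the loop in make(): the introductory sentence is
# only kept as a diagnosis, the Content-Type line decides where fields start.
sub scan {
    my $body     = shift;
    my $seen     = { 'diagnosis' => '', 'rhost' => '' };
    my $inreport = 0;

    for my $e ( split("\n", $body) ) {
        $seen->{'diagnosis'} ||= $e if $e =~ $markingsof;

        unless( $inreport ) {
            # Start of the message/feedback-report part
            $inreport = 1 if index($e, $startingof->{'report'}->[0]) == 0;
            next;
        }
        $seen->{'rhost'} = $1 if $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/;
    }
    return $seen;
}

my $body = join("\n",
    'This is a BAE Systems Abuse Report for an email message received from IP 10.0.0.1',
    '',
    'Content-Type: message/feedback-report',
    '',
    'Feedback-Type: abuse',
    'Source-Ip: 10.0.0.1',
    'Source: BAE Systems',
);
my $seen = scan($body);
printf("diagnosis='%s' rhost='%s'\n", $seen->{'diagnosis'}, $seen->{'rhost'});
```

In this sketch a two-word provider name defeats the `[^ ]+` part of the sentence pattern, so the diagnosis stays empty, yet the Source-Ip value is still picked up because the field section no longer depends on that sentence.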