Closed: genericcx closed this issue 3 years ago.
@cucx Thanks for the report. We'll inspect and try to fix this issue within a few days :-)
@cucx Sorry for the late response. I've read the issue and the sample email text, and I think there is a better solution for this issue: the value of the Source-Ip field can be obtained with the callback feature described at https://libsisimai.org/en/usage/#callback
By the way, the value of rhost is only used to call a module in the Sisimai::Rhost class.
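As a rough illustration of the callback approach, the sub below pulls the Source-Ip value out of the raw feedback-report text and matches the field name case-insensitively. It is a standalone sketch: the sub name find_source_ip() is hypothetical, and both the 'hook' argument to Sisimai->make() and the 'message' key carrying the body as a plain string are assumptions about the v4 callback interface, so check them against the callback documentation linked above.

```perl
use strict;
use warnings;

# Hypothetical callback. Assumption: Sisimai passes a hash reference with the
# raw message body under the 'message' key, and the returned hash reference
# ends up in the "catch" field of the parsed result.
sub find_source_ip {
    my $emdata = shift;
    my $caught = { 'source-ip' => '' };
    my $body   = $emdata->{'message'} // '';

    # Match "Source-IP:", "Source-Ip:", "source-ip:" and so on at the start of any line.
    if( $body =~ /^Source-IP:[ ]*(\S+)/mi ) {
        $caught->{'source-ip'} = $1;
    }
    return $caught;
}

# Assumed wiring (not executed here):
#   my $data = Sisimai->make($path, 'hook' => \&find_source_ip);

my $sample = join("\n",
    'Content-Type: message/feedback-report',
    'Source-Ip: 1.2.3.4',
    'User-Agent: ReturnPathFBL/2.0',
);
print find_source_ip({ 'message' => $sample })->{'source-ip'}, "\n";
```

Matching with /i keeps the callback independent of how a given provider capitalizes the field name.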
Best regards,
@azumakuniyuki Thanks! Although, would this mean I would need to write a separate parser? I would think that I could simply change this https://github.com/sisimai/p5-sisimai/blob/f12f0e8ef1dc7159d7cbc1648695825986586f93/lib/Sisimai/ARF.pm#L234 to allow the camel-cased variant and then rebuild (so that it picks up both the correctly cased FBLs and these). However, if I edit that file and run make-clean and make-local, the changes do not seem to take effect. Should I be doing something else?
@cucx I'm so sorry. Code for getting the value of the Source-IP: field is already implemented in Sisimai::ARF.
The following diff will probably resolve the issue :-)
diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..cf8a2600 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -231,7 +231,7 @@ sub make {
# Reporting-MTA: dns; mx.example.jp
$commondata->{'rhost'} = $1;
- } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+ } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
# The header is optional and MUST NOT appear more than once.
# Source-IP: 192.0.2.45
$arfheaders->{'rhost'} = $1;
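A quick way to confirm what the patched pattern accepts is to run it against both spellings of the field name. The sample lines are made up. Note that [Pp] only relaxes the last letter, so a fully lower-cased source-ip: line would still not match; adding the /i modifier would be the broader alternative.

```perl
use strict;
use warnings;

# The patched condition from the diff, wrapped in a sub so it can be tested.
sub source_ip_of {
    my $e = shift;
    return $1 if $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/;
    return;    # no match: undef in scalar context
}

for my $line ('Source-IP: 192.0.2.45', 'Source-Ip: 1.2.3.4', 'source-ip: 1.2.3.4') {
    my $ip = source_ip_of($line);
    printf "%-24s => %s\n", $line, defined $ip ? $ip : '(no match)';
}
```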
@azumakuniyuki Thanks! However, when I do that it still doesn't seem to pick it up.
edit the file:
$ cat p5-sisimai/lib/Sisimai/ARF.pm | grep "\[Pp\]"
} elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
build:
Configuring Sisimai-v4.25.9 ... OK
Building and testing Sisimai-v4.25.9 ... OK
Successfully installed Sisimai-v4.25.9
1 distribution installed
dump:
$ perl -MSisimai -e 'print Sisimai->dump("/home/example/Maildir/new/1604169178.H599200P12611.example.com");' | jq
[
{
"timezoneoffset": "+0000",
"subject": "Message from website",
"reason": "feedback",
"diagnostictype": "",
"senderdomain": "example.com",
"softbounce": -1,
"token": "3dcc4a8580d837297c93d3ce3b0045d575ecd81c",
"catch": null,
"listid": "",
"alias": "",
"deliverystatus": "",
"smtpcommand": "",
"destination": "example.com",
"rhost": "",
"lhost": "",
"recipient": "xxx@example.com",
"messageid": "xxx@example.com",
"diagnosticcode": "",
"feedbacktype": "",
"origin": "/home/example/Maildir/new/1604169178.H599200P12611.example.com",
"action": "",
"replycode": "",
"addresser": "no-reply@example.com",
"timestamp": 1604172773,
"smtpagent": "Feedback-Loop"
}
]
example:
$ cat /home/example/Maildir/new/1604169178.H599200P12611.example.com | grep -A2 -B2 "Source-Ip:"
Content-Type: message/feedback-report
Source-Ip: 1.2.3.4
User-Agent: ReturnPathFBL/2.0
Original-Rcpt-To: xxxx@example.com
Am I missing a step here? I can also send you a copy of one of their FBLs if needed; just let me know.
Thanks!
@cucx Would you post the entire ARF email (including all headers) as a sample to this issue? We'll try to parse the email with the fixed code.
Best regards,
Added! I had to redact a lot, but it should be fine: redactedfbl.txt
@cucx Thanks for the quick response :-) We will try to fix/implement code to resolve this issue.
@cucx The following diff should resolve the issue.
diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..55adb218 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -57,7 +57,7 @@ sub make {
state $startingof = { 'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'] };
state $markingsof = {
'message' => qr{\A(?>
- [Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report
+ [Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport
|[Tt]his[ ]is[ ]an[ ]email[ ]abuse[ ]report
|[Tt]his[ ]is[ ](?:
a[ ][^ ]+[ ]authentication[ -]failure[ ]report
@@ -231,7 +231,7 @@ sub make {
# Reporting-MTA: dns; mx.example.jp
$commondata->{'rhost'} = $1;
- } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+ } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
# The header is optional and MUST NOT appear more than once.
# Source-IP: 192.0.2.45
$arfheaders->{'rhost'} = $1;
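The first patch alone was presumably not enough because, in the original make(), the feedback-report fields are only read after some line matches $markingsof->{'message'}, and the Rackspace marker line ("This is a Rackspace Abuse Report ...") does not match the old pattern. The check below compares only the first alternative of that pattern before and after this patch (the remaining alternatives are omitted), using the Rackspace line and a line in the style of the existing test messages.

```perl
use strict;
use warnings;

# First alternative of $markingsof->{'message'} only, before and after the patch.
my $old = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report/;
my $new = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport/;

my @lines = (
    'This is a Rackspace Abuse Report for an email message received from domain',
    'This is a Example email abuse report for an email message received from IP 192.0.2.1',
);

for my $e (@lines) {
    printf "old=%d new=%d  %s\n", ($e =~ $old ? 1 : 0), ($e =~ $new ? 1 : 0), $e;
}
```

Making "email " optional and allowing capitalized "Abuse Report" widens the match without breaking the lines the old pattern already accepted.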
The patch above returns the following result.
[
{
"alias": "",
"reason": "feedback",
"feedbacktype": "abuse",
"destination": "example.com",
"catch": {
"sender": "",
"parsedat": "2020-11-05 20:38:39",
"queue-id": "",
"x-mailer": "",
"mailsize": 2471
},
"softbounce": -1,
"messageid": "",
"action": "",
"addresser": "alice@example.com",
"smtpcommand": "",
"rhost": "10.0.0.1",
"lhost": "",
"smtpagent": "Feedback-Loop",
"deliverystatus": "",
"timestamp": 1604199777,
"diagnostictype": "",
"replycode": "",
"listid": "",
"diagnosticcode": "",
"recipient": "hashed@example.com",
"token": "6050a32a445e642594a0931751dc0822d5583597",
"origin": "issue-415.eml",
"subject": "",
"timezoneoffset": "+0000",
"senderdomain": "example.com"
}
]
Yes, perfect, thanks! It works for all of their reports.
Thanks again!
@cucx Thanks :-)
By the way, would you permit me to add redactedfbl.txt into the repository as arf-25.eml? We want to use the file for make test on the branch (which will be merged into master).
Best regards,
Sure, please redact anything extra that you think needs redacting. A small note: it's not only "Rackspace". Return Path has many third-party providers using their "Universal Feedback Loop", so the Source: field will not always show Rackspace. https://help.returnpath.com/hc/en-us/articles/220221448-List-of-all-available-complaint-feedback-loops-FBLs-
e.g.:
Source: BAE Systems
Source: Comcast
Source: Fastmail
Source: Italia Online (Libero and Virgilio)
Source: La Poste
Source: Liberty Global (Chello, UPC, Unity Media)
Source: Locaweb
Source: Mail.Ru
Source: OpenSRS
@cucx Thank you for your consent :-)
Also, thanks for the information about the Source: values of other email service providers. Tomorrow we'll look for other patterns like the "This is a Rackspace Abuse Report for an email message received from domain" line.
% grep -h 'This is a' set-of-emails/maildir/bsd/arf-* | grep -v MIME
This is an email abuse report for an email message with the message-id of X-000000000000000000000000000000000@YZ received from IP address 192.0.2.89 on Thu, 29 Apr 2009 00:00:00 -0000 (GMT)
This is an email abuse report for an email message received from mx8.example.com on Thu, 29 Apr 2013 23:45:00 PST
This is an email abuse report for an email message received from IP 192.0.2.2 on Thu, 9 Apr 2006 23:34:45 JST.
This is an opt-out report for an email message received from IP
This is an email abuse report for an email message from amazonses.com on Thu, 29 Apr 2017 23:34:45 +0000
This is a Example email abuse report for an email message received from IP 192.0.2.222 on Thu, 29 Apr 2015 23:34:45 +0000
This is a Example email abuse report for an email message received from IP 192.0.2.1 on Thu, 29 Apr 2015 23:34:45 +0000
This is an email abuse report for an email message received from IP 192.0.2.222 on Thu, 29 Apr 2015 23:34:45 +0000.
This is a spf/dkim authentication-failure report for an email message received from IP 192.0.2.127 on Thu, 29 Apr 2015 23:34:45 +0900.
This is an authentication failure report for an email message received from IP
This is a Example email abuse report for an email message received from IP 198.51.100.224 on Thu, 29 Apr 2015 23:34:45 +0000
% grep '^Source:' set-of-emails/maildir/bsd/arf-*
set-of-emails/maildir/bsd/arf-25.eml:Source: Rackspace
%
@cucx The following diff should be able to parse ARF messages that use the other patterns:
diff --git a/lib/Sisimai/ARF.pm b/lib/Sisimai/ARF.pm
index 6d1a8602..4b74b3c0 100644
--- a/lib/Sisimai/ARF.pm
+++ b/lib/Sisimai/ARF.pm
@@ -8,8 +8,7 @@ use Sisimai::RFC5322;
sub description { return 'Abuse Feedback Reporting Format' }
sub is_arf {
# Email is a Feedback-Loop message or not
- # @param [Hash] heads Email header including "Content-Type", "From",
- # and "Subject" field
+ # @param [Hash] heads Email header including "Content-Type", "From" and "Subject" field
# @return [Integer] 1: Feedback Loop
# 0: is not Feedback loop
my $class = shift;
@@ -53,11 +52,14 @@ sub make {
#
# Netease DMARC uses: This is a spf/dkim authentication-failure report for an email message received from IP
# OpenDMARC 1.3.0 uses: This is an authentication failure report for an email message received from IP
- # Abusix ARF uses this is an autogenerated email abuse complaint regarding your network.
- state $startingof = { 'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'] };
+ # Abusix ARF uses: this is an autogenerated email abuse complaint regarding your network.
+ state $startingof = {
+ 'rfc822' => ['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'],
+ 'report' => ['Content-Type: message/feedback-report'],
+ };
state $markingsof = {
'message' => qr{\A(?>
- [Tt]his[ ]is[ ]a[ ][^ ]+[ ]email[ ]abuse[ ]report
+ [Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport
|[Tt]his[ ]is[ ]an[ ]email[ ]abuse[ ]report
|[Tt]his[ ]is[ ](?:
a[ ][^ ]+[ ]authentication[ -]failure[ ]report
@@ -114,12 +116,16 @@ sub make {
#
for my $e ( split("\n", $$mbody) ) {
# Read each line between the start of the message and the start of rfc822 part.
+
+ # This is an email abuse report for an email message with the
+ # message-id of 0000-000000000000000000000000000000000@mx
+ # received from IP address 192.0.2.1 on
+ # Thu, 29 Apr 2010 00:00:00 +0900 (JST)
+ $commondata->{'diagnosis'} ||= $e if $e =~ $markingsof->{'message'};
+
unless( $readcursor ) {
# Beginning of the bounce message or message/delivery-status part
- if( $e =~ $markingsof->{'message'} ) {
- $readcursor |= $indicators->{'deliverystatus'};
- next;
- }
+ $readcursor |= $indicators->{'deliverystatus'} if index($e, $startingof->{'report'}->[0]) == 0;
}
unless( $readcursor & $indicators->{'message-rfc822'} ) {
@@ -137,6 +143,7 @@ sub make {
# Microsoft ARF: original recipient.
$dscontents->[-1]->{'recipient'} = Sisimai::Address->s3s4($1);
$recipients++;
+
# The "X-HmXmrOriginalRecipient" header appears only once so
# we take this opportunity to hard-code ARF headers missing in
# Microsoft's implementation.
@@ -174,7 +181,7 @@ sub make {
$rcptintext = $rhs if $lhs eq 'to';
}
} else {
- # message/delivery-status part
+ # message/feedback-report part
next unless $readcursor & $indicators->{'deliverystatus'};
next unless length $e;
@@ -231,7 +238,7 @@ sub make {
# Reporting-MTA: dns; mx.example.jp
$commondata->{'rhost'} = $1;
- } elsif( $e =~ /\ASource-IP:[ ]*(.+)\z/ ) {
+ } elsif( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
# The header is optional and MUST NOT appear more than once.
# Source-IP: 192.0.2.45
$arfheaders->{'rhost'} = $1;
@@ -240,13 +247,6 @@ sub make {
# the header is optional and MUST NOT appear more than once.
# Original-Mail-From: <somespammer@example.net>
$commondata->{'from'} ||= Sisimai::Address->s3s4($1);
-
- } elsif( $e =~ $markingsof->{'message'} ) {
- # This is an email abuse report for an email message with the
- # message-id of 0000-000000000000000000000000000000000@mx
- # received from IP address 192.0.2.1 on
- # Thu, 29 Apr 2010 00:00:00 +0900 (JST)
- $commondata->{'diagnosis'} = $e;
}
} # End of if: rfc822
}
@@ -292,6 +292,7 @@ sub make {
$e->{'softbounce'} = -1;
$e->{'diagnosis'} ||= $commondata->{'diagnosis'};
+ $e->{'diagnosis'} = Sisimai::String->sweep($e->{'diagnosis'});
$e->{'date'} ||= $mhead->{'date'};
$e->{'reason'} = 'feedback';
$e->{'command'} = '';
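The reworked loop can be modeled roughly as follows. This is a simplified, hypothetical sketch rather than the Sisimai code: read_report() and the sample message are made up, it only tracks the marker line (as the diagnosis), Source-Ip and Feedback-Type (a field defined by RFC 5965), and it stops at the rfc822 part instead of parsing the original message headers.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of the patched loop in Sisimai::ARF->make().
# Marker pattern: only the first alternative of $markingsof->{'message'}.
my $marker = qr/\A[Tt]his[ ]is[ ]a[ ][^ ]+[ ](?:email[ ])?[Aa]buse[ ][Rr]eport/;

sub read_report {
    my $body     = shift;
    my $inreport = 0;
    my %got;

    for my $e ( split("\n", $body) ) {
        # Keep the first marker line wherever it appears, like the ||= in the patch.
        $got{'diagnosis'} ||= $e if $e =~ $marker;

        # Start reading fields at the message/feedback-report part.
        if( index($e, 'Content-Type: message/feedback-report') == 0 ) {
            $inreport = 1;
            next;
        }

        # This sketch stops at the original message; Sisimai goes on to parse it.
        last if index($e, 'Content-Type: message/rfc822') == 0
             || index($e, 'Content-Type: text/rfc822-headers') == 0;
        next unless $inreport && length $e;

        if( $e =~ /\ASource-I[Pp]:[ ]*(.+)\z/ ) {
            $got{'rhost'} = $1;
        } elsif( $e =~ /\AFeedback-Type:[ ]*(.+)\z/ ) {
            $got{'feedbacktype'} = $1;
        }
    }
    return \%got;
}

my $sample = join("\n",
    'This is a Rackspace Abuse Report for an email message received from domain example.com',
    '',
    'Content-Type: message/feedback-report',
    'Feedback-Type: abuse',
    'Source-Ip: 10.0.0.1',
    'User-Agent: ReturnPathFBL/2.0',
    '',
    'Content-Type: message/rfc822',
    'From: alice@example.com',
);
my $result = read_report($sample);
print "$_: $result->{$_}\n" for sort keys %$result;
```

Keying the read cursor on the Content-Type line of the feedback-report part, instead of on the free-text marker line, means an unfamiliar marker wording no longer prevents the structured fields from being read.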
It seems like the ARF/Feedback-Loop parser does not detect all of the Return Path (https://fbl.returnpath.net/) versions.
Example (redacted)
I attempted to add some further options in ARF.pm (e.g. accepting the camel-cased Source-Ip field, which seems wrong on their side, and extracting the IP from the domain text); however, after a clean make it didn't take effect, so I likely did something wrong.