msimerson / mail-dmarc

Mail::DMARC, a complete DMARC implementation in Perl

Report parsing failure: ":1: parser error : Start tag expected, '<' not found" #247

Closed: richlv closed this issue 3 months ago

richlv commented 3 months ago

Describe the bug: Reports from gmx.net are not parsed. Unfortunately I cannot share the original report, but I am happy to help with debugging.

Saving the attachment from the email produces valid gzipped XML. Base64-decoding the attachment by hand also produces valid gzipped XML.

To reproduce:

  1. Get a report from gmx.net
  2. dmarc_receive it

Expected behavior: The report is parsed like reports from other senders.

richlv commented 3 months ago

The character printed in the error message seems to correspond to hex ef bf bd, which is the UTF-8 encoding of U+FFFD (the replacement character).

The relevant section of the email might be:

Content-Type: application/zip;
 name=gmx.net!example.com!stuff.xml.gz
Content-Disposition: attachment;
 filename=gmx.net!example.com!stuff.xml.gz
Content-Transfer-Encoding: base64

Adding (under-documented?) -v/--verbose gives:

report domain: example.com submitter: gmx.net report-id: <id>
Unknown message part multipart/alternative
handling decompressed body
:1: parser error : Start tag expected, '<' not found
�
^

Printing out $body at this point dumps binary data - presumably, the compressed blob.

As an ugly hack, changing this:

        if ( $c_type eq 'application/zip' || $c_type eq 'application/x-zip-compressed' ) {
            $self->get_submitter_from_filename( $filename );
            $unzipper->{zip}->( \$part->body, \$bigger );

to this:

        if ( $c_type eq 'application/zip' || $c_type eq 'application/x-zip-compressed' ) {
            $self->get_submitter_from_filename( $filename );
            $unzipper->{gz}->( \$part->body, \$bigger );

results in the XML being printed. It then fails again right after that:

Can't call method "findnodes" on an undefined value at /usr/lib/perl5/site_perl/5.40.0/Mail/DMARC/Report/Receive.pm line 270.

richlv commented 3 months ago

The application/zip content type was apparently an error on the GMX side and has since been fixed there. Perhaps some additional validation could still be done, and the verbose flag documented or added to --help?
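
For illustration, a minimal sketch of what such validation might look like: pick the decompressor from the leading bytes of the attachment instead of trusting the declared Content-Type. This is not a patch against Receive.pm. The helper names sniff_compression and decompress_by_magic are hypothetical, and it assumes the already base64-decoded attachment body is available as a byte string.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use IO::Uncompress::Unzip  qw(unzip  $UnzipError);

# Hypothetical helper: guess the container format from its magic bytes.
sub sniff_compression {
    my ($bytes) = @_;
    return 'gz'  if $bytes =~ /\A\x1f\x8b/;      # gzip header (RFC 1952)
    return 'zip' if $bytes =~ /\APK\x03\x04/;    # zip local file header
    return;                                      # unknown format
}

# Hypothetical helper: decompress based on content, not on Content-Type.
sub decompress_by_magic {
    my ($bytes) = @_;
    my $kind = sniff_compression($bytes)
        or die "unrecognized attachment format\n";

    my $xml;
    if ( $kind eq 'gz' ) {
        gunzip( \$bytes => \$xml ) or die "gunzip failed: $GunzipError\n";
    }
    else {
        unzip( \$bytes => \$xml ) or die "unzip failed: $UnzipError\n";
    }
    return $xml;
}
```

With something like this, a gzip payload mislabeled as application/zip would still be handled, for example via `my $xml = decompress_by_magic( $part->body );`, where `$part` is assumed to be the same MIME part object as in the snippet quoted earlier.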

msimerson commented 3 months ago

There's an infinite number of ways reports can fail to adhere to the DMARC specification. I think trying to guard against them is not a good use of my time, but I would happily consider a PR that implements your suggestions.

richlv commented 3 months ago

Thank you so much for looking into this. As for the undocumented -v flag: could that be handled in this issue, or would a new one be preferred?

msimerson commented 3 months ago

No need for an open issue, a PR is preferred.