liuch / dmarc-srg

A php parser, viewer and summary report generator for incoming DMARC reports.
GNU General Public License v3.0

XML parsing fails for google reports #128

Closed youradds closed 2 months ago

youradds commented 2 months ago

I've just found this tool - very cool! I'm having trouble getting it to parse the reports though. Google sends them as a ZIP with the XML inside. These are the contents:

<?xml version="1.0" encoding="UTF-8" ?>
<feedback>
  <report_metadata>
    <org_name>google.com</org_name>
    <email>noreply-dmarc-support@google.com</email>
    <extra_contact_info>https://support.google.com/a/answer/2466580</extra_contact_info>
    <report_id>7277424156930094083</report_id>
    <date_range>
      <begin>1715731200</begin>
      <end>1715817599</end>
    </date_range>
  </report_metadata>
  <policy_published>
    <domain>example.org</domain>
    <adkim>r</adkim>
    <aspf>r</aspf>
    <p>none</p>
    <sp>none</sp>
    <pct>100</pct>
    <np>none</np>
  </policy_published>
  <record>
    <row>
      <source_ip>192.168.0.1</source_ip>
      <count>4</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>pass</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers>
      <header_from>example.org</header_from>
    </identifiers>
    <auth_results>
      <dkim>
        <domain>example.org</domain>
        <result>pass</result>
        <selector>mail</selector>
      </dkim>
      <spf>
        <domain>example.org</domain>
        <result>pass</result>
      </spf>
    </auth_results>
  </record>
</feedback>

Yet when I run the script, it doesn't seem to like it:

php utils/fetch_reports.php
Failed to get incoming report:
  Error message:
    - XML error!

Am I missing something?

liuch commented 2 months ago

Hello @youradds, I have just added your report without any problems. Have you installed the required dependencies?

p.s. I edited your issue to remove your domain and ip address from it.

youradds commented 2 months ago

Thanks. I'm pretty sure I do. I've run:

sudo apt install php-mbstring php-mysql php-xml php-zip php-json php-imap

Could it be a difference between PHP FPM and CLI? Is there any way I can output a bit more debug information, so I can see why it's not working?

FYI, this is what check_config.php shows:

=== GENERAL INFORMATION ===
  * OS information: Linux 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64
  * PHP version:    8.2.19

=== EXTENSIONS ===
  * pdo_mysql...................... Ok
  * xmlreader...................... Ok
  * zip............................ Ok
  * json........................... Ok

=== CONFIG FILE ===
  * Checking if the file exists.... Ok
  * Checking read permission....... Ok
  * Checking write permission...... Warning
    Message: The configuration file is writable
  * Checking access by other users. Warning
    Message: The configuration file is accessible to other users
  * Checking the output buffer..... Ok

=== DATABASE ===
  * Accessibility check............ Ok
  * Checking for integrity......... Ok

=== MAILBOXES ===
  * Checking mailboxes config...... Ok
    Message: No mailboxes found

=== DIRECTORIES ===
  * Checking directories config.... Ok
    Message: 1 directory found
  * Checking directories (1)
    - DKIM-Reports
      * Accessibility.............. Ok
      * Security................... Ok

=== REMOTE FILESYSTEMS ===
  * Getting configuration.......... Skipped
    Message: Configuration not found

=== REPORT MAILER ===
  * Getting configuration.......... Ok
  * Checking mailer/method......... Ok
  * Checking mailer/library........ Ok
  * Checking mailer/default........ Ok
  * Checking mailer/from........... Ok

===
There are 2 warnings!

liuch commented 2 months ago

Could it be a difference with PHP fpm and cli?

Hm. Both use the same code, so there should be no difference. Just in case, I tried loading the report the other way and got the same result. Could you copy the XML content from here, save it to an XML file, and check whether it is processed?

Is there any way I can output a bit more debug information, so I can see why it's not working?

I'm going to add a more detailed error display soon. I'll give you a link to the commit here.

youradds commented 2 months ago

Interesting. If I add the example as foo.xml into the folder and then run it, it imports. If I download the ZIP from Google, and upload the ZIP - I get:

Failed to get incoming report:
  Error message:
    - Failed to add an incoming report: unknown domain example.com
  Report ID: 7277424156930094083

Do I need to add every single domain I want to process? It seems like the example.org one was added automatically.

So I'm a bit confused as to why it didn't create the other one? (obviously it's not example.com, but I've just put that here =))

UPDATE: If I add the domain, then when I run, it works fine (with the ZIP file). But I still get the error with the ZIP file attached to the email (with the XML file inside it)

Cheers

Andy

liuch commented 2 months ago

Either something is wrong with the ZIP file, or the XML file contains some invalid character that gets lost during clipboard transfer. I do not have any other ideas yet.
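Both hypotheses can be checked outside dmarc-srg. The following is a minimal sketch in Python (not part of dmarc-srg; the function name `check_report_zip` is made up for illustration) that verifies the archive's CRCs, looks for control bytes that XML 1.0 does not allow, and tries to parse each member:

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_report_zip(zip_bytes: bytes) -> list[str]:
    """Return one or more human-readable findings per member of the ZIP."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # testzip() re-reads every member and returns the first corrupt name, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            return [f"{bad_member}: corrupt ZIP member"]
        for name in archive.namelist():
            data = archive.read(name)
            # Control bytes other than tab, LF and CR are not allowed in XML 1.0.
            bad = [i for i, b in enumerate(data)
                   if b < 0x20 and b not in (0x09, 0x0A, 0x0D)]
            if bad:
                findings.append(f"{name}: control byte at offset {bad[0]}")
            try:
                ET.fromstring(data)
                findings.append(f"{name}: well-formed XML")
            except ET.ParseError as exc:
                findings.append(f"{name}: not well-formed ({exc})")
    return findings
```

Calling it with the bytes of the downloaded archive, for example `check_report_zip(open("report.zip", "rb").read())` with `report.zip` as a placeholder name, lists one or more findings per file. If the input is not a ZIP at all, `zipfile.ZipFile` raises `zipfile.BadZipFile`, which is itself a useful hint.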

Do I need to add every single domain I want to process?

The first domain is added automatically. All subsequent ones must be added explicitly. However, you can change this behavior in your conf.php (fetcher->allowed_domains).

youradds commented 2 months ago

Maybe what I'll do is write a perl script to extract the attachment from the email, and then pass that into a folder for processing.
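The idea of extracting attachments into a folder could look roughly like the sketch below. It is written in Python rather than Perl, uses only the standard library, and is not part of dmarc-srg; the function name `extract_attachments` and the output directory are assumptions for illustration.

```python
import email
from email import policy
from pathlib import Path


def extract_attachments(raw_message: bytes, out_dir: Path) -> list[Path]:
    """Save every named attachment of one email message into out_dir."""
    message = email.message_from_bytes(raw_message, policy=policy.default)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # walk() also visits the root part, so single-part messages whose body
    # is the report archive itself are handled too.
    for part in message.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if not filename:
            continue  # plain body text and other unnamed parts
        # Keep only the base name so a crafted filename cannot escape out_dir.
        target = out_dir / Path(filename).name
        # decode=True undoes base64 / quoted-printable transfer encoding.
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved
```

The saved ZIP or XML files can then be placed in a directory that dmarc-srg is configured to read from. Two attachments with the same file name would overwrite each other in this sketch, so a real script would want unique names.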

The first domain is added automatically. All subsequent ones must be added explicitly. However, you can change this behavior in your conf.php (fetcher->allowed_domains).

Ah ok - is there no way to make all domains add automatically? I'm going to have hundreds of domains going to this address, from multiple servers - so adding them manually / maintaining a list isn't really going to be an option =)

Thanks

Andy

liuch commented 2 months ago

Ah ok - is there no way to make all domains add automatically?

Yes, it is possible. The option mentioned above is a regular expression. All domains that match this expression will be added automatically. See the comments to this option in conf.sample.php

youradds commented 2 months ago

Cool - so something like this should work?

'allowed_domains' => '[a-zA-Z0-9\-\.]+\.(org|com|net|co.uk)$'
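A pattern like this can be tried against sample domains before it goes into conf.php. The sketch below uses Python's `re` module and assumes the pattern is applied with unanchored "search" semantics; how dmarc-srg actually applies it (delimiters, flags, anchoring) is not stated in this thread, so the comments in conf.sample.php remain the authority. The `STRICTER` variant is only a suggestion for comparison.

```python
import re

# The pattern exactly as proposed above (note the unescaped dot in "co.uk").
PROPOSED = r"[a-zA-Z0-9\-\.]+\.(org|com|net|co.uk)$"
# A stricter variant: anchored at the start, with the dot in "co.uk" escaped.
STRICTER = r"^[a-zA-Z0-9\-\.]+\.(org|com|net|co\.uk)$"


def matches(pattern: str, domain: str) -> bool:
    """Search semantics: the pattern may match anywhere unless it is anchored."""
    return re.search(pattern, domain) is not None


for domain in ("example.org", "example.co.uk", "example.coxuk", "bad domain.com"):
    print(domain, matches(PROPOSED, domain), matches(STRICTER, domain))
```

Under these assumptions both patterns accept `example.org` and `example.co.uk`, but only the proposed one also accepts `example.coxuk` (the bare `.` matches any character) and `bad domain.com` (without `^`, a match on the tail of the string is enough).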

liuch commented 2 months ago

https://github.com/liuch/dmarc-srg/commit/fd7054bba7d68d901b1a551e879e0e15a154ecd0

Make sure that debug mode is enabled in your conf.php file.

youradds commented 2 months ago

Thanks. I just did that, and this is the output:

failed to get incoming report:
  Error message:
    - Incorrect XML report file

Debug information:
Liuch\DmarcSrg\Exception\RuntimeException: XML error!
Parser code: 4
Parser message: Not well-formed (invalid token)
Line: 1; Column: 1
 in /home/ultranerdsdkim/web/dkim.ultranerds.co.uk/public_html/classes/Report/ReportData.php:57
Stack trace:
#0 /home/ultranerdsdkim/web/dkim.ultranerds.co.uk/public_html/classes/Report/Report.php(46): Liuch\DmarcSrg\Report\ReportData::fromXmlFile()
#1 /home/ultranerdsdkim/web/dkim.ultranerds.co.uk/public_html/classes/Report/ReportFetcher.php(123): Liuch\DmarcSrg\Report\Report::fromXmlFile()
#2 /home/ultranerdsdkim/web/dkim.ultranerds.co.uk/public_html/utils/fetch_reports.php(177): Liuch\DmarcSrg\Report\ReportFetcher->fetch()
#3 {main}

liuch commented 2 months ago

Line: 1; Column: 1

It looks as if the file is not an XML file at all, starting from its very first byte.

Is it possible that you are trying to process an eml file directly from the local directory on the server? Processing eml files directly without using IMAP is currently not supported.
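A parser error at line 1, column 1 suggests looking at the leading bytes of the file. The following Python sketch (not part of dmarc-srg; the function name and the returned labels are made up for illustration) guesses whether a file is XML, a ZIP or gzip archive, or a raw email message:

```python
import re


def sniff_report_file(data: bytes) -> str:
    """Guess what a report file is from its leading bytes."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"    # local file header signature of a ZIP archive
    if data.startswith(b"\x1f\x8b"):
        return "gzip"   # gzip magic number
    # Skip a UTF-8 byte order mark if present.
    body = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
    if body.lstrip().startswith(b"<"):
        return "xml"
    first_line = body.split(b"\n", 1)[0]
    # Raw messages start with a header such as "Return-Path: ..." or,
    # in mbox files, with a "From " separator line.
    if first_line.startswith(b"From ") or re.match(rb"[A-Za-z][A-Za-z0-9-]*:\s", first_line):
        return "email"
    return "unknown"
```

Running this over the files in the configured directory, for example `sniff_report_file(path.read_bytes())` for each `path`, would label a maildir message as "email" rather than "xml", which matches the diagnosis above.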

youradds commented 2 months ago

Ahhh that was it! I was doing:

$directories = [ .... ]

and setting the path to the maildir. I thought that was the whole point of it? Or is that only for uploading XML/ZIP reports to be processed? Either way, I've now got it working by configuring $mailboxes :)

Thanks for your help, and this cool tool

Cheers

Andy

liuch commented 2 months ago

Or is that only for uploading XML/ZIP reports to be processed?

This is for cases where you have retrieved the report files from another source: for example, if the server has no access to the mailbox, or if your mailbox contains more than just incoming DMARC reports and you decide to retrieve the report files with another tool.

I'm glad it all worked out. Thanks for your interest in my tool.