gutmensch / docker-dmarc-report

Uncaught ValueError: DOMDocument::loadXML(): Argument #1 ($source) must not be empty #31

Closed keny2021 closed 1 year ago

keny2021 commented 1 year ago

Hi guys, Thanks for the hard work that you have put into this amazing tool.

I have managed to set it up and get the reports generated, and I was able to open the first report. However, I now receive the error message below (captured in the screenshot) whenever I try to open any report other than the first one.

Fatal error: Uncaught ValueError: DOMDocument::loadXML(): Argument #1 ($source) must not be empty in /var/www/viewer/dmarcts-report-viewer-report-data.php:213
Stack trace:
#0 /var/www/viewer/dmarcts-report-viewer-report-data.php(213): DOMDocument->loadXML()
#1 /var/www/viewer/dmarcts-report-viewer-report-data.php(59): formatXML()
#2 /var/www/viewer/dmarcts-report-viewer-report-data.php(366): tmpl_reportData()
#3 {main}
thrown in /var/www/viewer/dmarcts-report-viewer-report-data.php on line 213

I have to say that I'm not very familiar with PHP and you guys might spot the issue very quickly. Nevertheless thanks to anyone who can guide me in the right direction.

gutmensch commented 1 year ago

Hey @keny2021, it looks like you might have reports in your database where the raw_xml column is NULL or empty. The report parser should fill this column for every report with the received XML data, but the report is sometimes zipped, or cannot be extracted from the mail (MIME attachment), and so on.

For debugging: Can you access the database directly and run something like

SELECT COUNT(*) FROM dmarcdb.reports WHERE raw_xml IS NULL OR raw_xml = '';

That's more of a pseudo-statement; you might need to adjust the database name and the query depending on your DB flavour and setup. The number should be 0. If it's not, then you do have reports where the data is empty (in that case, replace COUNT(*) with * and check the reports manually).

keny2021 commented 1 year ago

Hi @gutmensch Thank you for your answer.

Indeed, there are 14 reports with empty data.

MariaDB [(none)]> SELECT COUNT(*) from dmarc_report.report WHERE raw_xml IS NULL OR raw_xml = "";
+----------+
| COUNT(*) |
+----------+
|       14 |
+----------+
1 row in set (0.005 sec)

When I access the mailbox and check the processed reports, I do not see any difference between the one that works and the rest that do not. They are all in .zip format.

From what I can understand a possible reason is the parser being unable to extract the report? What could be the root cause for that?

gutmensch commented 1 year ago

@keny2021 Most welcome, and thanks for the nice words in the original post! :-)

Can you maybe correlate the reports with errors from the parser log?

docker exec -ti dmarc-report cat /var/log/nginx/dmarc-reports.log

In theory, decompression should just work: the parser logic to unzip hasn't really changed AFAIK, but maybe some Perl module is missing for a specific compression method, or the module needs a bump.

keny2021 commented 1 year ago

Thanks @gutmensch

Found and fixed the issue :-)

In the dmarc-reports.log I was able to identify the following relevant message:

dmarcts-report-parser.pl: google.com: 15702212058847713497: Skipping storage of large XML (114578 bytes) as defined in config file.
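To check whether this message accounts for every empty report, the log can be filtered for it. This is a minimal sketch. It assumes each such line has exactly the shape quoted above, with no timestamp prefix and no colons inside the organisation name or report ID. `list_skipped_xml` is a hypothetical helper name, not part of the parser.

```shell
#!/bin/sh
# Hypothetical helper: read parser log lines on stdin and print
# "organisation report_id size_in_bytes" for every report whose XML
# was skipped because it exceeded the configured maximum size.
list_skipped_xml() {
  sed -n 's/^dmarcts-report-parser\.pl: \([^:]*\): \([^:]*\): Skipping storage of large XML (\([0-9]*\) bytes).*/\1 \2 \3/p'
}

# Demonstration using the log line quoted above:
printf '%s\n' 'dmarcts-report-parser.pl: google.com: 15702212058847713497: Skipping storage of large XML (114578 bytes) as defined in config file.' \
  | list_skipped_xml

# Against the container (name and log path as used earlier in this thread):
# docker exec dmarc-report cat /var/log/nginx/dmarc-reports.log | list_skipped_xml
```

If the number of lines printed matches the count returned by the SQL query, the size limit is likely the only cause. The largest value in the third column gives a lower bound for the new limit.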

I changed the $maxsize_xml option from 50000 to 500000 in the parser config file to allow XML files larger than 50 kB; my XML files are over 100 kB.

docker exec -ti dmarc-report /bin/bash
vi /usr/bin/dmarcts-report-parser.conf
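The same change can be made without an interactive editor. This is a minimal sketch. It assumes the setting appears as a plain integer assignment at the start of a line, like `$maxsize_xml = 50000;`. `set_maxsize_xml` is a hypothetical helper, and the demonstration runs on a throwaway file rather than the real config.

```shell
#!/bin/sh
# Hypothetical helper: set the numeric value of "$maxsize_xml = N;" in a
# Perl-style config file. Usage: set_maxsize_xml <config-file> <new-size>
set_maxsize_xml() {
  conf=$1
  size=$2
  tmp=$(mktemp) || return 1
  # \$ matches a literal dollar sign; \1 keeps everything before the old number.
  sed 's/^\(\$maxsize_xml[[:space:]]*=[[:space:]]*\)[0-9][0-9]*;/\1'"$size"';/' "$conf" > "$tmp" &&
    cat "$tmp" > "$conf"   # write back in place, keeping the file's permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demonstration on a temporary file:
demo=$(mktemp)
printf '%s\n' '$maxsize_xml = 50000;' > "$demo"
set_maxsize_xml "$demo" 500000
cat "$demo"   # prints: $maxsize_xml = 500000;
rm -f "$demo"
```

Inside the container the target would be /usr/bin/dmarcts-report-parser.conf. An edit made this way lives in the container's writable layer, so it survives a stop and start but is lost if the container is recreated from the image.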

I stopped the container, moved the emails so they would be reprocessed, then started the container again. Now all the reports are displayed correctly.

Thanks again for helping me in figuring this out!

gutmensch commented 1 year ago

Nice find, @keny2021, and thanks for reporting! We could also turn this into an env variable, so everybody with your problem could just configure it, e.g. change

$maxsize_xml = 50000;

to

$maxsize_xml = $ENV{'PARSER_XML_MAXSIZE'} // 50000;

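What do you think? I also just spotted a mistake I made in lines 15 and 16, where the double pipe (||) should actually be a double forward slash (//), Perl's defined-or operator, for default handling. With ||, a value that is set but false (such as 0 or an empty string) would be replaced by the default; // falls back only when the variable is undefined.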

Wanna create a PR or should I? :)

keny2021 commented 1 year ago

It's a great idea to create this as an environment variable.

Please create the Pull Request yourself. :)

chrisvanmeer commented 1 year ago

This helped me fix the same issue. Thanks!!
Maybe it's worth mentioning this in the README.md as well.

keny2021 commented 1 year ago

Glad to hear that @chrisvanmeer

gutmensch commented 1 year ago

@chrisvanmeer Feel free to open a PR for README clarification. :) I added the environment variable mapping here but I was indeed too lazy to document it.

chrisvanmeer commented 1 year ago

PR https://github.com/gutmensch/docker-dmarc-report/pull/32 submitted.