tierpod / dmarc-report-converter

Convert dmarc reports from xml to human-readable formats
MIT License

XML syntax error #22

Closed jwnetwerk closed 2 years ago

jwnetwerk commented 2 years ago

DMARC reports from protection.outlook.com are sent in .gz format. Files with this extension are skipped because they are not .zip. Furthermore, if the extracted raw XML file is given as input, it results in the following error: [ERROR] files: XML syntax error on line 1: illegal character code U+001F, skip

Can this be fixed? When the XML file is passed to e.g. https://us.dmarcian.com/xml-to-human-converter/ it can be read correctly.

tierpod commented 2 years ago

@jwnetwerk Thank you for your report!

*.gz files are supported and should be read and unpacked successfully, see https://github.com/tierpod/dmarc-report-converter/blob/1aa14eb36ded4e1395dc9f17b22431ccd3a3b43a/cmd/dmarc-report-converter/convert.go#L20

[ERROR] files: XML syntax error on line 1: illegal character code U+001F, skip

This looks like the XML file has an incorrect encoding.

Could you please attach such a gz or xml file for further investigation?

jwnetwerk commented 2 years ago

Hi, I found out that the problem does not appear when using IMAP, only when uploading the file directly. I can supply you with some files. Can I mail them to you?

tierpod commented 2 years ago

Yes, or you can attach it right here ("Attach files by dragging & dropping")

jwnetwerk commented 2 years ago

To which address can I mail them? Because of the content, I'm not comfortable attaching them to this public thread.

tierpod commented 2 years ago

I'm investigating the files you provided and it's interesting. It looks like all 3 files were gzipped twice:

$ cat xs4all.nl*.xml.gz | gunzip > 1.xml
$ file 1.xml
1.xml: gzip compressed data
$ cat 1.xml | gunzip > 2.xml
$ file 2.xml
2.xml: XML 1.0 document, ASCII text

After that 2.xml is converted successfully.

But they were sent from different email providers (I received reports from outlook.com on my own installation and they were converted fine). Are you sure your email server doesn't change these attachments somehow?

jwnetwerk commented 2 years ago

These mails are received directly from the senders without any modification.

tierpod commented 2 years ago

Hi @jwnetwerk. I added a workaround for this case in the branch issue#22. Can you build this version and check?

jwnetwerk commented 2 years ago

I tried with the latest build. This still gives the following error: 2022/02/08 14:30:02 [ERROR] files: XML syntax error on line 1: illegal character code U+001F in file /tmp/dmarc_files/protection.outlook.com!***!1643500800!1643587200.xml, skip

tierpod commented 2 years ago

Have you built this version from branch issue#22? Please, show the output of:

./dmarc-report-converter -version

jwnetwerk commented 2 years ago

Sorry, I used version v0.6-20220203. I'm not familiar with building from source.

tierpod commented 2 years ago

Ok, I released v0.6.2 and hope it will fix this problem. Please, update your installation and check.

jwnetwerk commented 2 years ago

You are great, it is working now. Thanks!