Error report summary - Githubissues

ShawnPConroy commented 1 month ago

I would like a utility that will generate a summary of errors. When the user runs it, there should be an ignore type, where I can ignore SPF softfails. And the database can keep a record of when I as a user last requested an error report, and only list reports with a loaded_time after the last report.

It should be a single table that shows each domain on its own row, summarizing the reports that contain the requested errors, with each error type in its own column. If there were no such errors, it should not send an email at all.
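
A minimal sketch of the behaviour described above, assuming each stored report is available as a dict with `domain`, `loaded_time`, and a set of `errors`. Those field names, and the functions `summarize_errors` and `should_send`, are hypothetical and not part of DMARC-SRG.

```python
from collections import defaultdict
from datetime import datetime

# Error type names taken from the discussion; the record layout is assumed.
ERROR_TYPES = ("dmarc", "dkim", "spfhard", "spfsoft")

def summarize_errors(reports, last_request, ignore=()):
    """Return {domain: {error_type: report_count}} for reports loaded after last_request."""
    wanted = [t for t in ERROR_TYPES if t not in ignore]
    table = defaultdict(lambda: dict.fromkeys(wanted, 0))
    for report in reports:
        if report["loaded_time"] <= last_request:
            continue  # already covered by the previous summary
        for error in report["errors"]:
            if error in wanted:
                table[report["domain"]][error] += 1
    return dict(table)

def should_send(table):
    """No rows means no requested errors, so no email should go out."""
    return bool(table)

reports = [
    {"domain": "example1.com", "loaded_time": datetime(2024, 1, 2, 8), "errors": {"dkim", "spfsoft"}},
    {"domain": "example2.com", "loaded_time": datetime(2024, 1, 2, 9), "errors": {"spfsoft"}},
    {"domain": "example1.com", "loaded_time": datetime(2023, 12, 31), "errors": {"dmarc"}},
]
table = summarize_errors(reports, datetime(2024, 1, 1), ignore={"spfsoft"})
```

With SPF softfails ignored, only example1.com gets a row here, and a run that finds nothing returns an empty table, which is the signal to skip the email.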

That is maybe the most robust, straightforward, and useful approach for people who want hourly or daily summaries of errors (if any came in). It is more straightforward than my initial thoughts on the subject, which I will include below in case you like them better.

Generate a report with error types, where domain may be all, a list, or by user, and type may be all or a comma-separated list of dmarc, dkim, spfsoft, spfhard, as well as the typical period parameter. Ideally, without a period, the default behaviour would be to send all reported errors of the above types since the last time this user ran an error report summary.

The period should be counted back from when the report is requested, and be based on when each report was fetched and loaded into the database. That is, lastndays=1 won't select yesterday's reporting period, but the reports received in the last 24 hours. That way none are missed, for example if a report is delayed in coming in.

One concern I have is missing a delayed report. For example, if I have a cron job sending out just the error summary, and it sends every day, but an error report for the last day's reporting period comes in after I send it, it would not be caught in any reporting period. It would be good to have up-to-the-minute errors, since the last error summary was requested, rather than it being based on the reporting period.
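
The delayed-report case can be sketched as follows. The record fields `period_end` and `loaded_time` and both helper functions are assumptions made for illustration; the dates model a daily cron run at 06:00.

```python
from datetime import datetime, timedelta

def in_window_by_period(report, start, end):
    """Select by the reporting period the report covers."""
    return start <= report["period_end"] < end

def in_window_by_loaded_time(report, last_summary):
    """Select by when the report was loaded into the database."""
    return report["loaded_time"] > last_summary

now = datetime(2024, 1, 3, 6, 0)          # today's cron run
last_summary = now - timedelta(days=1)    # yesterday's cron run

# A report covering 1 January that only arrived after yesterday's run.
delayed = {
    "period_end": datetime(2024, 1, 2, 0, 0),
    "loaded_time": datetime(2024, 1, 2, 9, 30),
}

missed_by_period = not in_window_by_period(delayed, last_summary, now)
caught_by_loaded_time = in_window_by_loaded_time(delayed, last_summary)
```

Yesterday's run could not see the report because it had not arrived yet, and today's period-based window starts after its reporting period ended, so only the loaded-time filter picks it up.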

Thanks for this. I made a collection of scripts that did something very similar a couple of years ago, but my approach was fundamentally flawed, and Google's habit of resending the same report more than once is causing strain on it. DMARC-SRG is a very good replacement for it.

liuch commented 1 month ago

To be honest, your suggestion matches an idea I considered originally and have been putting off for a long time: a separate script that sends more frequent special reports when incoming DMARC reports contain errors. I initially took into account the possibility of storing such data for each user separately, so there will be no problem with that. As for domains, there is a problem. It's not just about the data structure; that can be extended. The problem is that different report dates will cause a large number of SQL queries when generating such a report: at least one SQL query for each domain.

I'll think more about how this can be implemented painlessly.

liuch commented 1 month ago

> When the user runs it, there should be an ignore type, where I can ignore SPF softfails.

It is initially assumed that users cannot run utils from the utils directory. Maybe you mean the web interface?

I also believe specifying a period would be redundant. The date of the last report is an excellent solution, except that there should be a default maximum number of cutoff days, so as not to strain the database after a long pause or on a first run with a large number of past records.

liuch commented 1 month ago

Thank you very much for your attention and your kind words about my project.

ShawnPConroy commented 1 month ago

> > When the user runs it, there should be an ignore type, where I can ignore SPF softfails.
>
> It is initially assumed that users cannot run utils from the utils directory. Maybe you mean the web interface?

By user, I meant the cli user, not a user registered with the script. I would assume such reports would only be run via cron or cli, and emailed out. An error report is no different than filtering incoming reports by disposition.

Edit to add: except that it is summarized, automatic, and sent by email.

> I also believe specifying a period would be redundant. The date of the last report is an excellent solution, except that there should be a default maximum number of cutoff days, so as not to strain the database after a long pause or on a first run with a large number of past records.

I agree. A maximum of 7 days seems like a good default. Maybe 10. And someone with too many records over a week should know better than to request that many.
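
The cutoff rule could look something like this. It is only a sketch: `summary_start` and `DEFAULT_MAX_DAYS` are made-up names, and 7 is the default suggested above rather than an existing setting.

```python
from datetime import datetime, timedelta

DEFAULT_MAX_DAYS = 7  # default proposed in this thread; not an existing option

def summary_start(last_request, now, max_days=DEFAULT_MAX_DAYS):
    """Start of the summary window: the last request, but never older than the cutoff."""
    cutoff = now - timedelta(days=max_days)
    if last_request is None:
        return cutoff  # first run: look back at most max_days
    return max(last_request, cutoff)

now = datetime(2024, 1, 31, 12, 0)
recent = summary_start(datetime(2024, 1, 30, 12, 0), now)    # last run was yesterday
long_pause = summary_start(datetime(2023, 11, 1, 0, 0), now) # clamped to the cutoff
first_run = summary_start(None, now)                         # no stored date yet
```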

ShawnPConroy commented 1 month ago

> As for domains, there is a problem. It's not just about the data structure; that can be extended. The problem is that different report dates will cause a large number of SQL queries when generating such a report: at least one SQL query for each domain.

That's interesting, I hadn't considered it working so differently than a summary report.

I had imagined running a report for one user and all domains visible to the user. No different report dates. It just runs from the last time admin or whatever user ran their report. One query for the user on all reports in the domain list of the user since the last report.
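
To illustrate what I mean by one query, here is a sketch against a made-up SQLite schema. The tables `reports` and `user_domains`, their columns, and the function `errors_since` are all assumptions for the example and are not the real DMARC-SRG schema.

```python
import sqlite3

def errors_since(conn, user_name, since):
    """One grouped query for every domain the user can see, with a single start date."""
    sql = """
        SELECT r.domain, SUM(r.dkim_fail) AS dkim, SUM(r.spf_fail) AS spf
        FROM reports AS r
        JOIN user_domains AS ud ON ud.domain = r.domain
        WHERE ud.user_name = ? AND r.loaded_time > ?
        GROUP BY r.domain
        HAVING SUM(r.dkim_fail) + SUM(r.spf_fail) > 0
        ORDER BY r.domain
    """
    return conn.execute(sql, (user_name, since)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (domain TEXT, loaded_time TEXT, dkim_fail INTEGER, spf_fail INTEGER);
    CREATE TABLE user_domains (user_name TEXT, domain TEXT);
    INSERT INTO user_domains VALUES ('admin', 'example1.com'), ('admin', 'example2.com');
    INSERT INTO reports VALUES
        ('example1.com', '2024-01-02 08:00:00', 1, 0),
        ('example1.com', '2024-01-02 09:00:00', 0, 2),
        ('example2.com', '2024-01-02 09:00:00', 0, 0),
        ('example1.com', '2023-12-30 09:00:00', 5, 5);
""")
rows = errors_since(conn, "admin", "2024-01-01 00:00:00")
```

Because the start date belongs to the user rather than to each domain, the grouping happens inside the query and the number of queries does not grow with the number of domains.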

I hadn't considered the script running the report for all users at once. Is that what you were thinking? If so, that still doesn't matter: it will still be the same date for all users, since it will be the last time all users sent the report.

ShawnPConroy commented 1 month ago

Ideally, domains would be sorted by error severity. Something like: disposition reject first, then quarantine, then, among those with disposition none, DKIM fails, then SPF fails, then SPF softfails.
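
As a rough sketch of that ordering, assuming each table row is a dict holding the domain's worst disposition and its failure counts (the field names and `severity_rank` are hypothetical):

```python
def severity_rank(row):
    """Lower rank sorts first: reject, quarantine, then DKIM, SPF, SPF softfail."""
    if row["disposition"] == "reject":
        return 0
    if row["disposition"] == "quarantine":
        return 1
    if row["dkim_fail"]:
        return 2
    if row["spf_fail"]:
        return 3
    if row["spf_softfail"]:
        return 4
    return 5

def sort_rows(rows):
    # Ties are broken by domain name so the output is stable between runs.
    return sorted(rows, key=lambda row: (severity_rank(row), row["domain"]))

rows = [
    {"domain": "a.example", "disposition": "none", "dkim_fail": 0, "spf_fail": 0, "spf_softfail": 3},
    {"domain": "b.example", "disposition": "reject", "dkim_fail": 1, "spf_fail": 1, "spf_softfail": 0},
    {"domain": "c.example", "disposition": "none", "dkim_fail": 1, "spf_fail": 0, "spf_softfail": 0},
    {"domain": "d.example", "disposition": "quarantine", "dkim_fail": 0, "spf_fail": 2, "spf_softfail": 0},
    {"domain": "e.example", "disposition": "none", "dkim_fail": 0, "spf_fail": 2, "spf_softfail": 0},
]
ordered = [row["domain"] for row in sort_rows(rows)]
```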

liuch commented 4 weeks ago

The problem is here:

> Generate a report with error types, where domain may be all, a list, or by user, and type may be all or a comma-separated list of dmarc, dkim, spfsoft, spfhard, as well as the typical period parameter. Ideally, without a period, the default behaviour would be to send all reported errors of the above types since the last time this user ran an error report summary.

Suppose you have created such a report for the domain example1.com. The report is created, the date is saved. Then you decide to get a report for all your domains (example[1-10].com). What date should it use to generate the new report? How should that date be stored? What if the domains are selected by tags that the user edited shortly before the next report generation? You also mentioned filters by error type...

liuch commented 4 weeks ago

So far I don't have a good idea of how to implement this properly.

ShawnPConroy commented 2 weeks ago

I see what you mean. I say we should change those requirements. I wrote the top section of this issue last, and wanted it to supersede everything else. Yes, there is a conflict there.

Instead, I think we should not accept a domain list. Just by user (or, if tags, then by tag). It shows all errors for a user since they last ran it. In my mind, if you want errors, you want errors from all domains. If you want to segment them off, do it by user, which in the README you suggest for people who want to group domains. (Or by tags.)

So save the last request date for each user (or each tag).

If you want to accept a list of domains anyway, there are two different processes you could use:

1. If there is a list, they must specify a period.
2. Or, show all errors since the oldest last-report date among the listed domains, then set the last-report date of every listed domain to the date of this report. So if all domains had a report in the last week, but one domain's last report was a month ago, just show all errors for those domains for the last month. Then update them all to the current date.

Neither option would affect the last-report date stored per user.
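
The second process could be sketched like this, assuming a per-domain dict of last-report dates kept separately from the per-user date. The function name and the storage layout are hypothetical, and every listed domain is assumed to already have a stored date.

```python
from datetime import datetime

def start_for_domain_list(domains, domain_last_report, now):
    """Use the oldest last-report date among the listed domains, then stamp them all with now."""
    start = min(domain_last_report[d] for d in domains)
    for d in domains:
        domain_last_report[d] = now
    return start

domain_last_report = {
    "example1.com": datetime(2024, 1, 25),
    "example2.com": datetime(2023, 12, 28),  # last reported about a month ago
    "example3.com": datetime(2024, 1, 27),
}
user_last_report = {"admin": datetime(2024, 1, 29)}  # kept separately, left untouched

now = datetime(2024, 1, 30)
start = start_for_domain_list(["example1.com", "example2.com"], domain_last_report, now)
```

Domains that are not in the list, such as example3.com here, keep their own date.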

liuch / dmarc-srg

Error report summary #142