msimerson / mail-dmarc

Mail::DMARC, a complete DMARC implementation in Perl

create Aggregate reports on hour-aligned or day-aligned slices #55

Closed tomkicamp closed 9 years ago

tomkicamp commented 9 years ago

The aggregate reports should be generated on a stable timeframe, somewhat irrespective of exactly when the generation job itself is run. The consequence of not doing this is that the reported window of time slides, depending on how long it takes any prior jobs to run. Here is a view of that occurring in reports I received for 2015-03-09: (I've anonymized the real provider into 'somebody.com')

 DMARC provider |  data start time    |   data end time     |  received by us at
----------------+---------------------+---------------------+---------------------
 somebody.com   | 2015-03-09 00:00:10 | 2015-03-10 00:00:10 | 2015-03-10 00:20:15
 somebody.com   | 2015-03-09 00:01:44 | 2015-03-10 00:01:44 | 2015-03-10 00:20:16
 somebody.com   | 2015-03-09 00:02:39 | 2015-03-10 00:02:39 | 2015-03-10 00:20:16
 somebody.com   | 2015-03-09 00:04:46 | 2015-03-10 00:04:46 | 2015-03-10 00:20:18
 somebody.com   | 2015-03-09 00:08:18 | 2015-03-10 00:08:18 | 2015-03-10 00:20:19
 somebody.com   | 2015-03-09 00:10:02 | 2015-03-10 00:10:02 | 2015-03-10 00:20:20
 somebody.com   | 2015-03-09 00:12:46 | 2015-03-10 00:12:46 | 2015-03-10 00:20:21
 somebody.com   | 2015-03-09 00:14:33 | 2015-03-10 00:14:33 | 2015-03-10 00:20:21
 somebody.com   | 2015-03-09 00:14:33 | 2015-03-10 00:14:33 | 2015-03-10 00:20:22
 somebody.com   | 2015-03-09 00:21:28 | 2015-03-10 00:21:28 | 2015-03-10 01:20:05
 somebody.com   | 2015-03-09 00:22:48 | 2015-03-10 00:22:48 | 2015-03-10 01:20:06
 somebody.com   | 2015-03-09 00:25:33 | 2015-03-10 00:25:33 | 2015-03-10 01:20:07
 somebody.com   | 2015-03-09 00:32:21 | 2015-03-10 00:32:21 | 2015-03-10 01:20:10
 somebody.com   | 2015-03-09 00:32:29 | 2015-03-10 00:32:29 | 2015-03-10 01:20:10
 somebody.com   | 2015-03-09 00:35:45 | 2015-03-10 00:35:45 | 2015-03-10 01:20:10
 somebody.com   | 2015-03-09 00:36:48 | 2015-03-10 00:36:48 | 2015-03-10 01:21:43
 somebody.com   | 2015-03-09 00:37:09 | 2015-03-10 00:37:09 | 2015-03-10 01:22:54
 somebody.com   | 2015-03-09 00:37:14 | 2015-03-10 00:37:14 | 2015-03-10 01:23:27
 somebody.com   | 2015-03-09 00:40:31 | 2015-03-10 00:40:31 | 2015-03-10 01:23:49
 somebody.com   | 2015-03-09 00:41:09 | 2015-03-10 00:41:09 | 2015-03-10 01:23:50
 somebody.com   | 2015-03-09 00:45:36 | 2015-03-10 00:45:36 | 2015-03-10 01:24:22
 somebody.com   | 2015-03-09 00:45:41 | 2015-03-10 00:45:41 | 2015-03-10 01:24:26
 somebody.com   | 2015-03-09 00:45:52 | 2015-03-10 00:45:52 | 2015-03-10 01:24:48
 somebody.com   | 2015-03-09 00:45:57 | 2015-03-10 00:45:57 | 2015-03-10 01:25:21

You can see that the period of the reports generated is sliding.

I think that the run timeframe should always be the entire prior single-hour timeframe if the run is for a 1-hour slice (e.g. 00:00 - 00:59:59, even if the generation is run at 00:24:00). Or, if the run timeframe is 1 day, it should always generate the entire prior UTC day, 00:00 - 23:59:59.
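A minimal Python sketch of the proposal above, not Mail::DMARC's actual code. It assumes "prior" means the most recent slice that has fully elapsed before the job runs, and it returns an exclusive end (subtract one second for an inclusive 23:59:59-style end). The function name `prior_aligned_window` is hypothetical.

```python
from datetime import datetime, timezone

HOUR = 3600
DAY = 86400

def prior_aligned_window(run_at, slice_seconds):
    """Return (begin, end) of the last complete UTC-aligned slice before run_at.

    run_at must be a timezone-aware datetime. end is exclusive.
    Assumption: only hourly and daily slices are handled in this sketch.
    """
    if slice_seconds not in (HOUR, DAY):
        raise ValueError("this sketch only handles hourly or daily slices")
    epoch = int(run_at.timestamp())
    # Unix time starts at 00:00 UTC, so a modulo finds the slice boundary.
    current_start = epoch - (epoch % slice_seconds)
    begin = datetime.fromtimestamp(current_start - slice_seconds, tz=timezone.utc)
    end = datetime.fromtimestamp(current_start, tz=timezone.utc)
    return begin, end

# A job that happens to start at 00:20:15 UTC still reports the whole prior day.
run_at = datetime(2015, 3, 10, 0, 20, 15, tzinfo=timezone.utc)
day_begin, day_end = prior_aligned_window(run_at, DAY)
hour_begin, hour_end = prior_aligned_window(run_at, HOUR)
```

Because the window depends only on the slice that contains the run time, two jobs started minutes apart produce identical begin and end values, which is what stops the window from sliding.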

marcbradshaw commented 9 years ago

@msimerson @tomkicamp

Making some assumptions here, which I believe are reasonable.

Planning to implement the optional minimum interval of 1 hour, so anything less than that will be increased. Do you see any issues with this?

Does it make sense to impose an upper limit too? Say 1 day maximum interval? Again, do you see any issues?

If the requested interval fits nicely into a day, then adjust the begin/end times to fit into a window aligned to the start of the UTC day.

If the requested interval does not fit nicely into a day, then do what we do now: for new reports, set the begin to now and the end to now+interval. I don't know if anyone is requesting crazy intervals in the wild, but if they were, then we don't want to be setting odd reporting times which may overlap.

Also noticing the report window is 1 second too long.
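The plan in this comment could be sketched roughly as follows, in Python rather than the project's Perl. This is an illustration under stated assumptions, not the change that was committed: `report_window` and its parameters are hypothetical names, "fits nicely into a day" is read as "divides 86400 evenly", and the end is treated as inclusive so the window is not one second too long.

```python
HOUR = 3600
DAY = 86400

def report_window(now_epoch, ri, min_interval=HOUR, max_interval=DAY):
    """Return (begin, end) epoch seconds for a new report; end is inclusive.

    min_interval / max_interval may be None to disable either limit
    (assumption: both limits are optional settings).
    """
    interval = ri
    if min_interval is not None:
        interval = max(interval, min_interval)   # raise too-small requests
    if max_interval is not None:
        interval = min(interval, max_interval)   # cap too-large requests
    if DAY % interval == 0:
        # Interval divides a day evenly: snap to a boundary counted from
        # 00:00 UTC (the Unix epoch itself starts at 00:00 UTC).
        begin = now_epoch - (now_epoch % interval)
    else:
        # Odd interval: keep the existing behaviour and start at "now".
        begin = now_epoch
    end = begin + interval - 1   # inclusive end, so the span is exactly `interval` seconds
    return begin, end

# 100 whole days after the epoch, plus 00:20:15.
now = 100 * DAY + 1215
daily = report_window(now, 86400)
tiny = report_window(now, 60)        # raised to one hour, then aligned
odd = report_window(now, 5000)       # 5000 does not divide 86400
```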

msimerson commented 9 years ago

There are two relevant paragraphs in the current Feb 2015 draft:

   ri:  Interval requested between aggregate reports (plain-text, 32-bit
      unsigned integer; OPTIONAL; default 86400).  Indicates a request
      to Receivers to generate aggregate reports separated by no more
      than the requested number of seconds.  DMARC implementations MUST
      be able to provide daily reports and SHOULD be able to provide
      hourly reports when requested.  However, anything other than a
      daily report is understood to be accommodated on a best-effort
      basis.

and

   Aggregate reports are most useful when they all cover a common time
   period.  By contrast, correlation of these reports from multiple
   generators when they cover incongruent time periods is difficult or
   impossible.  Report generators SHOULD, wherever possible, adhere to
   hour boundaries for the reporting period they are using.  For
   example, starting a per-day report at 00:00; starting per-hour
   reports at 00:00, 01:00, 02:00; et cetera.  Report Generators using a
   24-hour report period are strongly encouraged to begin that period at
   00:00 UTC, regardless of local timezone or time of report production,
   in order to facilitate correlation.
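For reference, reading the `ri` tag defined in the first quoted paragraph might look like the Python sketch below. The helper `parse_ri` is hypothetical and is not part of Mail::DMARC. Falling back to the default when the value is malformed is an assumption of this sketch, not something the quoted text specifies.

```python
def parse_ri(record, default=86400):
    """Extract the ri (report interval, seconds) tag from a DMARC record string."""
    for part in record.split(";"):
        name, sep, value = part.partition("=")
        if sep and name.strip() == "ri":
            value = value.strip()
            # The draft describes ri as a plain-text 32-bit unsigned integer.
            if value.isascii() and value.isdigit() and int(value) < 2**32:
                return int(value)
            return default   # assumption: treat a malformed value as absent
    return default           # tag absent: the draft's default of 86400

hourly = parse_ri("v=DMARC1; p=none; rua=mailto:dmarc@example.com; ri=3600")
missing = parse_ri("v=DMARC1; p=none")
```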

> optional minimum interval of 1 hour, so anything less than that will be increased.

So long as the minimum is optional, and off by default, then it's fine by me.

One thing I found early on, back when I was first setting up DMARC and poring over the reports to make sure everything was working the way I wanted, was that I liked having a really low interval like ri=60, so that over the course of a day, I'd get enough reports back that I could adjust and continue quickly. Hourly reports would be quite sufficient for larger sites, but for little guys with low traffic, the wait can be a long one to get enough reports to be useful. Especially b/c some larger providers don't honor report intervals less than 1 day. It costs very little to honor lower intervals.

> Does it make sense to impose an upper limit too?

Longer intervals could cost a little DB storage, but if someone wants reports once a week or month, I'd have no qualms with honoring them. But there are plenty of reasons someone else might feel differently and want to impose a max of 1 day / week / month. Having the option is also fine.

I guess my overall philosophy would be, "give the report requesters what they asked for," and if it's roughly an hour, align it on hour boundaries, and if it's roughly a day, align it on UTC boundaries.
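One way to read that philosophy, as a hedged Python sketch: honor the requested interval as given, and only snap the window start when the interval is close to an hour or a day. The 10% tolerance used for "roughly" is an invented assumption, as are the names `alignment_for` and `window_start`.

```python
HOUR = 3600
DAY = 86400

def alignment_for(ri, tolerance=0.1):
    """Return the slice length to align on (HOUR or DAY), or None for no alignment."""
    for unit in (HOUR, DAY):
        if abs(ri - unit) <= unit * tolerance:
            return unit
    return None

def window_start(now_epoch, ri, tolerance=0.1):
    """Snap the start to an hour or UTC-day boundary only for near-hour/near-day requests."""
    unit = alignment_for(ri, tolerance)
    if unit is None:
        return now_epoch                     # e.g. ri=60: give them what they asked for
    return now_epoch - (now_epoch % unit)    # boundaries are counted from 00:00 UTC

now = 100 * DAY + 1215                       # some day at 00:20:15 UTC
day_start = window_start(now, 86400)
minute_start = window_start(now, 60)
```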

marcbradshaw commented 9 years ago

@tomkicamp there will very likely be an overlap in reporting times when this rolls out. Will that cause any issues?

tomkicamp commented 9 years ago

@marcbradshaw no issues here regarding possible overlaps, thanks!
@msimerson from a report aggregator perspective I don't foresee any problems with small-interval reports, but my opinion is that 1-day should be the maximum, for reasonable reporting/display purposes.