trusteddomainproject / OpenDMARC

This is the Trusted Domain Project's implementation of the DMARC protocol library and mail filter, called OpenDMARC. A "milter" connects to Unix-based mailers (originally sendmail, but now many others) and provides a standard filtering API.

aggregate reports - why the complicated pipeline? #250

Open Keeper-of-the-Keys opened 7 months ago

Keeper-of-the-Keys commented 7 months ago

Hey, I hope I am not reopening something that has been discussed ad nauseam already, but I didn't see any discussion in the bug tracker here.

What is the reason that the pipeline for generating aggregate reports is so long? By long I mean:

  1. OpenDMARC writes a HistoryFile
  2. opendmarc-importstats imports said history file into a db
  3. opendmarc-reports generates a report based on the db and sends it

Superficially, it would seem that OpenDMARC could write directly to the DB instead of a file. I assume people smarter than me have thought about this a lot and concluded that the above pipeline is better, and I would like to understand their reasons.

The reasons I could think of are that writing to a file is "easier" and cheaper in compute, and less prone to lockup or failure, than writing to a db, and that importstats may be very intensive on larger setups, so you may not want to run it on the same machine.
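
To make that guess concrete, here is a minimal sketch of the two cost profiles. It is not OpenDMARC's code: the JSON-lines record format, the `messages` table, and the function names `append_history` and `import_history` are all made up for illustration, and SQLite stands in for whatever database the real import step targets.

```python
import json
import sqlite3


def append_history(path, record):
    """Per-message step: append one line to a local file.

    No database connection is involved, so a slow or unreachable DB
    cannot stall mail processing. (Hypothetical record format.)
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def import_history(path, conn):
    """Batch step: load every line of the file into the DB in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages (from_domain TEXT, disposition TEXT)"
    )
    with open(path, encoding="utf-8") as f:
        rows = [(r["from_domain"], r["disposition"]) for r in map(json.loads, f)]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO messages VALUES (?, ?)", rows)
    return len(rows)
```

The per-message cost is a single file append, while the database only sees one bulk transaction per import run, which can be scheduled whenever and wherever it is convenient.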

dgeo commented 4 months ago

Here we have 4 different machines, and one that imports all the files. Using a single DB would add a SPOF, and using a DB cluster would add complexity (to an already-not-that-simple setup). Adding I/O and locks per mail also seems like a bad idea (there are sometimes many mails per second).
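
Roughly, the layout looks like the sketch below. This is only an illustration under assumptions, not our actual scripts: the function name `import_host_files`, the `history` table, and the `.importing` suffix are invented, SQLite stands in for the real database, and the history files are assumed to have already been copied to the import machine.

```python
import os
import sqlite3


def import_host_files(paths_by_host, conn):
    """Merge one history file per mail host into a single DB.

    Only the import machine talks to the database; the mail hosts
    just write local files. (Hypothetical schema: one raw line per row.)
    """
    conn.execute("CREATE TABLE IF NOT EXISTS history (reporter TEXT, line TEXT)")
    total = 0
    for host, path in paths_by_host.items():
        work = path + ".importing"
        # Rename first (atomic on the same filesystem), so a writer that
        # reopens the file for each append starts a fresh one meanwhile.
        os.replace(path, work)
        with open(work, encoding="utf-8") as f:
            rows = [(host, line.rstrip("\n")) for line in f if line.strip()]
        with conn:  # one transaction per host file
            conn.executemany("INSERT INTO history VALUES (?, ?)", rows)
        os.remove(work)
        total += len(rows)
    return total
```

If the database is down, the files simply pile up on disk until the next import run, and mail filtering is unaffected.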