liuch / dmarc-srg

A PHP parser, viewer and summary report generator for incoming DMARC reports.
GNU General Public License v3.0

Feature Request -- Support for retrieving DMARC emails from AWS S3 bucket #107

Open andrewhenke opened 8 months ago

andrewhenke commented 8 months ago

Hello!

I use Amazon Web Services for the majority of my clients, and would love to be able to configure my DMARC reporting email address, and those of my clients, so that all inbound DMARC reports are automatically stored in an S3 bucket, instead of having to set up a shared inbox through my Microsoft account, etc.

Would it be possible to use something such as the open-source project Flysystem to support both local directories and remote/cloud storage locations for retrieving stored email messages and attachments?

I'm happy to clarify further, and answer any questions you may have. Thank you!

liuch commented 8 months ago

Hello! You can use any utility to extract the files into a directory on your server and then run php utils/fetch_reports source=directory. Is this method not suitable for you?

andrewhenke commented 8 months ago

The reason this is not suitable is that retrieving the contents of a remote cloud storage location just to sync them onto the local machine requires custom code to be written and set up as a cron job on the server itself. It would then also require a separate web interface so that users of the system can trigger the fetch and download of the files, instead of doing so natively within the system. Further, by using Flysystem, you would still support the native 'local directory' retrieval method, while also adding support for remote cloud storage locations that work the same way as a 'local' directory.

Does that make sense, and can I clarify anything further?

Thank you!

liuch commented 8 months ago

I hear you. I'll check out that project in the next few days. I'm not sure I have anything to test it on, though.

andrewhenke commented 8 months ago

Thank you for looking into it -- if you would like to work together on this, I would be happy to privately provide a remote storage location (via S3) for you, to use for testing, etc. Just let me know!

williamdes commented 8 months ago

https://min.io/

It's awesome for getting a working local S3 storage up in minutes.

andrewhenke commented 8 months ago

Wouldn't this solution require more than simply entering S3 IAM access credentials? Or am I not seeing what you are referencing?

williamdes commented 8 months ago

This solution is a drop-in replacement for AWS S3.

I use it on a client project to emulate our production S3 bucket in a free and portable way.

So you can use it with S3 credentials just like AWS S3, and everything will work the same.

andrewhenke commented 8 months ago

Ahhh

For my use case, I wouldn't find it useful, because the DMARC reports are sent to an SES email address, where the inbound emails are automatically processed and stored in S3.

I'll keep it in mind for future reference, however.

liuch commented 8 months ago

I believe williamdes meant using this tool for testing purposes.

andrewhenke commented 8 months ago

Ahh, my bad, didn't realize that -- that makes a lot more sense

liuch commented 7 months ago

@andrewhenke I've implemented these options for S3: key, secret, token, bucket, path, profile, endpoint, region. Have I forgotten anything? Which options (names) do you use?

andrewhenke commented 7 months ago

I should be able to tell you most accurately once I take a look at the implementation, but as of now, it looks like that should be everything that is needed! I'm extremely excited to be able to utilize this functionality.

On that note, is there any way to trigger the fetch/import of the emails from the storage location from within the web interface? That is the single biggest 'struggle' my team has with using the system at an enterprise level, because ingesting new emails requires a technical team member to access the server, instead of non-technical team members being able to trigger the ingestion via the web interface.

Thanks!

liuch commented 6 months ago

@andrewhenke I have just added an implementation for this. Could you please test this commit on your system? See config/conf.sample.php for details.

Note: By default, successfully processed report files are deleted from the file system. Make sure that you use copies.
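Because successfully processed files are deleted by default, one way to keep the originals is to copy the report files into a separate staging directory and import from there. Below is a minimal Python sketch of that staging step; it is not part of dmarc-srg, the helper name `stage_copies` is hypothetical, and it assumes the reports are plain files with the gz, zip or xml extensions mentioned in this thread.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumption: report files carry one of the extensions mentioned in this thread.
REPORT_SUFFIXES = {".gz", ".zip", ".xml"}


def stage_copies(src_dir: str, staging_dir: str) -> list[Path]:
    """Copy report files from src_dir into staging_dir (hypothetical helper).

    The originals in src_dir are never modified, so an importer that deletes
    processed files only removes the staged copies.
    """
    src = Path(src_dir)
    dst = Path(staging_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        # Path.suffix is the last extension only, so "x.xml.gz" yields ".gz".
        if path.is_file() and path.suffix.lower() in REPORT_SUFFIXES:
            target = dst / path.name
            shutil.copy2(path, target)  # copy2 also preserves file timestamps
            copied.append(target)
    return copied
```

After staging, the importer would be pointed at the staging directory rather than at the originals.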

andrewhenke commented 6 months ago

Certainly! I will do so in the morning -- I'm very excited to give it a try!

andrewhenke commented 6 months ago

@liuch I wanted to double-check with you -- does the code that you released automatically extract the file attachment from the full email, or do I need to write and use an AWS Lambda function that separates the actual DMARC report attachment from the email? I would prefer not to use Lambda if this is something that the codebase will support, or already supports. Please let me know if you have any additional questions or need me to clarify further.

liuch commented 6 months ago

I guess I didn't read your first post carefully. I thought the bucket contained report files only (gz, zip, xml), i.e. attachments, not mail messages. My code doesn't work with messages saved as a file.
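For reference, the three report file formats mentioned here differ in packaging: the XML report is stored as is, gzip-compressed, or placed inside a zip archive. A minimal Python sketch (not dmarc-srg's own PHP code) that unwraps any of them into the XML bytes; the function name `report_xml` is hypothetical, and it assumes a zip archive holds the report as its first .xml member.

```python
import gzip
import io
import zipfile


def report_xml(filename: str, data: bytes) -> bytes:
    """Return the XML document inside a DMARC report file (hypothetical helper).

    Handles plain .xml, gzip-compressed .gz and .zip archives.
    """
    name = filename.lower()
    if name.endswith(".gz"):
        return gzip.decompress(data)
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the archive contains one XML report; take the first .xml member.
            members = [n for n in archive.namelist() if n.lower().endswith(".xml")]
            if not members:
                raise ValueError(f"no .xml member in {filename}")
            return archive.read(members[0])
    if name.endswith(".xml"):
        return data
    raise ValueError(f"unsupported report file: {filename}")
```

Dispatching on the filename extension keeps the sketch short; a more defensive version could sniff the gzip (`1f 8b`) and zip (`PK`) magic bytes instead.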

Could you tell me what is the format of the messages saved to the bucket? Is it *.eml or something else? Maybe I can add processing for such files.

andrewhenke commented 6 months ago

No worries @liuch! By default, AWS stores the complete, raw email in MIME format, as described in the AWS documentation and in RFC 2045. There are numerous PHP libraries for processing MIME out there, such as php-mime-mail-parser, which I found quickly with a few searches.
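To illustrate what handling such files would involve: a raw MIME message stored by SES carries the report as an encoded attachment, which has to be located and decoded before import. A minimal sketch using Python's standard `email` package (the project itself is PHP, where a library such as php-mime-mail-parser would fill this role); the function name `extract_report_attachments` is hypothetical, and selecting parts by a .gz/.zip/.xml filename is an assumption, since some senders may name their attachments differently.

```python
from __future__ import annotations

import email
from email import policy

# Assumption: report attachments are recognised by these filename extensions.
REPORT_SUFFIXES = (".gz", ".zip", ".xml")


def extract_report_attachments(raw_message: bytes) -> list[tuple[str, bytes]]:
    """Parse a raw MIME message and return (filename, payload) pairs
    for parts that look like DMARC report files (hypothetical helper)."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    reports = []
    for part in msg.walk():  # walk() also visits the top-level message itself
        if part.is_multipart():
            continue  # containers such as multipart/mixed hold no payload of their own
        filename = part.get_filename()
        if filename and filename.lower().endswith(REPORT_SUFFIXES):
            # decode=True reverses the Content-Transfer-Encoding (typically base64).
            reports.append((filename, part.get_payload(decode=True)))
    return reports
```

The bytes passed in would be the content of the S3 object that SES wrote; fetching that object is not shown here. The returned payloads are still the compressed report files, so they would need the same gz/zip/xml handling as report files placed in a directory directly.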

Does this help?

liuch commented 6 months ago

Thank you for the information. I won't promise to add such an implementation anytime soon. But I will definitely consider this possibility.