reactioncommerce / reaction-feature-requests

Reaction Feature Requests
13 stars 1 forks source link

Re-visit how emails are stored in the Emails collection after successful send #26

Closed aaronjudd closed 6 years ago

aaronjudd commented 6 years ago

@kieckhafer commented on Mon Jan 23 2017

Need to purge or reduce HTML footprint in Emails collection after an email has successfully sent.

After 24 hours? After X days?


@jshimko commented on Mon Jan 23 2017

The email collection was originally created (by me) because the job documents were being cleaned up before you could even get a chance to view them in the UI (which made the email logs UI pointless). So I inserted them into the Emails collection for longer term storage. It looks like I didn't restore the cleanup job for the sendEmail jobs though, so it's currently keeping them in both collections. We obviously don't need that.

So, if we're going to delete email logs after some period of time, I think it probably should be a setting in the UI and maybe even optional altogether (some people may actually want to keep them). Otherwise, people won't know why their email logs are disappearing from the log table in the UI. The publication for the email logs uses a limit param, so it's definitely performant. It only publishes the amount of documents specified in that limit (default: 10). So it's really just the storage space in the db we'd have to consider when saving email logs.

Considering emails can potentially fail to send and that usually requires some human intervention, I think we probably need to be careful about how fast we delete those records. Even a week feels a little quick when you consider that an admin may only check the logs every once in a while. Maybe we need a notification API for failed jobs that sends emails to admins?

This could obviously use a little discussion/brainstorming, so lets talk about it in our technical meeting tomorrow (and here too for those that aren't in the internal meeting).


@jshimko commented on Mon Jan 23 2017

Also FYI, removing the following param from the Job query will allow the sendEmail jobs to be included in the clean up every 3 days with everything else (they'd still exist in the Emails collection)

https://github.com/reactioncommerce/reaction/blob/master/imports/plugins/included/jobcontrol/server/jobs/cleanup.js#L36

Also, note that that cleanup job is a little aggressive with what it deletes. That will permanently delete any job that's 3 days old or older with a status of "cancelled", "completed", or "failed". One potential reason that could be a problem is a job with a status of failed could still be retried (just like in the email logs UI). For example, after an admin fixes whatever made the job fail in the first place, they may want to trigger a retry without having to come up with all of the original job data again. Deleting that job automatically means you've removed that option and they won't be able to retry again without submitting a new job. Sure, most jobs have run through all of their retries or been assessed by day 3, but we still need to be aware of stuff like that when we're potentially cleaning up jobs that other users could have created in their own custom plugins. The current state of things doesn't allow them to prevent that cleanup from happening to their custom jobs. We're currently enforcing it on all jobs.

So maybe we need to somehow tag internal Reaction jobs and only do a cleanup of jobs jobs we create in core code. That'd give users the freedom to do whatever they want with the jobs API within their own plugin and not worry about the app deleting their stuff unexpectedly.


@jshimko commented on Mon Jan 23 2017

Oh, oops. Just remembering...

The Emails collection isn't actually used anywhere. It's strictly for longer storage and some future use cases that haven't been implemented yet (UI, logs, etc.). Nothing in the app accesses it right now though. The email logs UI gets its data right from the jobs collection. The reason for that being the job needs to actually exist to be able to trigger a retry (which you can do from that table).

So yet another variable we need to discuss when it comes to deleting emails.


@spencern commented on Wed Jan 25 2017

That last message about the emails collection seems correct from what we're experiencing.

I don't think there's a serious problem with storing emails long term for the ease of resending order confirmations, etc.

The biggest issue I see with the Emails collection currently is that the entire html content of the email is stored (both in Emails and in Jobs) rather than just storing the variable content and a link to the email template associated. This has resulted in very large collections both in Email and Jobs where the data stored are primarily duplicate template information.

censored-email-logs


@jshimko commented on Wed Jan 25 2017

The copies in the Job collection can be deleted on a cleanup schedule, so that part is an easy fix. I guess the only trade-off that I can think of with not saving html is if the template file changes at some point in the future, variables from before that change point may not be able to generate an email that represents what was actually sent to the user originally. Perhaps that's not actually an issue, but maybe it's worth a brief discussion? Maybe we version the templates somehow?

I don't know. Just trying to think of a way to have a reliable copy of what was actually sent. I may be overthinking it though. The to/from/subject/variables may be good enough data for the majority of use cases.


@jshimko commented on Wed Jan 25 2017

I do agree that it's a rather insane amount of data replication. 99% of an email template is usually static, so that's a ton of duplicate data in the database.


@kieckhafer commented on Wed Jan 25 2017

All non-static data in emails is passed into it via an object, dataForEmail. You could potentially just store this object for each email instead of the full HTML.


@jshimko commented on Wed Jan 25 2017

Yeah, absolutely. That just means that if the HTML template changes in the future, you no longer have a record of the exact email body that was sent to the user.

However, I totally admit that that may be an imagined problem. Perhaps nobody actually cares about that and I should just shut my mouth. :)

@spencern, as a production user, any thoughts on that?


@jshimko commented on Wed Jan 25 2017

Another thing to consider...

Email services like Mailgun store all of that detail, so people that care about saving that stuff still have options if we choose to not store the email body.


@zenweasel commented on Wed Jan 25 2017

In my experience doing Customer Service, it is very handy to say to a customer that "this is the exact email we sent you on this day" especially when said email may contain disclaimers or particular info about their order (special return policy, etc.). Because customers are sometimes jerks and everybody thinks they're a lawyer™.

I think with GMail, etc. people do have an expectation that emails are retrievable forever.


@jshimko commented on Wed Jan 25 2017

Ok, proposed solution that solves all points...

1) The default template HTML files get imported into the database on first startup.
2) All custom templates already get saved in the database.
3) Version the templates in the database whenever they change.

That'd make sure there's a reference to every email body before variables are injected without having to store redundant copies of the rendered email for every email sent.

A little bit of work, but not terrible. Right?


@spencern commented on Wed Jan 25 2017

I do think there's some value in being able to resend emails, so I definitely hear your concern @jshimko. I feel like there may be better options to store email templates such as versioning templates like you suggested though that may get overcomplicated as well.

I'd probably consider the following an edge case, which seems to be the main thing we're trying to solve for by storing email template html.

  1. Send an email to a user.
  2. Days/Weeks/Months go by
  3. Some breaking change is made to corresponding email template (variables/data don't match 1-to-1 with previous template)
  4. Need arises to send or reference email again from the application server.

I agree that having logs of when an email was sent, if it was opened, when it was opened, if/how many links were clicked, etc is very valuable when communicating with the customer, but many SMTP / email providers offer that (postmark does, I think mailgun does as well).

@zenweasel I agree that that information is useful when communicating with customers, and in this era, there is some expectation that all data is stored forever.. That being said, I still think there are probably ways to make it possible to resend an email, (any email, even if the template no longer matches up) without storing the template for every email that is sent.

Potentially that means that if an email gets resent after a breaking change is made to the template that some information is missing or there is extra information (more likely IMHO) that wasn't available before or there are blanks or placeholders, but I don't think that's necessarily a catastrophic scenario.

Worst case scenario, a Cust Service Rep should be able to reference the customers order in the Order database and take a screenshot or write up a one time email to the concerned customer.


@spencern commented on Mon Feb 19 2018

As I mentioned with a comment in #1744

I'm not certain that this is a performance issue at this point, and I'm not sure what the best solution is. I think at the core of this issue is how we're storing emails (that we're storing raw emails at all?). A better issue might be to change the way we permit storing and resending of emails to use the existing template and data that's available (order/user/shop etc.) at the time that the email is resent.

I'd propose that we close this issue and create a new, more specific issue related to changing the way emails are stored and sent if that's desired.