thoughtbot / griddler-cloudmailin

MIT License
2 stars 7 forks source link

Correctly handle Bcc recipients and attachments #4

Closed dominicsayers closed 8 years ago

dominicsayers commented 9 years ago

This pull request is designed to achieve three things:

  1. To make the CloudMailin adapter make better use of the content that CloudMailin sends today.
  2. To handle the case where CloudMailin is processing an email that has been BCCed to it
  3. To turn the attachments hash into an array

So as not to break Griddler itself, I've chosen to put the BCCed address in the bcc field of the normalized params.

I've minimized the changes I've made as far as possible. Here's a walkthrough of what I've had to change in adapter.rb :

  1. I've built the to array from [:headers][:To] instead of [:envelope][:to] because the envelope contains only the bare address and not the recipient's name. Also, Cloudmailin uses [:envelope][:to] for the BCCed recipient if any.
  2. For similar reasons, I've used [:headers][:From] instead of [:envelope][:from] because the envelope contains only the email address and not the name of the sender.
  3. The bcc field is only added to the normalized params if the CloudMailin recipient was BCCed.
  4. To establish whether the [:envelope][:to] field is a Bcc: or a To: recipient, we have to compare the email address with all the email addresses in [:headers][:To] and [:headers][:Cc]. Unfortunately this means calling Griddler::EmailParser for each of them, an expensive but unavoidable step.
  5. Some email clients generate a Bcc: header, some don't. So we can't rely on the Bcc: header
  6. The attachments arrive as a hash keyed on '0', '1' etc. Griddler is expecting an array of attachments

In the tests, I've modified the service params to reflect more accurately what CloudMailin actually sends. I've also added tests to show my expectations of how we handle the BCC case. Finally I've added equivalents of all the tests from the Sendgrid adapter gem.

  1. The real-world [:envelope][:to] and [:envelope][:from] fields are always just a bare email address.
  2. I've added [:headers][:To] and [:headers][:From] to the service params because these are the fields we now use.
  3. In the tests where the recipient is BCCed, you can see that the [:headers][:To] field does not contain the [:envelope][:to] address
dominicsayers commented 9 years ago

As far as possible I copied the coding style of the existing gem. If I fix all the issues that Hound has raised I will add more churn to the code and make it inconsistent with the existing code. Let me know if you want me to fix the Hound issues.

dominicsayers commented 9 years ago

Bump @calebthompson @gabebw

dominicsayers commented 8 years ago

Bump again @calebthompson @gabebw

scsmith commented 8 years ago

@dominicsayers I'll run through this as soon as I can and get things merged. I just want to make sure that everything is constant with the other adapters etc.

dominicsayers commented 8 years ago

@scsmith Glad you've taken over the project. I think creating a .hound.yml file should be a priority :-)

Happy to help out with anything.

dominicsayers commented 8 years ago

@scsmith Can I do anything to help?

dominicsayers commented 8 years ago

Question from Steve Smith:

The change you’re proposing is a little bit backwards incompatible. Using the header value makes sense, however it won’t always match up with the To address.

The SMTP transaction will contain what was actually sent to the SMTP server. The header will possibly contain multiple users and include those that aren’t the intended recipient on the server.

I’m a little bit unsure if we should be forcing users down one route because of the desire to fetch the BCC address.

And my reply:

I think the reason for the change was that envelope[:to] only has the mailbox of Cloudmailin’s copy of the email, not the full address of all the intended recipients. The existing spec was misleading as the default_params defined here https://github.com/thoughtbot/griddler-cloudmailin/blob/master/spec/griddler/cloudmailin/adapter_spec.rb#L39 don’t reflect real-world data.

In my version the default_params reflect what I actually saw in the wild: https://github.com/Xenapto/griddler-cloudmailin/blob/master/spec/griddler/cloudmailin/adapter_spec.rb#L109

The envelope cannot contain the recipient’s name as the original data has it. To do so would contravene RFC 5321, where RCPT TO: takes a Forward-path in the form

“<” Mailbox “>” 

and Mailbox is simply

Local-part "@" ( Domain / address-literal )

as defined here: https://tools.ietf.org/html/rfc5321#section-4.1.2

In fact in all the inspection I did of data from Cloudmailin, I only ever saw the single recipient of that message in envelope[:to] (this might depend on the SMTP provider I was using – Gmail). So I think to achieve the expectations in the specs imported from the Sendgrid adapter we have to use the header fields instead (multiple recipients, full RFC 5322 address format). And it follows that the only way of identifying a BCC recipient is by comparing the envelope[:to] field with the ostensible recipients in the headers.

There may be other users who are seeing multiple values for envelope[:to] and non-RFC-compliant formats for the Mailbox. But if we leave it as it is then it remains more broken than if we make this change – people using SMTP providers like mine won’t see all the recipients of the email you handled for them.

@scsmith: Can you do a broad analysis from your large pool of data of what different providers do with these fields? I imagine there’s some disparity and the Robustness Principle is allowing everything to muddle along just fine despite the differences.

If we’re going to change the behaviour then upping the major version number might warn people that it has changed. And I can certainly write some notes for the README to explain what’s happening.

dominicsayers commented 8 years ago

Bump @scsmith @remo-sk

dominicsayers commented 8 years ago

I've added headers and html to the normalized params

dominicsayers commented 8 years ago

I'm going to close this, rebase and open a new PR