horgh opened this issue 6 years ago
This is a good question. It seems like we should be converting the body to canonical form before encoding.
The real question for me is what to do when given an email in encoded form and someone asks for the content (post-decoding). Do we convert to native line endings? What if a base64-encoded body doesn't use CRLF line endings, which I'm sure will happen often? Do we still convert?
Ugh, I hate email.
Interesting point about the decoding side of things. We should probably convert to native line endings.
Perhaps we could fix the line endings in cases where they are wrong after decoding. I looked for some examples of emails in the wild and found that line endings are handled very haphazardly, so you're definitely right that it will happen. One email from a large company had both LF and CRLF line endings in a single base64-encoded body part!
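A minimal sketch of the "fix them after decoding" idea, assuming we only want to turn CRLF into Perl's logical newline `"\n"` (which a text-mode filehandle then writes as the platform's native line ending). `decode_to_native_newlines` is a hypothetical helper, not part of Courriel, and it should only be applied to text parts, never to binary data.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper (not part of Courriel): decode a base64 body part and
# turn every CRLF into "\n", so a part that mixes CRLF and bare LF comes out
# consistent. Bare LFs are kept; lone CRs are left untouched.
sub decode_to_native_newlines {
    my ($encoded) = @_;
    my $decoded = decode_base64($encoded);
    $decoded =~ s/\r\n/\n/g;
    return $decoded;
}

# Example: a single body part that mixes CRLF and LF line endings.
my $mixed = encode_base64("one\r\ntwo\nthree\r\n");
my $fixed = decode_to_native_newlines($mixed);
```

Being conservative (only CRLF is rewritten) avoids guessing about lone CRs, which would need a separate decision.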
Restricting such changes to specific MIME types might make this less scary: `text/plain`, `text/html`, and `message/rfc822`, perhaps. Actually, it seems RFC 2046 (section 4.1.1) specifies CRLF line breaks for the canonical form of every `text/*` subtype.
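The MIME-type restriction could look something like this sketch. `should_canonicalize` is a hypothetical predicate, not part of Courriel; the list of types is just the one floated here (`text/*` plus `message/rfc822`).

```perl
use strict;
use warnings;

# Hypothetical predicate (not part of Courriel): decide whether a part's line
# endings may be rewritten, based on its MIME type. text/* is covered by the
# CRLF rule in RFC 2046; message/rfc822 is included as suggested above.
sub should_canonicalize {
    my ($mime_type) = @_;

    # MIME types are case-insensitive; drop parameters such as "; charset=...".
    my ($type) = split /;/, lc($mime_type // ''), 2;
    $type //= '';
    $type =~ s/\s+//g;

    return 1 if $type =~ m{\Atext/};
    return 1 if $type eq 'message/rfc822';
    return 0;
}
```

Anything not on the list (images, `application/*`, and so on) is left byte-for-byte alone.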
I also wonder how any of this impacts DKIM signing of the body. In particular, for a simple `text/plain` body, munging the line endings when encoding can invalidate the DKIM signature. Munging on decoding would cause the same issue when validating a DKIM signature.
The more I think about this, the more I think it probably needs to be an option (like `canonicalize => 1`). I think it could default to true, though.
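A sketch of how such an option might behave, assuming it defaults to true. `prepare_text_body` and its `canonicalize` argument are hypothetical and do not describe Courriel's actual API.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Courriel's API: prepare a text body for encoding,
# honouring a `canonicalize` option that defaults to true.
sub prepare_text_body {
    my ($content, %opts) = @_;

    # Canonicalize unless the caller explicitly passes a false value.
    my $canonicalize = $opts{canonicalize} // 1;
    return $content unless $canonicalize;

    # Canonical form: bare LF becomes CRLF; existing CRLF is kept as-is.
    (my $canonical = $content) =~ s/\r?\n/\r\n/g;
    return $canonical;
}

my $default = prepare_text_body("hi\nthere");                     # "hi\r\nthere"
my $raw     = prepare_text_body("hi\nthere", canonicalize => 0);  # unchanged
```

Defaulting to true matches the suggestion above, while `canonicalize => 0` leaves an escape hatch for cases such as DKIM, where the exact bytes matter.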
I really hate email.
The DKIM Signatures RFC (RFC 6376) looks like it says the canonicalized version should be signed/verified, or at least that the raw message as seen by the MTA must be used. This makes sense since, from what I gather, DKIM is often done at the MTA level rather than at the MUA. (Somewhat related post I found.)
I agree that munging would not be good to do when verifying DKIM signatures. It would be safer to verify against the raw message and do any transformations afterwards.
An option sounds good though.
RFC 2049 section 4 step 2 says we should transform bodies to CRLF line endings ("canonical form"). I've found that emails built using `Courriel::Builder`'s `build_email()` with `plain_body()` do not have such line endings when passed a body with LF line endings. The base64-encoded output does, but the raw body that gets base64-encoded does not.
For example, if we give Courriel `plain_body("hi\nthere")`, we get something like `encode_base64("hi\nthere", "\r\n")`, where I think we should have `encode_base64("hi\r\nthere", "\r\n")`.
Presumably this is usually okay. However, I recently had an issue where a more esoteric MUA did not display a plaintext email built this way correctly, and this is one possible, though unlikely, culprit. Unfortunately, I haven't been able to verify whether this is the cause.
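To make the difference concrete, here is a small standalone sketch using `MIME::Base64` directly rather than Courriel. `to_canonical_crlf` is a hypothetical helper; it only handles LF and CRLF input, which is the case described here.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: convert a text body to canonical form
# (bare LF becomes CRLF, existing CRLF is left as-is).
sub to_canonical_crlf {
    my ($text) = @_;
    (my $canonical = $text) =~ s/\r?\n/\r\n/g;
    return $canonical;
}

my $body = "hi\nthere";

# Roughly what is described above: the LF body is encoded unchanged.
# The "\r\n" argument only sets the line ending used to wrap the base64 output.
my $current_b64 = encode_base64($body, "\r\n");

# Suggested behaviour: canonicalize first, then encode.
my $canonical_b64 = encode_base64(to_canonical_crlf($body), "\r\n");

# Show the decoded bytes with CR and LF made visible.
for my $pair ([current => $current_b64], [canonical => $canonical_b64]) {
    my ($label, $b64) = @$pair;
    my $shown = decode_base64($b64);
    $shown =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$label: $shown\n";
}
```

Running it prints `current: hi\nthere` and `canonical: hi\r\nthere`. A caller who wants canonical bodies today could presumably apply the same conversion to the text before passing it to `plain_body()`.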
I'm not entirely sure this should be Courriel's responsibility, though I think it might make sense. If not, it could at least be documented so that callers know to do it themselves.
What do you think?
Thank you!