houseabsolute / Courriel

High level email parsing and manipulation
https://metacpan.org/release/Courriel/

Lack of CRLF line endings in text/plain prior to base64 encoding #4

Open horgh opened 6 years ago

horgh commented 6 years ago

RFC 2049 section 4 step 2 says we should transform bodies to CRLF line endings ("canonical form"). I've found that emails built using Courriel::Builder's build_email() with plain_body() do not have such line endings when passed a body with LF line endings. The base64 output is wrapped with CRLF, but the raw body that gets base64 encoded is not converted.

For example, if we give Courriel plain_body("hi\nthere") we get something like encode_base64("hi\nthere", "\r\n") where I think we should have encode_base64("hi\r\nthere", "\r\n").
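To make the difference concrete, here is a minimal sketch in Python rather than Perl. It is not Courriel's code: `canonicalize_eol` and `encode_body` are hypothetical helpers, and the UTF-8 charset is an assumption. It only illustrates converting line endings to CRLF before base64 encoding versus encoding the body as given.

```python
import base64
import re


def canonicalize_eol(text: str) -> str:
    """Hypothetical helper: turn lone LF or lone CR into CRLF.

    Existing CRLF pairs are matched first, so they are left unchanged.
    """
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def encode_body(text: str, canonicalize: bool = True) -> str:
    """Hypothetical helper: base64-encode a text body for transport."""
    raw = canonicalize_eol(text) if canonicalize else text
    data = raw.encode("utf-8")  # assumed charset
    # encodebytes wraps the output with "\n"; use CRLF between encoded lines.
    return base64.encodebytes(data).decode("ascii").replace("\n", "\r\n")


# What the issue describes today: only the base64 lines end in CRLF.
current = encode_body("hi\nthere", canonicalize=False)

# What the issue proposes: the body itself is in canonical form as well.
proposed = encode_body("hi\nthere", canonicalize=True)
```

Decoding `current` gives back the LF-only body, while decoding `proposed` gives the CRLF form that RFC 2049 describes.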

Presumably this is usually okay. However, I've had a recent issue where a more esoteric MUA was not displaying a plaintext email built this way correctly, and this is one possible, though unlikely, culprit. Unfortunately I've not been able to verify whether this is the cause.

I'm not entirely sure this should be Courriel's responsibility, though I think it might make sense. If not, it could be something to document at least so that callers know to do it.

What do you think?

Thank you!

autarch commented 6 years ago

This is a good question. It seems like we should be converting the body to canonical form before encoding.

The real question for me is what to do when given an email in encoded form and someone asks for the content (post-decoding). Do we convert to native line endings? What if the base64-encoded body doesn't use CRLF endings, which I'm sure will happen often? Do we still convert?

Ugh, I hate email.

horgh commented 6 years ago

Interesting point about the decoding side of things. We should probably convert to native line endings.

Perhaps we could fix the line endings in cases where they are wrong after decoding. I looked for some examples of emails in the wild and found line endings are handled very haphazardly, so you're definitely right it will happen. One email from a large company had both LF and CRLF line endings in a single base64 encoded body part!

Restricting such changes to specific MIME types might make this less scary: text/plain, text/html, and message/rfc822 perhaps. Actually it seems the line endings for text/* are specified by RFC.
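A rough sketch of that decoding idea, again in Python and not Courriel's code. `decode_body` is a hypothetical helper, the default of LF as the "native" ending is an assumption, and treating a lone CR as a line ending is a choice that may not suit every message.

```python
import base64
import re


def decode_body(encoded: str, content_type: str, eol: str = "\n") -> bytes:
    """Hypothetical helper: decode base64 and normalize EOLs for text/* only."""
    raw = base64.b64decode(encoded)
    main_type = content_type.split("/", 1)[0].strip().lower()
    if main_type != "text":
        # Leave non-text parts byte-for-byte intact.
        return raw
    # Handles bodies that mix CRLF and LF, as seen in the wild.
    return re.sub(rb"\r\n|\r|\n", eol.encode("ascii"), raw)


# A body part with both CRLF and LF line endings.
mixed = base64.b64encode(b"one\r\ntwo\nthree\r\n").decode("ascii")

as_text = decode_body(mixed, "text/plain")
as_binary = decode_body(mixed, "application/octet-stream")
```

Here `as_text` has uniform line endings while `as_binary` is returned untouched, which keeps the munging limited to types where the RFCs define the line-ending form.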

autarch commented 6 years ago

I also wonder how any of this impacts DKIM signing of the body. In particular, for a simple text/plain body, munging the line endings when encoding can invalidate the DKIM signature. We'd have the same issue if we munged on decoding and then validated a DKIM signature.
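As a simplified illustration of why this matters (Python, not a DKIM implementation): the body hash in a DKIM signature is a digest over the body bytes, so changing line endings after signing changes the digest. `body_hash` is a hypothetical helper, and this sketch deliberately skips DKIM's own body canonicalization step.

```python
import base64
import hashlib


def body_hash(body: bytes) -> str:
    """Hypothetical helper: base64 of the SHA-256 digest of the body bytes."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


signed = b"hi\r\nthere\r\n"              # body as it was when signed
munged = signed.replace(b"\r\n", b"\n")  # same text, line endings rewritten

hashes_match = body_hash(signed) == body_hash(munged)
```

`hashes_match` is false here, so verification against the original hash would fail once the line endings have been rewritten.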

The more I think about this the more I think that this probably needs to be an option (like canonicalize => 1). I think it could default to true though.

I really hate email.

horgh commented 6 years ago

The DKIM Signatures RFC looks like it says the canonicalized version should be signed/verified. Or at least that the raw message as seen by the MTA must be used. This makes sense as from what I gather DKIM is often done at the MTA level rather than MUA. (Somewhat related post I found).

I agree that munging would not be good to do when verifying DKIM signatures. It would be safer to apply it to the raw message & do any transformations afterwards.

An option sounds good though.