Signatures failing with multiple addresses in the $to argument to get_signed_headers and relaxed canonicalization

sootsnoot commented 10 years ago

The basic problem is that although not documented in RFC 6376, it appears that all DKIM verifiers I can find add a space after the commas that separate addresses in the To: header. Full details on this are at http://stackoverflow.com/questions/21635903/does-dkim-header-canonicalization-insert-a-space-after-each-comma-in-the-to-hea

louisameline commented 10 years ago

Hello,

thanks for the report. I do not see this happening on Yahoo, GMail, AppMailDev nor Port25 though. Probably it is your MTA which modifies the headers with spaces after you signed them, thus causing the issue. What is your MTA ?

louisameline commented 10 years ago

I see three solutions : put spaces yourself before signing, or do not sign on the To header, or change/update your MTA.

sootsnoot commented 10 years ago

Thanks for your replies! I'm sorry, I started a reply several hours ago but got called away before I could complete it (long before your second reply). But yes, I think you're right, in that some MTA is adding the spaces. I originally thought that the messages were arriving without the spaces, but now that I look more closely I see that the spaces are present in the headers delivered, and in the copy of the original headers present in the body of the diagnostic mail returned by the port25.com verifier. And I agree with your solutions, and clearly adding the spaces before signing is the best choice. My only "problem" with that is that the comma-without-space string is produced by Zend_Mail, which I really want to continue using, and the code responsible for that is not convenient to override through subclassing.

Nevertheless it is my problem, not a problem with your class. In order to fix it with changes in your class rather than in the Zend Framework code, I'd probably wind up changing the $to and $headers parameters to be by-reference. Note that the same problem occurs with Cc: and Reply-to: as they are also syntactically address-lists just like To:. So it would not be enough just to fix the $to string. I guess it would be much simpler just to change the ZF code that constructs the lists, despite my reluctance; we'll see.

Anyway, thank you very much for your nice class, and taking the time to think about this problem and its solution. I guess I now should go to stackoverflow and report my "booboo"; hope I don't get downvoted for questioning the correctness of the rather mature RFC :-)

louisameline commented 10 years ago

Thank you, you are welcome :)

Well you obviously made quite an effort reading RFC documents and investigating the issue before asking, it would be unfair to downvote you. Such detailed bug reports are always welcome even if they end up being bogus. With such a complex subject, it would be very surprising if no one ever came up one day with a real bug in the library.

And as I suggested before, a quick fix for you would be to choose not to sign any of the varying recipient headers (to, reply-to etc) using the 'signature_headers' option of the class. [edited] But it's much better to include the To: field more than any other to avoid forgery. I'm tired, forget this comment ;) [edited]

Good luck !

sootsnoot commented 10 years ago

I wound up adding a little pre-filter to put a space after each comma in values of headers containing address-lists - modifying the $headers and $to value before calling get_signed_headers, as I just now described in the stackoverflow post.

But I thought I'd pass on to you two little things I noticed about the mail_signature class. Not bugs, just observations that might or might not be of interest to you.

I don't believe that the uses of mb_substr and mb_strlen are necessary. All uses I checked involved checking/replacing the ending characters of strings with characters in the ascii subset of UTF-8. All indexing into the string is computed by subtracting a fixed number of characters from the length. And since ascii character codes within a UTF-8 string can never be part of a multibyte character, and the character count being subtracted is always a count of ascii characters, you could simply use substr and strlen instead of the multibyte versions.
While get_signed_headers() is written to match the signature of the PHP mail() function, I'm using Zend_Mail which supports what it calls "transports", one of which is 'sendmail' which invokes the PHP mail() function. The others are 'smtp' and 'file' (writes the email into a .eml file, so you don't need to be on a network). For 'smtp', the values available in the transport (which is where I put the call to do the final header filtering and signing) are a little different than in the 'sendmail' transport. I wound up making the following small change in get_signed_headers, in the else part of the test for \n-injection:

else {
        // To and Subject are not expected to be present in $headers if you use the php mail() function,
        // because it takes care of that itself in parameters for security reasons.  So we construct
        // headers for them just for the purpose of signing.
        // However, if you use Zend_Mail with smtp transport, $to contains recipient addresses without
        // the friendly name part, while the friendly name is present in the To: header.  And if you're doing
        // the signing within a Zend_Mail transport, security filtering has already been applied to the
        // header values (as well as $to and $subject).
        // So to avoid special-casing, we add the constructed To: and Subject: headers to the front
        // of the $headers value instead of the end.  Since _dkim_canonicalize_headers_relaxed() uses the
        // last-encountered header of a particular type, this means that explicit headers in $headers
        // will be used for signing in preference to constructed headers.
        //
        // Prepending the constructed headers instead of appending them also makes it
        // unnecessary to ensure there is a CRLF at the end of $headers before appending
        // constructed ones.
        if(!empty($subject)) $headers = 'Subject: '.$subject."\r\n" . $headers;
        if(!empty($to))      $headers = 'To: '.$to."\r\n" . $headers;

        // get the clean version of headers used for signature

louisameline commented 10 years ago

Hello, I am not sure I fully understand what you mean about the utf8 function. It is not advised to send your e-mails with multibyte characters because of old e-mail software that does not handle them correctly. I believe some MTAs will even make the conversion automatically for you. I personnaly always base64-encode my messages for greater compatibility. However, I recall that it is theoritically possible to send multibyte and I wanted the class to work in that use case. But I may be wrong, let me know what you think.

As for the modification in your second point, I see no issue in implementing it, it saves a few characters. However, I am not sure I agree with the other reasons that made you do it, like having twice a same type of headers or having to care about what Zend's framework does. I'd have to understand more your use case maybe. But again, I see no reason not to do it, so it will be changed soon.

Thank you for your feedback.

sootsnoot commented 10 years ago

Hi again, it is good to hear that you take interest in these minor kinds of points, doubtless a reason why your code is such good quality.

The first point, regarding UTF-8, was not that you shouldn't try to accommodate the possibility that the code would encounter UTF-8 strings; but rather that for the particular processing being performed, it is not necessary to use the multibyte functions as the ordinary str functions are guaranteed to produce the same correct result. The usage of the mb_ functions in the code is similar in form to this first one in _dkim_canonicalize_body_simple():

while (mb_substr($body, mb_strlen($body, 'UTF-8')-4, 4, 'UTF-8') == "\r\n\r\n")
    $mail = mb_substr($body, 0, mb_strlen($body, 'UTF-8')-2, 'UTF-8');

This code computes the number of UTF-8 characters in $body, subtracts 4, and then takes a 4-UTF-8-character substring of $body starting 4-UTF-8-characters from the end, and compares it to another string. In the general case where $body is a UTF-8 string, this is the necessary/correct way to get the last 4 UTF-8 characters to compare to another UTF-8 string. However, the comparison string consists of 4 ascii characters. And it is a guaranteed property of UTF-8 that no ascii character code can appear as a byte within a UTF-8 multibyte character. So it doesn't matter how you process the earlier part of the string, if the last 4 bytes are "\r\n\r\n", they represent those 4 ascii characters, they cannot be part of a UTF-8 multibyte character. And while the assignment statement by itself would warrant multibyte treatment, it is in a context where we know that the last two bytes are ascii characters, so no multibyte processing is needed. The same result is produced more efficiently by:

while (substr($body, strlen($body)-4, 4) == "\r\n\r\n")
    $mail = substr($body, 0, strlen($body)-2');

And actually, in checking the other place where mb_ is used (_dk_canonicalize_simple()) there appears to be a typo kind of bug:

while(mb_substr($body, mb_strlen($mail, 'UTF-8')-4, 4, 'UTF-8') == "\r\n\r\n")
    $mail = mb_substr($mail, 0, mb_strlen($mail, 'UTF-8')-2, 'UTF-8');

Since $body is actually a substring of $mail, the first mb_substr will always produce an empty string. Either it should be changed to $mail, or else the while loop could be moved in front of the assignment to $mail, and all uses of $mail changed to $body. Hmm, I guess the second approach is really what's needed, since the first one would result in there being no final "\r\n" in $mail. What's intended is to remove all "\r\n" from $body before producing $mail with exactly one terminating "\r\n". [Edit: My mistake, changing just the $body typo to $mail would be correct as the loop changes multiple "\r\n" to a single "\r\n"]

For the second point, I agree it's a bit obscure, and my explanation involving Zend Framework doesn't make much sense - it's just that it's what motivated me to make the change: calling the method the same way within the smtp transport as within the transport that uses php mail() produced incorrect results because in that situation the $to parameter is stripped of the "friendly name" part of addresses present in the To: header. So as you say, the better reason is that it simplifies the code (removes a level of "if" nesting as well as an extra string manipulation), and it only produces a different result in a case that the current code considers "unexpected".

You do good work :-)

louisameline commented 10 years ago

Thank you, I'm just trying to do what I can :) Thank you for the detailed explanations, this will be food for thought soon. Right now I am overwhelmed by the work on another project, but I will get back to this subject in a couple weeks and make the modifications. Thanks again for your involvment !

louisameline / php-mail-signature

Signatures failing with multiple addresses in the $to argument to get_signed_headers and relaxed canonicalization #1