markov2 / perl5-Mail-Message

Processing MIME messages
http://perl.overmeer.net/CPAN
1 stars 1 forks source link

Mail::Message::Body::Multipart fails to preserve CRLF after close-delimiter #16

Closed jbalazerpfpt closed 1 year ago

jbalazerpfpt commented 1 year ago

When Mail::Message::Body::Multipart reads and prints a multipart body, if the body ends with a close-delimiter, a CRLF, and an empty epilogue, the CRLF is not preserved.

A multipart body without a CRLF after the close-delimiter is valid under RFC 2046. But in general, it's good to preserve all aspects of a message, if possible, so that DKIM signatures will continue to match. And more importantly, a multipart body without a CRLF at the end of the close-delimiter is not valid under the obsolete RFC 1521.

Mail::Message 2.x is an example of an RFC 1521-compliant parser that fails to process the message correctly when the CRLF is missing: it treats the close-delimiter as a part delimiter, and imagines the multipart body to have an extra empty part. We discovered this bug by the interaction of Mail::Message 3.x and 2.x. A message was processed by Mail::Message 3.x, which stripped the CRLF. The message was encapsulated, transmitted, and then processed by Mail::Message 2.x. Upon receipt, Mail::Massage 2.x processed the message incorrectly, adding the extra empty part, which in some mail clients will be displayed as the message's body instead of the intended part.

Using SMPT transport, a multipart message will always end with a CRLF, because the data must be terminated with CRLF, a period, and a CRLF. But various means of encapsulating a multipart body can preserve the presence or absence of a CRLF at the end of the body. A multipart body nested within a multipart body, for example, can have the absence of a CRLF at the end of the inner body preserved by the encapsulation.

My fix here preserves the presence or absence of a CRLF after the close-delimiter, and also adds handling of input with linear whitespace transport-padding between the close-delimiter and the CRLF, which is allowed under RFC 2046.

There may be a better way of fixing this bug, but my proposed fix here succeeds in addressing the problem. I have tested it with the attached test files.

+++ lib/Mail/Message/Body/Multipart.pm  2023-05-15 16:40:41.735066000 -0700
@@ -265,8 +265,17 @@
     # Get the parts.

     my @parts;
+    my $epilogue_separator = 0;
     while(my $sep = $parser->readSeparator)
-    {   last if $sep eq "--$boundary--\n";
+    {
+        if ($sep =~ /--${boundary}--[ \t]*\n?/) {
+            # Per RFC 2046, a CRLF after the close-delimiter marks the presence
+            # of an epilogue.  Preserve the epilogue, even if empty, so that the
+            # printed multipart body will also have the CRLF.
+            $epilogue_separator = 1 if ($sep =~ /\n$/);
+            last;
+        }
+

         my $part = Mail::Message::Part->new
          ( @msgopts
@@ -291,8 +300,7 @@
             :                      $begin;
     $self->fileLocation($begin, $end);

-   $epilogue->nrLines
-        or undef $epilogue;
+    undef $epilogue unless ($epilogue_separator);

     $self->{MMBM_epilogue} = $epilogue
         if defined $epilogue;

multipart_close-delimiter_test_no-CRLF.eml.txt multipart_close-delimiter_test_CRLF_empty-epilogue.eml.txt multipart_close-delimiter_test_CRLF_epilogue.eml.txt multipart_close-delimiter_test_CRLF_transport-padding_epilogue.eml.txt

markov2 commented 1 year ago

Change included in 3.013, to be released soon. Thanks for the four ideas for tests: used those.