zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

Null byte character crops mail content #114

Closed stollr closed 4 years ago

stollr commented 4 years ago

Thanks for providing this great library!

One of my test mails contained a null byte within its content. The parser stops reading the stream as soon as it encounters a null byte, so the rest of the content is lost.

Here's an example:

use ZBateson\MailMimeParser\MailMimeParser;

// Minimal message whose quoted-printable body contains a null byte ("\0")
$email = "From: test@example.com\n"
    ."Message-ID: <74024701791580191348668.JavaMail.javamailuser@localhost>\n"
    ."Subject: Test\n"
    ."MIME-Version: 1.0\n"
    ."Content-Type: text/html; charset=\"ISO-8859-1\"\n"
    ."Content-Transfer-Encoding: quoted-printable\n\n"
    ."this is \0a test mail";
$parser = new MailMimeParser();
$m = $parser->parse($email);
$content = $m->getContent();
// $content === "this is " (everything from the null byte onwards is missing)

This was hard to debug. It would be nice if the parser could handle this character correctly.

stollr commented 4 years ago

I have now realized that the source of this issue is PHP's quoted_printable_decode() function, which fails to handle null bytes.

I wonder if it makes sense to work around this (I'm not sure whether it is really a bug or the desired behaviour) with something like this:

// src/QuotedPrintableStream.php

    private function decodeBlock($block)
    {
        if (substr($block, -1) === '=') {
            $block .= $this->readEncodedChars(2);
        } elseif (substr($block, -2, 1) === '=') {
            $first = substr($block, -1);
            $block = substr($block, 0, -1);
            $block .= $this->readEncodedChars(1, $first);
        }

        // Strip null bytes before decoding, since quoted_printable_decode() does not handle them
        $block = str_replace("\0", '', $block);
        return quoted_printable_decode($block);
    }

What do you think?
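The same idea can be tried outside the library. This is only a minimal sketch: `decodeQuotedPrintableWithoutNuls()` is a hypothetical standalone helper, not part of mail-mime-parser, and it assumes that silently dropping the null bytes is acceptable for the caller.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of mail-mime-parser): removes NUL bytes
 * from a quoted-printable block before passing it to PHP's built-in
 * quoted_printable_decode().
 */
function decodeQuotedPrintableWithoutNuls(string $block): string
{
    // Assumption: the NUL bytes carry no meaning and can simply be dropped.
    $sanitized = str_replace("\0", '', $block);

    return quoted_printable_decode($sanitized);
}

// Body taken from the example mail in this issue.
$decoded = decodeQuotedPrintableWithoutNuls("this is \0a test mail");
var_dump($decoded);
```

Note that this changes the data: the decoded text no longer contains the byte that was present in the raw message.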

stollr commented 4 years ago

I looked at RFC 2045, §2.7 (about 7bit data). It states that NULs (null bytes) are not allowed.

So I'll close this issue.
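For anyone who wants to detect such mails up front, the 7bit rules from RFC 2045 §2.7 (lines of at most 998 octets between CRLF sequences, no octets above 127, no NULs, CR and LF only as CRLF pairs) could be checked roughly like this. `isSevenBitData()` is a hypothetical helper written for illustration; it is not part of mail-mime-parser.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: checks a body against the "7bit data" rules
 * of RFC 2045, section 2.7.
 */
function isSevenBitData(string $body): bool
{
    // NULs (octet 0) and octets with values above 127 are not allowed.
    if (preg_match('/[\x00\x80-\xFF]/', $body) === 1) {
        return false;
    }

    // CR and LF may only occur as part of a CRLF sequence.
    $withoutCrlf = str_replace("\r\n", '', $body);
    if (strpbrk($withoutCrlf, "\r\n") !== false) {
        return false;
    }

    // At most 998 octets between CRLF line separators.
    foreach (explode("\r\n", $body) as $line) {
        if (strlen($line) > 998) {
            return false;
        }
    }

    return true;
}

var_dump(isSevenBitData("this is a test mail\r\n"));
var_dump(isSevenBitData("this is \0a test mail"));
```

A check like this only reports the problem; whether to reject, log, or sanitize such a message is left to the caller.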

zbateson commented 4 years ago

Thanks for opening, investigating and closing :). Glad it was figured out.

All the best