php-mime-mail-parser / php-mime-mail-parser

A fully tested email parser for PHP 8.0+ (mailparse extension wrapper).
https://mailcare.io
MIT License
903 stars 197 forks

base64_decode encounters memory limit #449

Open r-daniele opened 2 weeks ago

r-daniele commented 2 weeks ago

php-mime-mail-parser v8.0.3, PHP 8.2.21

Hi, while trying to parse emails with big attachments, PHP runs out of memory. I traced one of the issues to base64_decode (line 650 of Parser.php) and, instead of the plain return base64_decode($encodedString); I changed the code like this:

$chunkSize = 1024; // multiple of 4, so base64 quanta stay aligned (assuming no line breaks in $encodedString)
$src = tmpfile();
$metaDatas = stream_get_meta_data($src);
$srcFilename = $metaDatas['uri'];

file_put_contents($srcFilename, $encodedString);

$dst = tmpfile();
while (!feof($src)) {
    fwrite($dst, base64_decode(fread($src, $chunkSize)));
}
fclose($src);

rewind($dst); // without this, stream_get_contents() would read from EOF and return an empty string
$encodedString = stream_get_contents($dst);

fclose($dst);

return $encodedString;

It seems to work, but then it fails again because of memory limits at line 616:

while ($written < $len) {
    $write = $len;
    $data = fread($this->stream, $write); // <-- it fails here
    fwrite($temp_fp, $this->decodeContentTransfer($data, $encodingType));
    $written += $write;
}
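As an aside, PHP ships a convert.base64-decode stream filter that can replace a manual chunk-decode loop like the one above entirely. A minimal, untested sketch, not the library's code ($encodedString here is a stand-in for the method's argument):

```php
<?php
// Untested sketch: the built-in convert.base64-decode stream filter
// decodes while the data is being copied, so the decoded payload
// never has to exist as one large PHP string.
$encodedString = base64_encode(str_repeat('attachment bytes ', 1000)); // stand-in input

$src = tmpfile();
fwrite($src, $encodedString);
rewind($src);

// Decode on the fly as stream_copy_to_stream pumps the bytes across.
stream_filter_append($src, 'convert.base64-decode', STREAM_FILTER_READ);

$dst = tmpfile();
stream_copy_to_stream($src, $dst);
fclose($src);

rewind($dst);
// Callers can now fread() from $dst in chunks instead of materialising a string.
```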

Is there a more memory-efficient way to read these big attachments? It seems nobody cares these days how big they are. Thank you!

eXorus commented 1 week ago

Thanks for reaching out! A few questions and suggestions:

r-daniele commented 1 week ago

Hi, sorry, by "big attachments" I meant files over 25 MB. It's not uncommon to get emails with attachments over 40 MB; I don't have a hard number here. As I said, people these days don't know or care about the size of an email.

That's why the PHP memory limit on my machine is 512 MB by default, which is a lot. In my experience, raising that number, as you suggested, can sometimes patch the issue, but it won't solve it for good.

I think the chunk-based approach can provide a once-and-for-all solution, but I don't know how to replicate it in the second bit of code. Is it even possible? Could you help? Thank you again!

r-daniele commented 1 week ago

Just an update, I found out the file causing the issue is a 33 MB PDF file.

eXorus commented 1 week ago

Here’s a possible approach, though I haven't tested it:


$chunkSize = 8192; // chunk size (e.g. 8 KB)

while ($written < $len) {
    // Determine the read size, to avoid reading past the end
    $remaining = $len - $written;
    $readSize = ($remaining < $chunkSize) ? $remaining : $chunkSize;

    // Read a chunk from the input stream
    $data = fread($this->stream, $readSize);

    // Decode the chunk and write it to the temporary file
    $decodedData = $this->decodeContentTransfer($data, $encodingType);
    fwrite($temp_fp, $decodedData);

    // Increment the amount written
    $written += $readSize;
}
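One caveat worth flagging about any chunked approach (a note on the sketch above, not part of the original suggestion): base64 decodes in 4-character quanta, so a chunk boundary that splits a quantum corrupts the output. 8192 is a multiple of 4, which is safe only if the encoded data contains no line breaks; if it might (MIME bodies are usually wrapped), carrying the remainder between iterations keeps every decode aligned. A standalone illustration:

```php
<?php
// Illustrative sketch (not the library's code): chunked base64 decode
// that stays correct even when a chunk boundary splits a 4-char quantum
// or the input contains MIME-style line breaks.
$raw = random_bytes(100000);
$encoded = chunk_split(base64_encode($raw), 76, "\r\n"); // MIME-style wrapping
$chunkSize = 8192;
$decoded = ''; // in real use, fwrite() each piece to a temp file instead
$carry = '';

for ($off = 0; $off < strlen($encoded); $off += $chunkSize) {
    // Drop whitespace, then decode only complete 4-char quanta.
    $buf = $carry . preg_replace('/\s+/', '', substr($encoded, $off, $chunkSize));
    $usable = strlen($buf) - (strlen($buf) % 4);
    $carry = substr($buf, $usable); // 0-3 leftover chars for the next round
    $decoded .= base64_decode(substr($buf, 0, $usable));
}
$decoded .= base64_decode($carry); // flush any final remainder
```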

r-daniele commented 9 hours ago

Hi! Thanks again for your suggestion. I tested it and it does work, although I'll have to test it at a larger scale. It does help overcome the memory limit issue, but when I tested with different memory sizes, the failure sometimes moved to the getPartBodyFromFile function. In particular, it can fail on this line:

            $body = fread($this->stream, $end - $start);

I'll test the code further and I'll report back soon. If it does work, would you update the library? Thanks!
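If the same pattern is needed in getPartBodyFromFile, stream_copy_to_stream can move a byte range in bounded internal chunks instead of one large fread. A standalone sketch with stand-in values ($stream, $start, $end mirror the failing line above but this is not the library's actual code):

```php
<?php
// Standalone sketch: copy a byte range between streams without ever
// holding the whole range in a PHP string. Values are illustrative.
$stream = tmpfile();
fwrite($stream, str_repeat('x', 50000)); // pretend this is the raw email
$start = 100;
$end = 40100;

fseek($stream, $start);
$out = tmpfile();
// Copies ($end - $start) bytes in small internal chunks; peak memory
// stays bounded no matter how large the part body is.
stream_copy_to_stream($stream, $out, $end - $start);
```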

eXorus commented 9 hours ago

Sure, I'd be happy to integrate this improvement into the library if it helps reduce the library's memory footprint when handling large emails, without affecting its current functionality. Let me know how your tests go!