zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

issues in decoding base 64 html body #166

Closed Hazii-197 closed 2 years ago

Hazii-197 commented 3 years ago

Hi!

I am not sure whether this is an issue in the library or in my code.

I can get the HTML body of a reply email without any issues. But when the HTML body is base64 encoded, I run into problems, and I am not sure why. I am using the SendGrid API for the inbound parser. SendGrid used to send me the HTML body as-is, but now it sends the HTML content with a base64 Content-Transfer-Encoding.

I read the zbateson documentation and found that, whether or not the Content-Transfer-Encoding is base64, the library should give you the decoded HTML content.

Can anyone please help?

Thank you.

zbateson commented 3 years ago

Hi @Hazii-197 --

Are you facing an issue with mail-mime-parser? Most of your posting seems to be about SendGrid. Documentation for this library is on the front page readme and on https://mail-mime-parser.org

zbateson commented 3 years ago

Yes, mail-mime-parser will automatically handle encoding (transfer encodings and charsets) when reading parts if that's the question also :)
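
For readers unsure what "transfer encoding" means here, a minimal sketch follows. It is not mail-mime-parser's implementation. `decodeTransferEncoding()` is a hypothetical helper built only on PHP's standard `base64_decode()` and `quoted_printable_decode()`. It illustrates that Content-Transfer-Encoding describes how the body bytes are wrapped for transport, independently of whether the Content-Type is text/plain or text/html.

```php
<?php
// Hypothetical helper, not part of mail-mime-parser: undo a
// Content-Transfer-Encoding on a raw body string.
function decodeTransferEncoding(string $body, string $encoding): string
{
    switch (strtolower(trim($encoding))) {
        case 'base64':
            // Encoded bodies are wrapped across lines; strip whitespace first.
            $decoded = base64_decode(preg_replace('/\s+/', '', $body), true);
            if ($decoded === false) {
                throw new InvalidArgumentException('Body is not valid base64');
            }
            return $decoded;
        case 'quoted-printable':
            return quoted_printable_decode($body);
        default:
            // 7bit, 8bit and binary carry the body unchanged.
            return $body;
    }
}

// Simulate an HTML body as it would appear on the wire.
$html = '<p>Hello, world!</p>';
$wire = chunk_split(base64_encode($html), 76, "\r\n");
echo decodeTransferEncoding($wire, 'base64'), PHP_EOL;
```

Charset conversion is a separate step after this one and is left out of the sketch.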

Hazii-197 commented 3 years ago

Thanks for your reply. I am in discussion with SendGrid customer support, and they said they have no control over this: the reply email is sent to the webhook URL as-is, and they don't make any modifications to it. :(

ilmiont commented 2 years ago

I'm also seeing this issue.

Automatic Base64 decoding works if there is no Content-Type header present. As soon as you add a Content-Type: text/html; charset=utf-8 header, getTextContent() starts returning null when Content-Transfer-Encoding: base64 is also present.

zbateson commented 2 years ago

Hi @ilmiont --

Could you provide a more concrete example? We have a test case for a text/plain that's base64 encoded here:

https://raw.githubusercontent.com/zbateson/mail-mime-parser/master/tests/_data/emails/m0003.txt

This passes fine. I think the original issue in this ticket was something else, and what you're describing may be something different too, so if you could provide an example of it happening (maybe rewrite m0003.txt to where it fails?) that would be helpful.

ilmiont commented 2 years ago

Hi @zbateson

Thanks. I agree that m0003.txt parses OK as-is.

If you modify line 7 to Content-Type: text/html, parsing breaks. This actually seems to happen whenever Content-Transfer-Encoding is set. It isn't limited to Base64: since writing my comment I've seen the same behaviour with quoted-printable messages too. I think the original issue found the same problem as I have, though; at least the symptoms are the same: if you have a Base64-encoded HTML email, you can't get the body content.

Modified email: m0003_1.txt

Package version in composer.lock is 1.3.2.

Repro code

use ZBateson\MailMimeParser\Message;

$message = Message::from(file_get_contents(__DIR__ . "/m0003_1.txt"));
$content = $message->getTextContent();

// $content is NULL

zbateson commented 2 years ago

Aah, yeah, that makes sense, sorry.

The method you're looking for in that situation is $message->getHtmlContent(). A message could have two alternative parts (one text, one HTML), just a text part, or just an HTML part.
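
If the calling code doesn't know in advance which parts a message carries, one possible approach is to try the HTML getter first and fall back to the text getter. This is a rough sketch: `preferredBody()` is a hypothetical helper, not part of the library, and it assumes both getters return null when the corresponding part is absent.

```php
<?php
// Hypothetical helper: $message is assumed to be a
// ZBateson\MailMimeParser\Message (any object exposing these two
// methods works, which keeps the sketch runnable without the library).
function preferredBody(object $message): ?string
{
    // Assumption: each getter returns null when that part is absent.
    return $message->getHtmlContent() ?? $message->getTextContent();
}

// Usage with the library (requires Composer's autoloader), mirroring the
// repro code posted in this thread:
//   $message = \ZBateson\MailMimeParser\Message::from(
//       file_get_contents(__DIR__ . '/m0003_1.txt')
//   );
//   $body = preferredBody($message);
```

Whether HTML or plain text should win is an application choice; swap the two calls to prefer plain text.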

ilmiont commented 2 years ago

Hmm yes, but getTextContent() works to extract the plain text part from a text/html email, if there's no Content-Transfer-Encoding header. I'll try to add more examples tomorrow.

zbateson commented 2 years ago

Please reopen if there's an issue.