haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License

Damaged encoding when body is non-utf #2176

Closed analogic closed 7 years ago

analogic commented 7 years ago

I've found that some delivered emails are completely unreadable, typically non-UTF-8 plaintext emails. After some investigation I traced the problem to this code: https://github.com/haraka/Haraka/blob/master/transaction.js#L95-L108

It basically does this:

            let new_line = line.toString('UTF-8');
            ...
            line = new Buffer(new_line, 'UTF-8');

The catch is that UTF-8 or ASCII input survives this transcoding intact, but with other encodings (I am in a country where iso-8859-2 or windows-1250 is still used) every byte sequence that is not valid UTF-8 is replaced during decoding, so the output is damaged and the mail becomes unreadable.
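
To illustrate the round trip described above, here is a minimal, self-contained sketch. It is not Haraka's code: the `roundTrip` helper and the variable names are hypothetical, and the sample bytes are simply the word "Příliš" encoded as iso-8859-2. It only assumes the standard Node.js `Buffer` API.

```javascript
// Hypothetical helper (not part of Haraka): decode a buffer to a string
// with the given encoding, then encode that string back into a buffer.
function roundTrip(buf, encoding) {
    const str = buf.toString(encoding);
    return Buffer.from(str, encoding);
}

// "Příliš" encoded as iso-8859-2: ř = 0xF8, í = 0xED, š = 0xB9.
// These bytes do not form valid UTF-8 sequences.
const original = Buffer.from([0x50, 0xf8, 0xed, 0x6c, 0x69, 0xb9]);

// UTF-8 round trip: each invalid byte is decoded to U+FFFD and then
// re-encoded as the three bytes EF BF BD, so the original bytes are lost.
const viaUtf8 = roundTrip(original, 'utf8');

// latin1 (the 'binary' encoding) maps every byte to one code point and
// back, so the round trip is lossless for arbitrary 8-bit data.
const viaLatin1 = roundTrip(original, 'latin1');

console.log(viaUtf8.equals(original));   // false
console.log(viaLatin1.equals(original)); // true
console.log(original.length, viaUtf8.length); // 6 12
```

The 'latin1' case only shows that a byte-preserving encoding avoids the damage. Whether that is the right fix for transaction.js depends on why the encoding was changed in the first place.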

I would send a pull request, but I am still missing the bigger picture, and I am also a bit confused about why body parsing alters the actual email when no plugin makes any changes (attachments.js, queued through the qmail plugin)...

Example of a damaged email. Input:

...
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-2; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit

Příliš žluťoučký kůň úpěl ďábelské ódy.

Output:

...
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-2; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit

P��li� �lu�ou�k� k�� �p�l ��belsk� �dy.

smfreegard commented 7 years ago

Yeah - this looks wrong. @baudehlo - the punycode stuff you added changed the encoding from 'binary' to 'utf-8' in #1944, but I can't work out why that was necessary - I can understand the need for headers, but for the message body as well?