rnwood / smtp4dev

smtp4dev - the fake smtp email server for development and testing
BSD 3-Clause "New" or "Revised" License
3.02k stars, 339 forks

[Bug] View tab shows wrongly decoded message for "just send 8 bit" sessions. #170

Closed: cheoAlejo closed this issue 4 months ago

cheoAlejo commented 5 years ago

When I send a message with Latin characters from a file encoded as ISO-8859-1 (Western/Windows-1252), the "View" tab doesn't display those characters correctly:

My file:

[screenshot]

smtp4dev View tab:

[screenshot]

But the Parts > Source tab does show them correctly:

[screenshot]

The charset of the message:

[screenshot]


rnwood commented 4 years ago

Thanks for reporting this issue and including enough detail to reproduce it.

jafin commented 3 years ago

Hi @cheoAlejo

First, a disclaimer: I'm quite green with internationalization, and it appears to be a minefield.

Anyhow, while researching this I came across https://stackoverflow.com/questions/25710599/content-transfer-encoding-7bit-or-8-bit

I may have taken this out of context, but are they suggesting that the 8bit transfer encoding is actually not legal (or not recommended) on the internet?

As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the "8bit" or "binary" Content-Transfer-Encoding is actually legal on the Internet.

Is there any way you can set the body transfer encoding to QuotedPrintable instead of 8Bit?
I set up a test scenario, and when I change the encoding to QuotedPrintable, the message is shown correctly.

[screenshot]

[screenshot]
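As a rough illustration of why switching to QuotedPrintable helps: quoted-printable turns every non-ASCII byte into an ASCII `=XX` escape, so the body on the wire is plain 7-bit text and there is nothing for the server to decode or re-encode. The sketch below is a hypothetical, simplified encoder (not System.Net.Mail's implementation); it ignores RFC 2045's 76-character soft line breaks and escapes CR/LF instead of keeping hard line breaks. It assumes .NET 5 or later for top-level statements and `Encoding.Latin1`.

```csharp
using System;
using System.Text;

// Hypothetical, simplified quoted-printable encoder (RFC 2045, section 6.7).
// Escapes '=' and every byte outside printable ASCII as =XX. Omits soft line
// breaks and trailing-whitespace rules; CR/LF are escaped too, unlike a real
// encoder, which keeps hard line breaks as-is.
static string QuotedPrintableEncode(byte[] bytes)
{
    var sb = new StringBuilder();
    foreach (byte b in bytes)
    {
        bool printable = b >= 33 && b <= 126 && b != (byte)'=';
        if (printable || b == (byte)' ')
            sb.Append((char)b);
        else
            sb.Append('=').Append(b.ToString("X2"));
    }
    return sb.ToString();
}

// Encode the body as ISO-8859-1 first, as with BodyEncoding = Encoding.Latin1.
byte[] latin1 = Encoding.Latin1.GetBytes("quáestionem pótióne");
Console.WriteLine(QuotedPrintableEncode(latin1)); // qu=E1estionem p=F3ti=F3ne
```

Because the escaped output is pure ASCII, it survives any intermediate decode/re-encode step unchanged; the charset is only applied when the reader reverses the `=XX` escapes.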

Test setup used:

using System.Net.Mail;
using System.Net.Mime;
using System.Text;

// Connect to smtp4dev listening on localhost:25
var smtpClient = new SmtpClient("localhost")
{
    Port = 25,
};

var mailMessage = new MailMessage
{
    From = new MailAddress("latin.test@mailinator.com"),
    Subject = "Latin test",
    // Quoted-printable keeps the body 7-bit ASCII on the wire, which avoids the issue
    BodyTransferEncoding = TransferEncoding.QuotedPrintable,
    Body =
        "<span>Homines in indicaverunt nam purus quáestionem sentiri unum. Afflueret contentus diam errore faciam, honoris mucius omnem pélléntésqué reiciendis. Acuti admissum arbitrantur concederetur dediti, ferrentur fugiendus inferiorem peccant ponti quando solam ullius. áb atilii concursio constituamus, définitioném diligenter graeci illam máius operis opinionum pótióne versatur. Alliciat aspernari consoletur disserunt, impendere interiret reliquarum verum. Convállis essent foedus gravida iustioribus, mox notissima perpaulum praeclare probatum, prohiberet sensibus. Condimentum efficeretur iis insipientiam, inutile logikh ne ornare, paulo primis primo pugnare putarent quiddam reperiuntur. \r\nCéramico cónsistat éiusdém licet offendimur, recusandae referendá. Cupiditatés hónesta musicis possent, respondendum sollicitudines. Breviter democrito dolor electram illa, ludicra non occulta pérféréndis principio servare suum tranquillitatem. Consentinis probatus qualisque tollatur veritatis. In inséquitur ortum pertinaces, sentit stoici sum téréntii.</span>",
    // Encoding.Latin1 requires .NET 5 or later
    BodyEncoding = Encoding.Latin1,
    IsBodyHtml = true,
};
mailMessage.To.Add("latin.test@mailinator.com");

smtpClient.Send(mailMessage);

anuj2nt commented 2 years ago

This issue has been preventing us from using this awesome tool for a while now. Is there any temporary workaround?

jafin commented 2 years ago

@anuj2nt, are you able to supply a test client that creates the email, in PHP, JS, or C#? I'm happy to investigate further; I'm just after some sample code that produces the email.

ballaballaballa commented 7 months ago

I have this issue as well. You could basically use the same test data as @cheoAlejo, but with text/html;charset=windows-1252.

@jafin According to the Stack Overflow post, it is not legal according to RFC 1341, which is over 20 years old. But since then, the 8-bit MIME transport extension (8BITMIME, RFC 6152) has been added, which supports non-ASCII characters. So support for the 8BITMIME extension would be greatly appreciated. I am currently running tests from a COTS product and cannot modify the Content-Transfer-Encoding value.
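For reference, RFC 6152 works like this at the protocol level: the server lists `8BITMIME` in its EHLO response, and the client then declares an 8-bit body with the `BODY=8BITMIME` parameter on `MAIL FROM`. A client that sends 8-bit data without that declaration is doing "just send 8 bit". The sketch below uses hypothetical helpers (`Advertises`, `BuildMailFrom`), not part of smtp4dev or System.Net.Mail, and assumes the EHLO reply has already been split into lines.

```csharp
using System;
using System.Linq;

// Returns true if any EHLO reply line ("250-KEYWORD" or "250 KEYWORD")
// advertises the given ESMTP extension keyword.
static bool Advertises(string[] ehloReplyLines, string keyword) =>
    ehloReplyLines.Any(line =>
        line.Length > 4 &&
        line.Substring(4).Split(' ')[0].Equals(keyword, StringComparison.OrdinalIgnoreCase));

// Builds the MAIL FROM command, declaring an 8-bit body only when the server
// supports 8BITMIME. Otherwise the body should be re-encoded as 7-bit
// (quoted-printable or base64) before sending.
static string BuildMailFrom(string sender, bool eightBitBody, string[] ehloReplyLines)
{
    if (eightBitBody && !Advertises(ehloReplyLines, "8BITMIME"))
        throw new InvalidOperationException("Server lacks 8BITMIME; re-encode the body as 7-bit.");
    return eightBitBody ? $"MAIL FROM:<{sender}> BODY=8BITMIME" : $"MAIL FROM:<{sender}>";
}

string[] ehlo = { "250-localhost", "250-8BITMIME", "250-SMTPUTF8", "250 SIZE" };
Console.WriteLine(BuildMailFrom("latin.test@mailinator.com", true, ehlo));
// MAIL FROM:<latin.test@mailinator.com> BODY=8BITMIME
```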

ballaballaballa commented 6 months ago

@rnwood Are there any plans to include this feature in the near future?

rnwood commented 6 months ago

Repro:

using System.Net.Mail;
using System.Net.Mime;
using System.Text;

// Connect to smtp4dev listening on localhost:25
var smtpClient = new SmtpClient("localhost")
{
    Port = 25,
};

var mailMessage = new MailMessage
{
    From = new MailAddress("latin.test@mailinator.com"),
    Subject = "Latin test",
    Body =
        "<span>Homines in indicaverunt nam purus quáestionem sentiri unum. Afflueret contentus diam errore faciam, honoris mucius omnem pélléntésqué reiciendis. Acuti admissum arbitrantur concederetur dediti, ferrentur fugiendus inferiorem peccant ponti quando solam ullius. áb atilii concursio constituamus, définitioném diligenter graeci illam máius operis opinionum pótióne versatur. Alliciat aspernari consoletur disserunt, impendere interiret reliquarum verum. Convállis essent foedus gravida iustioribus, mox notissima perpaulum praeclare probatum, prohiberet sensibus. Condimentum efficeretur iis insipientiam, inutile logikh ne ornare, paulo primis primo pugnare putarent quiddam reperiuntur. \r\nCéramico cónsistat éiusdém licet offendimur, recusandae referendá. Cupiditatés hónesta musicis possent, respondendum sollicitudines. Breviter democrito dolor electram illa, ludicra non occulta pérféréndis principio servare suum tranquillitatem. Consentinis probatus qualisque tollatur veritatis. In inséquitur ortum pertinaces, sentit stoici sum téréntii.</span>",
    // Encode the body as ISO-8859-1 (Encoding.Latin1 requires .NET 5 or later)...
    BodyEncoding = Encoding.Latin1,
    // ...and send the raw 8-bit bytes (Content-Transfer-Encoding: 8bit)
    BodyTransferEncoding = TransferEncoding.EightBit,
    IsBodyHtml = true,
};
mailMessage.To.Add("latin.test@mailinator.com");

smtpClient.Send(mailMessage);

Unfortunately, I fear this is a very complex issue that lies in the Rnwood.Smtp4dev server. smtp4dev does support the 8BITMIME extension and UTF-8, but the client is not using them in this case. What is happening is that SmtpClient is doing "just send 8 bit": it encodes the body using whatever encoding is specified and sends the resulting 8-bit bytes without using 8BITMIME. Unfortunately, I don't think we're handling that correctly, and the body ends up re-encoded as UTF-8. Then, when the UI later displays the message, it reads the charset from the part's Content-Type header and decodes the body using that encoding, but the body is no longer encoded that way. This is what causes the broken characters.
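The failure described above can be modelled in a few lines. This is a simplified sketch, not smtp4dev's actual code path; it assumes the server decodes the incoming 8-bit bytes as ISO-8859-1 before storing them as UTF-8 (consistent with the stored body being valid UTF-8), and `StoreAndDisplay` is a hypothetical name. Requires .NET 5 or later.

```csharp
using System;
using System.Text;

// Simplified model of the bug. Assumption: the server decodes incoming 8-bit
// bytes as ISO-8859-1 and stores the text re-encoded as UTF-8.
static string StoreAndDisplay(string body)
{
    // Client ("just send 8 bit"): BodyEncoding = Encoding.Latin1, no 8BITMIME.
    byte[] wireBytes = Encoding.Latin1.GetBytes(body);

    // Server: the bytes are decoded and stored re-encoded as UTF-8.
    byte[] storedBytes = Encoding.UTF8.GetBytes(Encoding.Latin1.GetString(wireBytes));

    // UI: the MIME part still declares charset=ISO-8859-1, so the stored
    // UTF-8 bytes are decoded as Latin-1, which produces mojibake.
    return Encoding.Latin1.GetString(storedBytes);
}

Console.WriteLine(StoreAndDisplay("quáestionem")); // quÃ¡estionem
```

ASCII-only text passes through unchanged, which is why only the accented characters break.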

In the linked PR, I have applied a workaround whilst I think about how to resolve this properly. The main issue is that I'm not sure what the correct (or most common) behaviour for the "just send 8" case is: what encoding should be used for the body, or should it just be treated as a stream of bytes? (The latter would be a major change.)

The workaround simply avoids this re-encoding in the UI and treats the body as UTF-8, which it is. This is not the correct fix, though. A build should be generated that you can use to confirm, but please note that I don't intend to merge this.

rnwood commented 6 months ago

Further advice if you are seeing this bug: try to make sure your client is using the 8BITMIME extension, which forces UTF-8. This should avoid the issue. smtp4dev is probably not the only SMTP server or client software with an issue like this, given how undefined/unclear this behaviour is.

rnwood commented 6 months ago

PR #1344

rnwood commented 6 months ago

Build 3.3.3-ci20240306100 should now be available.

https://dev.azure.com/rnwood/smtp4dev/_build/results?buildId=2229&view=artifacts&pathAsName=false&type=publishedArtifacts

rnwood commented 6 months ago

I'm pleased to confirm that I have created https://github.com/rnwood/smtpserver/pull/173, which addresses this issue in the server component. In this "just send 8 bit" scenario, the message will no longer be decoded and re-encoded with the wrong encoding.

The resulting build then needs to be picked up by Smtp4dev.

rnwood commented 6 months ago

The PR has now been updated with what I think is a fairly complete fix. The Source and Raw tabs have also been adjusted to detect the encoding from the relevant MIME part and transcode it to UTF-8 for display.
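The general idea of charset-aware display can be sketched as follows: keep the part's raw bytes as received and decode them only at display time, using the charset from that part's Content-Type header. This is a hypothetical helper (`DecodePartForDisplay`), not the code in the PR, and falling back to UTF-8 when no charset is declared is an assumption (RFC 2045's default is US-ASCII). Uses `System.Net.Mime.ContentType` to parse the header value.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Decode a MIME part's raw body bytes with the charset its Content-Type declares.
// Assumption: fall back to UTF-8 when no charset parameter is present.
static string DecodePartForDisplay(byte[] rawBody, string contentTypeHeader)
{
    var charset = new ContentType(contentTypeHeader).CharSet;
    Encoding encoding = string.IsNullOrEmpty(charset)
        ? Encoding.UTF8
        : Encoding.GetEncoding(charset); // throws ArgumentException for unknown names
    return encoding.GetString(rawBody);
}

byte[] raw = { 0x4A, 0x61, 0x64, 0xE5 }; // "Jadå" encoded as ISO-8859-1
Console.WriteLine(DecodePartForDisplay(raw, "text/plain;charset=ISO-8859-1")); // Jadå
```

Storing bytes rather than decoded text is what makes this work: the charset is applied exactly once, by the component that knows which part the bytes belong to. Legacy charsets such as windows-1252 may additionally need `CodePagesEncodingProvider` registered on .NET Core.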

Please note that existing messages may display strangely, since they have already been transcoded incorrectly.

The PR will be merged. Feedback is invited for builds >= 3.3.3-ci20240309103.

ballaballaballa commented 6 months ago

Thanks for the quick update! I tested in 3.3.3-ci20240309104. However, the characters still don't display correctly. They are still displayed as in the raw output shown below (the subject line does show åäö correctly, though):

From:noreply@test.com
Reply-To:noreply@test.com
Subject:=?UTF-8?B?VGVzdMOlw6TDtg==?=
date: lör 09 mar 2024 20:30 +0000
To:casi
MIME-Version:1.0
Content-type:multipart/alternative;boundary="----_=_alt_boundary_1_1710016257"

------_=_alt_boundary_1_1710016257
Content-type:text/plain;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline

Overview: Current Task: New Task 1 Process Name: Mailtest : 1234/A;1-Test Due Date: None Comments: (none) Instructions: Jadå, här testas både å ä och ö. Hoppas att det funkar då! Ê é à ""&

------_=_alt_boundary_1_1710016257
Content-type:text/html;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline

<!DOCTYPE html>
<html>
<head>
<style>
.tableBordered{
border:1px solid Black;
border-collapse:collapse;
padding:2px;
}
</style>
</head>
<body style="font-family:arial">
 <div style="color:#448da6; font-weight:bold; margin-bottom:3px;">Overview:</div>
<table style="font-family:arial">
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Current Task: </td>
<td style="vertical-align: top; text-align:left;">New Task 1 </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Process Name: </td>
<td style="vertical-align: top; text-align:left;">Mailtest : 1234/A;1-Test </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Due Date: </td>
<td style="vertical-align: top; text-align:left;">None </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Comments: </td>
<td style="vertical-align: top; text-align:left;">(none) </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Instructions: </td>
<td style="vertical-align: top; text-align:left;">Jadå, här testas både å ä och ö. Hoppas att det funkar då! Ê é à ""& </td>
</tr>
</table>

</body>
</html>

------_=_alt_boundary_1_1710016257--
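The Subject header in the dump above is an RFC 2047 encoded word (`=?charset?B?base64?=`), so it travels as pure ASCII and names its own charset; that is why it is unaffected by the 8-bit body problem. A minimal, hypothetical decoder for a single "B" encoded word (no "Q" encoding, no multi-word headers) looks like this:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Minimal decoder for one RFC 2047 "B" (base64) encoded word: =?charset?B?payload?=
static string DecodeBEncodedWord(string word)
{
    var m = Regex.Match(word, @"^=\?([^?]+)\?[Bb]\?([^?]*)\?=$");
    if (!m.Success) throw new FormatException("Not a B-encoded word.");
    Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
    return charset.GetString(Convert.FromBase64String(m.Groups[2].Value));
}

Console.WriteLine(DecodeBEncodedWord("=?UTF-8?B?VGVzdMOlw6TDtg==?=")); // Teståäö
```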

rnwood commented 6 months ago

@ballaballaballa Can you share the correct text for the body?

rnwood commented 6 months ago

I've tested åäö with a variety of encodings, including just-send-8-bit ISO-8859-1, and I can't reproduce it.

If we take this word as an example from what you are seeing:

JadÃ¥

In ISO-8859-1 (which is what the part claims to be encoded as), the last two characters are encoded as:

Ã => 0xc3
¥ => 0xa5

But in UTF-8, the byte sequence 0xc3 0xa5 is å.

In ISO-8859-1, å should be the single byte 0xe5. So I believe the content of this message is actually UTF-8. We need to determine whether the original client is encoding it incorrectly or smtp4dev is still doing something wrong.
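The byte arithmetic above can be checked with a short sketch (assumes .NET 5 or later for `Encoding.Latin1`): `å` is one byte in ISO-8859-1 but two bytes in UTF-8, and decoding those two UTF-8 bytes as ISO-8859-1 yields `Ã¥`. `Hex` is a hypothetical helper for printing bytes.

```csharp
using System;
using System.Text;

// Hypothetical helper: format bytes as space-separated hex, e.g. "C3 A5".
static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

Console.WriteLine(Hex(Encoding.Latin1.GetBytes("å"))); // E5
Console.WriteLine(Hex(Encoding.UTF8.GetBytes("å")));   // C3 A5
// UTF-8 bytes misread as ISO-8859-1:
Console.WriteLine(Encoding.Latin1.GetString(new byte[] { 0xC3, 0xA5 })); // Ã¥
```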

If you can still reproduce this, would you be able to get a Wireshark trace of the session? This will unambiguously show what's going on. Unfortunately, the session log in smtp4dev is text-based, so we can't see how the characters were encoded as bytes.

ballaballaballa commented 6 months ago

The weird thing is that the Source tab displayed the characters correctly before your fix, while the View tab displayed them the same way they do now. I'll try to get a trace.

ballaballaballa commented 6 months ago

So I ran another test, and this is the output in raw format. However, when checking in Wireshark, it differs a bit. The line "date: mån 11 mar 2024 19:58 +0000" from smtp4dev's raw output is written in Wireshark as "date: m�n 11 mar 2024 19:58 +0000\r\n", so there smtp4dev shows the correct one. The line "Process Name: !mailtest : 123/A;1-Teståäö", on the other hand, appears in Wireshark as "Process Name: !mailtest : 123/A;1-Teståäö\r\n", and there the Wireshark output is the correct one. The same goes for everywhere else "åäö" is written.

Raw output from smtp4dev:

From:test@test.com
Reply-To:test@test.com
Subject:=?UTF-8?B?dGVzdCDDpcOkw7Y=?=
date: mån 11 mar 2024 19:58 +0000
To:test@test.com
MIME-Version:1.0
Content-type:multipart/alternative;boundary="----_=_alt_boundary_1_1710187095"

------_=_alt_boundary_1_1710187095
Content-type:text/plain;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline

Overview: Current Task: New Task 1 Process Name: !mailtest : 123/A;1-Teståäö Due Date: None Email From: admin

Comments: comment åäö Instructions: (none)

Attachment:    Name                             Type

This email was sent from Teamcenter.

------_=_alt_boundary_1_1710187095
Content-type:text/html;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline

<!DOCTYPE html>
<html>
<head>
<style>
.tableBordered{
border:1px solid Black;
border-collapse:collapse;
padding:2px;
}
</style>
</head>
<body style="font-family:arial">
 <div style="color:#448da6; font-weight:bold; margin-bottom:3px;">Overview:</div>
<table style="font-family:arial">
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Current Task: </td>
<td style="vertical-align: top; text-align:left;">New Task 1 </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Process Name: </td>
<td style="vertical-align: top; text-align:left;">!mailtest : 123/A;1-Teståäö </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Due Date: </td>
<td style="vertical-align: top; text-align:left;">None </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Email From:  </td>
<td style="vertical-align: top; text-align:left;">admin (casi) </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Comments: </td>
<td style="vertical-align: top; text-align:left;">comment åäö </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Instructions: </td>
<td style="vertical-align: top; text-align:left;">(none) </td>
</tr>
</table>
<br>
<br>
<div style="font-weight:bold; color:#808080;">
This email was sent from Teamcenter.</div>

</body>
</html>

------_=_alt_boundary_1_1710187095--

rnwood commented 5 months ago

@ballaballaballa I'm pretty sure that this shows that your client is sending UTF-8 encoded content but declaring it as ISO-8859-1. I believe Wireshark is assuming UTF-8.

Can you select one of the non-ASCII characters in Wireshark and see how it has been encoded as bytes? This will confirm it one way or the other.
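A rough heuristic for spotting this kind of mislabelling programmatically, offered as a hypothetical sketch rather than anything smtp4dev does: if a part labelled ISO-8859-1 contains non-ASCII bytes that also form valid UTF-8, it was quite likely UTF-8 all along, because genuine Latin-1 text (for example `å` as 0xE5 followed by another letter) is rarely valid UTF-8. Short inputs can still produce false positives. Assumes .NET 5 or later.

```csharp
using System;
using System.Linq;
using System.Text;

// Heuristic: non-ASCII bytes that decode cleanly as strict UTF-8 suggest the
// content is UTF-8, whatever charset the MIME part claims.
static bool LooksLikeUtf8(byte[] bytes)
{
    if (!bytes.Any(b => b >= 0x80)) return false; // pure ASCII: nothing to decide
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        strictUtf8.GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}

Console.WriteLine(LooksLikeUtf8(Encoding.UTF8.GetBytes("Teståäö")));   // True
Console.WriteLine(LooksLikeUtf8(Encoding.Latin1.GetBytes("Teståäö"))); // False
```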

I'm tempted to add a binary session log to smtp4dev to help with complex issues like this.

ballaballaballa commented 4 months ago

Yes, you are correct. I copied the line as hex, and it only shows correctly when decoded as UTF-8. Here is the line in binary, which shows åäö correctly when decoded as UTF-8: 00110011001011110100000100111011001100010010110101010100011001010111001101110100110000111010010111000011101001001100001110110110. I will file a bug report with their support. Thanks for the help!
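The binary string above can be decoded mechanically to confirm the diagnosis: split it into 8-bit groups, then decode the bytes both ways. `BitsToBytes` is a hypothetical helper; assumes .NET 5 or later. The last six bytes are 0xC3 0xA5 0xC3 0xA4 0xC3 0xB6, the UTF-8 encoding of "åäö".

```csharp
using System;
using System.Text;

// Convert a string of '0'/'1' characters (8 bits per byte, MSB first) to bytes.
static byte[] BitsToBytes(string bits)
{
    if (bits.Length % 8 != 0) throw new ArgumentException("Length must be a multiple of 8.");
    var result = new byte[bits.Length / 8];
    for (int i = 0; i < result.Length; i++)
        result[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2);
    return result;
}

string lineBits = "00110011001011110100000100111011001100010010110101010100011001010111001101110100110000111010010111000011101001001100001110110110";
byte[] lineBytes = BitsToBytes(lineBits);
Console.WriteLine(Encoding.UTF8.GetString(lineBytes));   // 3/A;1-Teståäö
Console.WriteLine(Encoding.Latin1.GetString(lineBytes)); // 3/A;1-TestÃ¥Ã¤Ã¶
```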

I guess, though, that Outlook and other clients can handle this even though the data is incorrect.

rnwood commented 4 months ago

Thanks for the confirmation. Closing this issue now.