gungora closed this issue 1 year ago
The discrepancy you describe is not at all expected.
I just added the following test case to MimeKit's unit tests to verify that MimeKit does the right thing (and it does):
[Test]
public void TestIssue883 ()
{
    // The blank line after Message-ID separates the headers from the body.
    const string rawMessageText = @"From: John Doe <jdoe@machine.example>
To: Mary Smith <mary@example.net>
Subject: =?GB18030?B?1qTD9w==?=
Date: Fri, 21 Nov 1997 09:55:06 -0600
Message-ID: <1234@local.machine.example>

This is a message just to say hello.
So, ""Hello"".";

    using (var source = new MemoryStream (Encoding.UTF8.GetBytes (rawMessageText))) {
        var message = MimeMessage.Load (source);

        // The GB18030 encoded-word in the Subject header should decode to these two characters.
        Assert.AreEqual ("证明", message.Subject);
    }
}
The test passes.
Ah, I bet I know what the problem is that you are hitting.
You probably forgot to call:
System.Text.Encoding.RegisterProvider (CodePagesEncodingProvider.Instance);
You need to call that before making any calls to MimeKit or MailKit.
MimeKit initializes its charset mapping the first time any MimeKit API is called. If it can't find a charset at that point, it maps the charset name to iso-8859-1 (because that is always available).
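To illustrate why the iso-8859-1 fallback produces the garbled subject, here is a minimal sketch that uses only the .NET base class library, not MimeKit itself. The class name CharsetFallbackDemo and the Decode helper are hypothetical names made up for this example. It assumes .NET Core or later, where GB18030 is only available once the code pages provider has been registered. It decodes the base64 payload from the Subject header above with both charsets:

```csharp
using System;
using System.Text;

public static class CharsetFallbackDemo
{
    // Hypothetical helper: decodes a base64 payload using the named charset.
    public static string Decode (string base64Payload, string charsetName)
    {
        byte[] raw = Convert.FromBase64String (base64Payload);
        return Encoding.GetEncoding (charsetName).GetString (raw);
    }

    public static void Main ()
    {
        // Must run first: makes GB18030 (and other code pages) resolvable.
        Encoding.RegisterProvider (CodePagesEncodingProvider.Instance);

        // Payload of the encoded-word =?GB18030?B?1qTD9w==?= (bytes D6 A4 C3 F7).
        const string payload = "1qTD9w==";

        string fallback = Decode (payload, "iso-8859-1"); // the string "Ö¤Ã÷"
        string correct = Decode (payload, "GB18030");     // the string "证明"

        // What the console actually renders depends on its output encoding.
        Console.WriteLine (fallback);
        Console.WriteLine (correct);
    }
}
```

The same four bytes give either result depending only on which Encoding interprets them, which is why the registration has to happen before the charset mapping is built.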
That was it 😊 I believe MimeKit used to call Encoding.RegisterProvider() itself in CharsetUtils, but it looks like this changed back in July. Thanks for bringing this up; we will call it ourselves now.
Hello,
Let's say I have a MIME message as follows:
If I access the subject of the message as follows:
I get the following result:
Ö¤Ã÷
, which appears correctly when viewed using the GB18030 charset.
On the other hand, if I access it as follows:
var subject = message.Headers.First(x => x.Id == HeaderId.Subject).GetValue(Encoding.GetEncoding("GB18030"));
... then I get
证明
, which appears to be the UTF-8 encoded version of the subject.
A couple of questions:
Is the above discrepancy expected? Since the charset is specified in the RFC 2047-encoded subject, I was thinking that overriding the encoding by calling the GetValue() method with the same encoding would not yield a different result.
Getting the header in UTF-8 form works better for me. If I need to call the GetValue() method with the corresponding encoding to do that, how would I go about determining the charset of the RFC 2047-encoded header? I do not see that information attached to the header itself.
Many thanks!
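On the second question, the charset label is part of the RFC 2047 encoded-word itself, which has the form =?charset?encoding?encoded-text?=. The following is a rough sketch of reading that label from a raw, undecoded header value held as a string. EncodedWordInspector and GetFirstCharset are hypothetical names invented for this example, not MimeKit APIs. The pattern is deliberately simple: it only looks at the first encoded-word, whereas a header may contain several with different charsets. As the thread shows, this is not normally needed once the encoding provider is registered.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EncodedWordInspector
{
    // Matches =?charset?encoding?encoded-text?= where encoding is B or Q.
    static readonly Regex EncodedWord = new Regex (@"=\?([^?]+)\?([BbQq])\?([^?]*)\?=");

    // Returns the charset label of the first encoded-word,
    // or an empty string if the value contains none.
    public static string GetFirstCharset (string rawHeaderValue)
    {
        Match match = EncodedWord.Match (rawHeaderValue);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main ()
    {
        Console.WriteLine (GetFirstCharset ("=?GB18030?B?1qTD9w==?=")); // GB18030
        Console.WriteLine (GetFirstCharset ("plain subject").Length);   // 0
    }
}
```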