nodemailer / mailparser

Decode mime formatted e-mails

If the subject is not UTF-8 encoded, garbled characters appear after parsing #351

xigirl closed this issue 9 months ago

xigirl commented 9 months ago

The issue originates from #348. What follows is the email source as displayed in Foxmail, so it may already have been partially processed. The raw header I retrieve with the node-pop3 library is a Buffer, for example <Buffer 53 75 62 6a 65 63 74 3a 20 5b b3 b7 bb d8 d3 ca bc fe b3 c9 b9 a6 5d 20 b3 b7 bb d8 d3 ca bc fe b2 e2 ca>. Before parsing, I don't know which encoding it uses. How should I handle this?

Subject:发信方已撤回邮件:测试测试
X-QQ-mid: tyyjxt-xx11d002-yh15wt16855880
Date:Thu, 1 Jun 2023 10:53:30 +0800
Content-Type: multipart/mixed;
	boundary="----=_NextPart_45518C5C_082708D8_5A03221D"

andris9 commented 9 months ago

Unfortunately, there are no good options. The only character set allowed for raw 8-bit text in message headers is UTF-8 (see RFC 6532); any other charset has to be declared explicitly, for example with RFC 2047 encoded-words. Older mail systems that do not support RFC 6532 sometimes put raw bytes in other character sets into headers anyway, and email clients have historically dealt with this through heuristics: when a client sees bytes in an unknown character set, it can fall back on the user's locale. With a Chinese locale it can assume a GB* charset or Big5, with a Russian locale it can assume KOI8, and so on. Mailparser, on the other hand, knows nothing about the user (it is a server-side component, not a desktop app), so it cannot make any such assumption.
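
As a rough illustration of that heuristic, the TypeScript sketch below tries strict UTF-8 first and only then falls back to a charset guessed from the user's locale. It is not mailparser's API: the `LOCALE_FALLBACKS` table and the `decodeHeaderBytes` function are hypothetical, and the sketch assumes a runtime whose `TextDecoder` supports the WHATWG legacy encodings (official Node.js builds with full ICU, Deno, or browsers).

```typescript
// Minimal sketch of locale-based charset guessing for raw header bytes.
// Assumption: TextDecoder supports WHATWG legacy encodings (e.g. Node.js
// with full ICU, Deno, or a browser). Not part of mailparser's API.

// Hypothetical mapping from a user's locale to a legacy charset guess.
const LOCALE_FALLBACKS = {
  "zh-CN": "gb18030", // superset of GBK and GB2312
  "zh-TW": "big5",
  "ru-RU": "koi8-r",
} as const;

/**
 * Decode raw header bytes. Strict UTF-8 is tried first, because UTF-8 is the
 * only valid charset for raw header text (RFC 6532); only if the bytes are
 * not valid UTF-8 is the guessed legacy charset used instead.
 */
function decodeHeaderBytes(raw: Uint8Array, fallbackCharset: string): string {
  try {
    // fatal: true throws on malformed UTF-8 instead of emitting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(raw);
  } catch {
    // Not UTF-8, so fall back to the locale-based guess.
    return new TextDecoder(fallbackCharset).decode(raw);
  }
}

// Bytes from the issue, without the trailing 0xca: it starts a two-byte
// character whose second byte is missing from the excerpt.
const subjectBytes = new Uint8Array([
  0x53, 0x75, 0x62, 0x6a, 0x65, 0x63, 0x74, 0x3a, 0x20, 0x5b,
  0xb3, 0xb7, 0xbb, 0xd8, 0xd3, 0xca, 0xbc, 0xfe, 0xb3, 0xc9,
  0xb9, 0xa6, 0x5d, 0x20, 0xb3, 0xb7, 0xbb, 0xd8, 0xd3, 0xca,
  0xbc, 0xfe, 0xb2, 0xe2,
]);

console.log(decodeHeaderBytes(subjectBytes, LOCALE_FALLBACKS["zh-CN"]));
// prints: Subject: [撤回邮件成功] 撤回邮件测
```

Run against the bytes from the issue, the GB18030 fallback produces readable Chinese, which suggests the sender used GBK. A server-side parser has no locale to base that guess on, though, so the fallback charset has to come from the application.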

The only real option is to detect whether the email comes from Foxmail, convert the entire message from the Chinese charset (e.g. GBK/GB18030) to UTF-8, and only then feed it to Mailparser for parsing.
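
The workaround described above could be sketched along these lines. All of it is an assumption layered on top of mailparser, not something the library does: recognising Foxmail/QQ Mail by an `X-QQ-mid` header (present in the sample above) or an `X-Mailer` header mentioning Foxmail is a hypothetical heuristic, and the sender's charset is assumed to be GB18030, a superset of GBK and GB2312. Messages that already decode as valid UTF-8 are passed through unchanged. The function names are illustrative only.

```typescript
// Sketch: convert a whole non-UTF-8 Foxmail message to UTF-8 before parsing.
// Assumptions (hypothetical, not mailparser behaviour): Foxmail/QQ Mail is
// detected via X-QQ-mid or "X-Mailer: ...Foxmail" headers, and its raw 8-bit
// bytes are GB18030/GBK. Requires a TextDecoder with WHATWG legacy encodings.

const LEGACY_CHARSET = "gb18030"; // superset of GBK and GB2312

/** True if the bytes are well-formed UTF-8. */
function isValidUtf8(raw: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(raw);
    return true;
  } catch {
    return false;
  }
}

/** Hypothetical sender check based only on ASCII header names. */
function looksLikeFoxmail(raw: Uint8Array): boolean {
  // Non-fatal decode: invalid bytes become U+FFFD, ASCII header names survive.
  const text = new TextDecoder("utf-8").decode(raw);
  // The header block ends at the first empty line.
  const headerEnd = text.search(/\r?\n\r?\n/);
  const headers = headerEnd === -1 ? text : text.slice(0, headerEnd);
  return /^x-qq-mid:/im.test(headers) || /^x-mailer:.*foxmail/im.test(headers);
}

/** Re-encode the entire message as UTF-8 if it is a non-UTF-8 Foxmail message. */
function normalizeForMailparser(raw: Uint8Array): Uint8Array {
  if (isValidUtf8(raw) || !looksLikeFoxmail(raw)) {
    return raw; // leave everything else to mailparser's normal handling
  }
  const text = new TextDecoder(LEGACY_CHARSET).decode(raw);
  return new TextEncoder().encode(text);
}
```

The converted bytes can then be handed to mailparser as usual, e.g. `simpleParser(Buffer.from(normalizeForMailparser(raw)))`. One caveat of converting the whole message: base64 and quoted-printable body parts are plain ASCII and survive unchanged, but a part sent as raw 8-bit text with `charset=gb2312` in its Content-Type would now hold UTF-8 bytes under a stale charset label, so such parts may need their charset parameter adjusted as well.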