kisli / vmime

VMime Mail Library
http://www.vmime.org
GNU General Public License v3.0

Charset ks_c_5601-1987 not supported #216

Open 7i77an opened 5 years ago

7i77an commented 5 years ago

Hi Vincent,

When processing the body and subject of an email, I get this error: Error parsing body part. Charset ks_c_5601-1987 is not supported. Continuing without decoding.

My platform is Debian and the charset conversion library is ICU. Is there any chance the library could support this charset?

            try {
                vmime::shared_ptr <vmime::charsetConverter> conv = vmime::charsetConverter::create(tp->getCharset(), vmime::charset("utf-8"));
                vmime::shared_ptr <vmime::utility::charsetFilteredOutputStream> partFilteredStream = conv->getFilteredOutputStream(partStreamAdapter);
                content->extract(*partFilteredStream);
                partFilteredStream->flush();
            }
            catch(const vmime::exceptions::charset_conv_error& e) {
                jbodyPart["warning"] = e.what();
                g_logger->Warning("Account::ParseBody: Error parsing body part. Charset %s is not supported. Continuing without decoding", tp->getCharset().getName().c_str());
                content->extract(partStreamAdapter);
            }

Thanks.

vincent-richard commented 5 years ago

Hello!

You could try mapping this charset to the actual name used by ICU or iconv. It seems to be an alias for "EUC-KR", so try this:

charset::charset(const string& name) ... {
    ...
    if (utility::stringUtils::isStringEqualNoCase(m_name, "ks_c_5601-1987")) {
        m_name = "EUC-KR";
    }
}

I think we really need some generic mapping, as there may be many aliases for many charsets. See a related issue here: https://github.com/php-mime-mail-parser/php-mime-mail-parser/issues/26

7i77an commented 5 years ago

That's correct. We need a generic mapping table and a method for adding custom charset mappings.

If you do not plan on working on this, please feel free to close the issue.

Thanks a lot Vincent.
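The "generic mapping plus a way to add custom entries" idea discussed above could look roughly like the sketch below. It is only an illustration: `CharsetAliasRegistry`, `add` and `resolve` are hypothetical names, not part of VMime, and the sketch assumes that alias names should be compared case-insensitively and that unknown names are passed through unchanged.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical helper (not a VMime class): maps charset aliases to the
// name that should be handed to the conversion library.
class CharsetAliasRegistry
{
public:

    // Register or override an alias. Keys are stored lower-cased so that
    // lookups are case-insensitive.
    void add(const std::string& alias, const std::string& canonical)
    {
        m_aliases[toLower(alias)] = canonical;
    }

    // Return the mapped name for `name`, or `name` itself if it is unknown.
    std::string resolve(const std::string& name) const
    {
        const auto it = m_aliases.find(toLower(name));
        return it != m_aliases.end() ? it->second : name;
    }

private:

    static std::string toLower(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(),
            [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }

    std::unordered_map<std::string, std::string> m_aliases;
};
```

With this, an application could call `add("ks_c_5601-1987", "EUC-KR")` once at startup and run every incoming charset name through `resolve()` before creating a converter.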

7i77an commented 5 years ago

Hi Vincent,

I put this code into charset.cpp:

// Explicitly map alias entries for some charsets
struct CharsetAliasEntry
{
    CharsetAliasEntry(const string& charset_, const string& alias_)
        : charset(charset_), alias(alias_)
    {
    }

    const string charset;  // lower-case name as it appears in a message
    const string alias;    // name to use in its place
};

CharsetAliasEntry g_charsetAliasMap[] =
{
    CharsetAliasEntry("ascii",              "us-ascii"),
    CharsetAliasEntry("us-ascii",           "us-ascii"),
    CharsetAliasEntry("ansi_x3.4-1968",     "us-ascii"),
    CharsetAliasEntry("646",                "us-ascii"),
    CharsetAliasEntry("iso-8859-1",         "ISO-8859-1"),
    CharsetAliasEntry("iso-8859-2",         "ISO-8859-2"),
    CharsetAliasEntry("iso-8859-3",         "ISO-8859-3"),
    CharsetAliasEntry("iso-8859-4",         "ISO-8859-4"),
    CharsetAliasEntry("iso-8859-5",         "ISO-8859-5"),
    CharsetAliasEntry("iso-8859-6",         "ISO-8859-6"),
    CharsetAliasEntry("iso-8859-6-i",       "ISO-8859-6-I"),
    CharsetAliasEntry("iso-8859-6-e",       "ISO-8859-6-E"),
    CharsetAliasEntry("iso-8859-7",         "ISO-8859-7"),
    CharsetAliasEntry("iso-8859-8",         "ISO-8859-8"),
    CharsetAliasEntry("iso-8859-8-i",       "ISO-8859-8-I"),
    CharsetAliasEntry("iso-8859-8-e",       "ISO-8859-8-E"),
    CharsetAliasEntry("iso-8859-9",         "ISO-8859-9"),
    CharsetAliasEntry("iso-8859-10",        "ISO-8859-10"),
    CharsetAliasEntry("iso-8859-11",        "ISO-8859-11"),
    CharsetAliasEntry("iso-8859-13",        "ISO-8859-13"),
    CharsetAliasEntry("iso-8859-14",        "ISO-8859-14"),
    CharsetAliasEntry("iso-8859-15",        "ISO-8859-15"),
    CharsetAliasEntry("iso-8859-16",        "ISO-8859-16"),
    CharsetAliasEntry("iso-ir-111",         "ISO-IR-111"),
    CharsetAliasEntry("iso-2022-cn",        "ISO-2022-CN"),
    CharsetAliasEntry("iso-2022-cn-ext",    "ISO-2022-CN"),
    CharsetAliasEntry("iso-2022-kr",        "ISO-2022-KR"),
    CharsetAliasEntry("iso-2022-jp",        "ISO-2022-JP"),
    CharsetAliasEntry("utf-16be",           "UTF-16BE"),
    CharsetAliasEntry("utf-16le",           "UTF-16LE"),
    CharsetAliasEntry("utf-16",             "UTF-16"),
    CharsetAliasEntry("windows-1250",       "windows-1250"),
    CharsetAliasEntry("windows-1251",       "windows-1251"),
    CharsetAliasEntry("windows-1252",       "windows-1252"),
    CharsetAliasEntry("windows-1253",       "windows-1253"),
    CharsetAliasEntry("windows-1254",       "windows-1254"),
    CharsetAliasEntry("windows-1255",       "windows-1255"),
    CharsetAliasEntry("windows-1256",       "windows-1256"),
    CharsetAliasEntry("windows-1257",       "windows-1257"),
    CharsetAliasEntry("windows-1258",       "windows-1258"),
    CharsetAliasEntry("ibm866",             "IBM866"),
    CharsetAliasEntry("ibm850",             "IBM850"),
    CharsetAliasEntry("ibm852",             "IBM852"),
    CharsetAliasEntry("ibm855",             "IBM855"),
    CharsetAliasEntry("ibm857",             "IBM857"),
    CharsetAliasEntry("ibm862",             "IBM862"),
    CharsetAliasEntry("ibm864",             "IBM864"),
    CharsetAliasEntry("utf-8",              "UTF-8"),
    CharsetAliasEntry("utf-7",              "UTF-7"),
    CharsetAliasEntry("shift_jis",          "Shift_JIS"),
    CharsetAliasEntry("big5",               "Big5"),
    CharsetAliasEntry("euc-jp",             "EUC-JP"),
    CharsetAliasEntry("euc-kr",             "EUC-KR"),
    CharsetAliasEntry("gb2312",             "GB2312"),
    CharsetAliasEntry("gb18030",            "gb18030"),
    CharsetAliasEntry("viscii",             "VISCII"),
    CharsetAliasEntry("koi8-r",             "KOI8-R"),
    CharsetAliasEntry("koi8_r",             "KOI8-R"),
    CharsetAliasEntry("cskoi8r",            "KOI8-R"),
    CharsetAliasEntry("koi",                "KOI8-R"),
    CharsetAliasEntry("koi8",               "KOI8-R"),
    CharsetAliasEntry("koi8-u",             "KOI8-U"),
    CharsetAliasEntry("tis-620",            "TIS-620"),
    CharsetAliasEntry("t.61-8bit",          "T.61-8bit"),
    CharsetAliasEntry("hz-gb-2312",         "HZ-GB-2312"),
    CharsetAliasEntry("big5-hkscs",         "Big5-HKSCS"),
    CharsetAliasEntry("gbk",                "gbk"),
    CharsetAliasEntry("cns11643",           "x-euc-tw"),

    //#
    //# Netscape private ...
    //#
    CharsetAliasEntry("x-imap4-modified-utf7","x-imap4-modified-utf7"),
    CharsetAliasEntry("x-euc-tw",             "x-euc-tw"),
    CharsetAliasEntry("x-mac-ce",             "x-mac-ce"),
    CharsetAliasEntry("x-mac-turkish",        "x-mac-turkish"),
    CharsetAliasEntry("x-mac-greek",          "x-mac-greek"),
    CharsetAliasEntry("x-mac-icelandic",      "x-mac-icelandic"),
    CharsetAliasEntry("x-mac-croatian",       "x-mac-croatian"),
    CharsetAliasEntry("x-mac-romanian",       "x-mac-romanian"),
    CharsetAliasEntry("x-mac-cyrillic",       "x-mac-cyrillic"),
    CharsetAliasEntry("x-mac-ukrainian",      "x-mac-cyrillic"),
    CharsetAliasEntry("x-mac-hebrew",         "x-mac-hebrew"),
    CharsetAliasEntry("x-mac-arabic",         "x-mac-arabic"),
    CharsetAliasEntry("x-mac-farsi",          "x-mac-farsi"),
    CharsetAliasEntry("x-mac-devanagari",     "x-mac-devanagari"),
    CharsetAliasEntry("x-mac-gujarati",       "x-mac-gujarati"),
    CharsetAliasEntry("x-mac-gurmukhi",       "x-mac-gurmukhi"),
    CharsetAliasEntry("armscii-8",            "armscii-8"),
    CharsetAliasEntry("x-viet-tcvn5712",      "x-viet-tcvn5712"),
    CharsetAliasEntry("x-viet-vps",           "x-viet-vps"),
    CharsetAliasEntry("iso-10646-ucs-2",      "UTF-16BE"),
    CharsetAliasEntry("x-iso-10646-ucs-2-be", "UTF-16BE"),
    CharsetAliasEntry("x-iso-10646-ucs-2-le", "UTF-16LE"),
    CharsetAliasEntry("x-user-defined",       "x-user-defined"),
    CharsetAliasEntry("x-johab",              "x-johab"),

    //#
    //# Aliases for ISO-8859-1
    //#
    CharsetAliasEntry("latin1",               "ISO-8859-1"),
    CharsetAliasEntry("iso_8859-1",           "ISO-8859-1"),
    CharsetAliasEntry("iso8859-1",            "ISO-8859-1"),
    CharsetAliasEntry("iso8859-2",            "ISO-8859-2"),
    CharsetAliasEntry("iso8859-3",            "ISO-8859-3"),
    CharsetAliasEntry("iso8859-4",            "ISO-8859-4"),
    CharsetAliasEntry("iso8859-5",            "ISO-8859-5"),
    CharsetAliasEntry("iso8859-6",            "ISO-8859-6"),
    CharsetAliasEntry("iso8859-7",            "ISO-8859-7"),
    CharsetAliasEntry("iso8859-8",            "ISO-8859-8"),
    CharsetAliasEntry("iso8859-9",            "ISO-8859-9"),
    CharsetAliasEntry("iso8859-10",           "ISO-8859-10"),
    CharsetAliasEntry("iso8859-11",           "ISO-8859-11"),
    CharsetAliasEntry("iso8859-13",           "ISO-8859-13"),
    CharsetAliasEntry("iso8859-14",           "ISO-8859-14"),
    CharsetAliasEntry("iso8859-15",           "ISO-8859-15"),
    CharsetAliasEntry("iso_8859-1:1987",      "ISO-8859-1"),
    CharsetAliasEntry("iso-ir-100",           "ISO-8859-1"),
    CharsetAliasEntry("l1",                   "ISO-8859-1"),
    CharsetAliasEntry("ibm819",               "ISO-8859-1"),
    CharsetAliasEntry("cp819",                "ISO-8859-1"),
    CharsetAliasEntry("csisolatin1",          "ISO-8859-1"),

    //#
    //# Aliases for ISO-8859-2
    //#
    CharsetAliasEntry("latin2",               "ISO-8859-2"),
    CharsetAliasEntry("iso_8859-2",           "ISO-8859-2"),
    CharsetAliasEntry("iso_8859-2:1987",      "ISO-8859-2"),
    CharsetAliasEntry("iso-ir-101",           "ISO-8859-2"),
    CharsetAliasEntry("l2",                   "ISO-8859-2"),
    CharsetAliasEntry("csisolatin2",          "ISO-8859-2"),

    //#
    //# Aliases for ISO-8859-3
    //#
    CharsetAliasEntry("latin3",               "ISO-8859-3"),
    CharsetAliasEntry("iso_8859-3",           "ISO-8859-3"),
    CharsetAliasEntry("iso_8859-3:1988",      "ISO-8859-3"),
    CharsetAliasEntry("iso-ir-109",           "ISO-8859-3"),
    CharsetAliasEntry("l3",                   "ISO-8859-3"),
    CharsetAliasEntry("csisolatin3",          "ISO-8859-3"),

    //#
    //# Aliases for ISO-8859-4
    //#
    CharsetAliasEntry("latin4",               "ISO-8859-4"),
    CharsetAliasEntry("iso_8859-4",           "ISO-8859-4"),
    CharsetAliasEntry("iso_8859-4:1988",      "ISO-8859-4"),
    CharsetAliasEntry("iso-ir-110",           "ISO-8859-4"),
    CharsetAliasEntry("l4",                   "ISO-8859-4"),
    CharsetAliasEntry("csisolatin4",          "ISO-8859-4"),

    //#
    //# Aliases for ISO-8859-5
    //#
    CharsetAliasEntry("cyrillic",             "ISO-8859-5"),
    CharsetAliasEntry("iso_8859-5",           "ISO-8859-5"),
    CharsetAliasEntry("iso_8859-5:1988",      "ISO-8859-5"),
    CharsetAliasEntry("iso-ir-144",           "ISO-8859-5"),
    CharsetAliasEntry("csisolatincyrillic",   "ISO-8859-5"),

    //#
    //# Aliases for ISO-8859-6
    //#
    CharsetAliasEntry("arabic",                "ISO-8859-6"),
    CharsetAliasEntry("iso_8859-6",            "ISO-8859-6"),
    CharsetAliasEntry("iso_8859-6:1987",       "ISO-8859-6"),
    CharsetAliasEntry("iso-ir-127",            "ISO-8859-6"),
    CharsetAliasEntry("ecma-114",              "ISO-8859-6"),
    CharsetAliasEntry("asmo-708",              "ISO-8859-6"),
    CharsetAliasEntry("csisolatinarabic",      "ISO-8859-6"),

    //#
    //# Aliases for ISO-8859-6-I
    //#
    CharsetAliasEntry("csiso88596i",           "ISO-8859-6-I"),

    //#
    //# Aliases for ISO-8859-6-E
    //#
    CharsetAliasEntry("csiso88596e",           "ISO-8859-6-E"),

    //#
    //# Aliases for ISO-8859-7
    //#
    CharsetAliasEntry("greek",                 "ISO-8859-7"),
    CharsetAliasEntry("greek8",                "ISO-8859-7"),
    CharsetAliasEntry("sun_eu_greek",          "ISO-8859-7"),
    CharsetAliasEntry("iso_8859-7",            "ISO-8859-7"),
    CharsetAliasEntry("iso_8859-7:1987",       "ISO-8859-7"),
    CharsetAliasEntry("iso-ir-126",            "ISO-8859-7"),
    CharsetAliasEntry("elot_928",              "ISO-8859-7"),
    CharsetAliasEntry("ecma-118",              "ISO-8859-7"),
    CharsetAliasEntry("csisolatingreek",       "ISO-8859-7"),

    //#
    //# Aliases for ISO-8859-8
    //#
    CharsetAliasEntry("hebrew",                "ISO-8859-8"),
    CharsetAliasEntry("iso_8859-8",            "ISO-8859-8"),
    CharsetAliasEntry("visual",                "ISO-8859-8"),
    CharsetAliasEntry("iso_8859-8:1988",       "ISO-8859-8"),
    CharsetAliasEntry("iso-ir-138",            "ISO-8859-8"),
    CharsetAliasEntry("csisolatinhebrew",      "ISO-8859-8"),

    //#
    //# Aliases for ISO-8859-8-I
    //#
    CharsetAliasEntry("csiso88598i",           "ISO-8859-8-I"),
    CharsetAliasEntry("iso-8859-8i",           "ISO-8859-8-I"),
    CharsetAliasEntry("logical",               "ISO-8859-8-I"),

    //#
    //# Aliases for ISO-8859-8-E
    //#
    CharsetAliasEntry("csiso88598e",           "ISO-8859-8-E"),

    //#
    //# Aliases for ISO-8859-9
    //#
    CharsetAliasEntry("latin5",                "ISO-8859-9"),
    CharsetAliasEntry("iso_8859-9",            "ISO-8859-9"),
    CharsetAliasEntry("iso_8859-9:1989",       "ISO-8859-9"),
    CharsetAliasEntry("iso-ir-148",            "ISO-8859-9"),
    CharsetAliasEntry("l5",                    "ISO-8859-9"),
    CharsetAliasEntry("csisolatin5",           "ISO-8859-9"),

    //#
    //# Aliases for UTF-8
    //#
    CharsetAliasEntry("unicode-1-1-utf-8",     "UTF-8"),

    //# nl_langinfo(CODESET) in HP/UX returns 'utf8' under UTF-8 locales
    CharsetAliasEntry("utf8",                  "UTF-8"),

    //#
    //# Aliases for Shift_JIS
    //#
    CharsetAliasEntry("x-sjis",                "Shift_JIS"),
    CharsetAliasEntry("shift-jis",             "Shift_JIS"),
    CharsetAliasEntry("ms_kanji",              "Shift_JIS"),
    CharsetAliasEntry("csshiftjis",            "Shift_JIS"),
    CharsetAliasEntry("windows-31j",           "Shift_JIS"),
    CharsetAliasEntry("cp932",                 "Shift_JIS"),
    CharsetAliasEntry("sjis",                  "Shift_JIS"),

    //#
    //# Aliases for EUC_JP
    //#
    CharsetAliasEntry("cseucpkdfmtjapanese",   "EUC-JP"),
    CharsetAliasEntry("x-euc-jp",              "EUC-JP"),

    //#
    //# Aliases for ISO-2022-JP
    //#
    CharsetAliasEntry("csiso2022jp",           "ISO-2022-JP"),

    //# The following are really not aliases ISO-2022-JP, but sharing the same decoder
    CharsetAliasEntry("iso-2022-jp-2",         "ISO-2022-JP"),
    CharsetAliasEntry("csiso2022jp2",          "ISO-2022-JP"),

    //#
    //# Aliases for Big5
    //#
    CharsetAliasEntry("csbig5",                "Big5"),
    CharsetAliasEntry("cn-big5",               "Big5"),

    //# x-x-big5 is not really a alias for Big5, add it only for MS FrontPage
    CharsetAliasEntry("x-x-big5",              "Big5"),

    //# Sun Solaris
    CharsetAliasEntry("zh_tw-big5",            "Big5"),

    //#
    //# Aliases for EUC-KR
    //#
    CharsetAliasEntry("cseuckr",               "EUC-KR"),
    CharsetAliasEntry("ks_c_5601-1987",        "EUC-KR"),
    CharsetAliasEntry("iso-ir-149",            "EUC-KR"),
    CharsetAliasEntry("ks_c_5601",             "EUC-KR"),
    CharsetAliasEntry("ksc_5601",              "EUC-KR"),
    CharsetAliasEntry("ksc5601",               "EUC-KR"),
    CharsetAliasEntry("csksc56011987",         "EUC-KR"),
    CharsetAliasEntry("5601",                  "EUC-KR"),

    //#
    //# Aliases for GB2312
    //#
    //# The following are really not aliases GB2312, add them only for MS FrontPage
    CharsetAliasEntry("gb_2312-80",            "GB2312"),
    CharsetAliasEntry("iso-ir-58",             "GB2312"),
    CharsetAliasEntry("chinese",               "GB2312"),
    CharsetAliasEntry("csiso58gb231280",       "GB2312"),
    CharsetAliasEntry("csgb2312",              "GB2312"),
    CharsetAliasEntry("zh_cn.euc",             "GB2312"),

    //# Sun Solaris
    CharsetAliasEntry("gb_2312",               "GB2312"),

    //#
    //# Aliases for windows-125x 
    //#
    CharsetAliasEntry("x-cp1250",              "windows-1250"),
    CharsetAliasEntry("x-cp1251",              "windows-1251"),
    CharsetAliasEntry("x-cp1252",              "windows-1252"),
    CharsetAliasEntry("x-cp1253",              "windows-1253"),
    CharsetAliasEntry("x-cp1254",              "windows-1254"),
    CharsetAliasEntry("x-cp1255",              "windows-1255"),
    CharsetAliasEntry("x-cp1256",              "windows-1256"),
    CharsetAliasEntry("x-cp1257",              "windows-1257"),
    CharsetAliasEntry("x-cp1258",              "windows-1258"),

    //#
    //# Aliases for windows-874 
    //#
    CharsetAliasEntry("windows-874",           "windows-874"),
    CharsetAliasEntry("ibm874",                "windows-874"),
    CharsetAliasEntry("dos-874",               "windows-874"),

    //#
    //# Aliases for macintosh
    //#
    CharsetAliasEntry("macintosh",             "macintosh"),
    CharsetAliasEntry("x-mac-roman",           "macintosh"),
    CharsetAliasEntry("mac",                   "macintosh"),
    CharsetAliasEntry("csmacintosh",           "macintosh"),

    //#
    //# Aliases for IBM866
    //#
    CharsetAliasEntry("cp866",                 "IBM866"),
    CharsetAliasEntry("cp-866",                "IBM866"),
    CharsetAliasEntry("866",                   "IBM866"),
    CharsetAliasEntry("csibm866",              "IBM866"),

    //#
    //# Aliases for IBM850
    //#
    CharsetAliasEntry("cp850",                 "IBM850"),
    CharsetAliasEntry("850",                   "IBM850"),
    CharsetAliasEntry("csibm850",              "IBM850"),

    //#
    //# Aliases for IBM852
    //#
    CharsetAliasEntry("cp852",                 "IBM852"),
    CharsetAliasEntry("852",                   "IBM852"),
    CharsetAliasEntry("csibm852",              "IBM852"),

    //#
    //# Aliases for IBM855
    //#
    CharsetAliasEntry("cp855",                 "IBM855"),
    CharsetAliasEntry("855",                   "IBM855"),
    CharsetAliasEntry("csibm855",              "IBM855"),

    //#
    //# Aliases for IBM857
    //#
    CharsetAliasEntry("cp857",                 "IBM857"),
    CharsetAliasEntry("857",                   "IBM857"),
    CharsetAliasEntry("csibm857",              "IBM857"),

    //#
    //# Aliases for IBM862
    //#
    CharsetAliasEntry("cp862",                 "IBM862"),
    CharsetAliasEntry("862",                   "IBM862"),
    CharsetAliasEntry("csibm862",              "IBM862"),

    //#
    //# Aliases for IBM864
    //#
    CharsetAliasEntry("cp864",                 "IBM864"),
    CharsetAliasEntry("864",                   "IBM864"),
    CharsetAliasEntry("csibm864",              "IBM864"),
    CharsetAliasEntry("ibm-864",               "IBM864"),

    //#
    //# Aliases for T.61-8bit
    //#
    CharsetAliasEntry("t.61",                  "T.61-8bit"),
    CharsetAliasEntry("iso-ir-103",            "T.61-8bit"),
    CharsetAliasEntry("csiso103t618bit",       "T.61-8bit"),

    //#
    //# Aliases for UTF-7
    //#
    CharsetAliasEntry("x-unicode-2-0-utf-7",   "UTF-7"),
    CharsetAliasEntry("unicode-2-0-utf-7",     "UTF-7"),
    CharsetAliasEntry("unicode-1-1-utf-7",     "UTF-7"),
    CharsetAliasEntry("csunicode11utf7",       "UTF-7"),

    //#
    //# Aliases for ISO-10646-UCS-2
    //#
    CharsetAliasEntry("csunicode",                "UTF-16BE"),
    CharsetAliasEntry("csunicode11",              "UTF-16BE"),
    CharsetAliasEntry("iso-10646-ucs-basic",      "UTF-16BE"),
    CharsetAliasEntry("csunicodeascii",           "UTF-16BE"),
    CharsetAliasEntry("iso-10646-unicode-latin1", "UTF-16BE"),
    CharsetAliasEntry("csunicodelatin1",          "UTF-16BE"),
    CharsetAliasEntry("iso-10646",                "UTF-16BE"),
    CharsetAliasEntry("iso-10646-j-1",            "UTF-16BE"),

    //#
    //# Aliases for ISO-8859-10
    //#
    CharsetAliasEntry("latin6",                   "ISO-8859-10"),
    CharsetAliasEntry("iso-ir-157",               "ISO-8859-10"),
    CharsetAliasEntry("l6",                       "ISO-8859-10"),

    //# Currently .properties cannot handle : in key
    //#iso_8859-10:1992","ISO-8859-10
    CharsetAliasEntry("csisolatin6",              "ISO-8859-10"),

    //#
    //# Aliases for ISO-8859-15
    //#
    CharsetAliasEntry("iso_8859-15",              "ISO-8859-15"),
    CharsetAliasEntry("csisolatin9",              "ISO-8859-15"),
    CharsetAliasEntry("l9",                       "ISO-8859-15"),

    //#
    //# Aliases for ISO-IR-111
    //#
    CharsetAliasEntry("ecma-cyrillic",            "ISO-IR-111"),
    CharsetAliasEntry("csiso111ecmacyrillic",     "ISO-IR-111"),

    //#
    //# Aliases for ISO-2022-KR
    //#
    CharsetAliasEntry("csiso2022kr",              "ISO-2022-KR"),

    //#
    //# Aliases for VISCII
    //#
    CharsetAliasEntry("csviscii",                 "VISCII"),

    //#
    //# Aliases for x-euc-tw
    //#
    CharsetAliasEntry("zh_tw-euc",                "x-euc-tw"),

    //#
    //# Following names appears in unix nl_langinfo(CODESET)
    //# They can be compiled as platform specific if necessary
    //# DONT put things here if it does not look generic enough (like hp15CN)
    //#
    CharsetAliasEntry("iso88591",                 "ISO-8859-1"),
    CharsetAliasEntry("iso88592",                 "ISO-8859-2"),
    CharsetAliasEntry("iso88593",                 "ISO-8859-3"),
    CharsetAliasEntry("iso88594",                 "ISO-8859-4"),
    CharsetAliasEntry("iso88595",                 "ISO-8859-5"),
    CharsetAliasEntry("iso88596",                 "ISO-8859-6"),
    CharsetAliasEntry("iso88597",                 "ISO-8859-7"),
    CharsetAliasEntry("iso88598",                 "ISO-8859-8"),
    CharsetAliasEntry("iso88599",                 "ISO-8859-9"),
    CharsetAliasEntry("iso885910",                "ISO-8859-10"),
    CharsetAliasEntry("iso885911",                "ISO-8859-11"),
    CharsetAliasEntry("iso885912",                "ISO-8859-12"),
    CharsetAliasEntry("iso885914",                "ISO-8859-14"),
    CharsetAliasEntry("iso885913",                "ISO-8859-13"),
    CharsetAliasEntry("iso885915",                "ISO-8859-15"),
    //#
    CharsetAliasEntry("tis620",                   "TIS-620"),
    //#
    CharsetAliasEntry("cp1250",                   "windows-1250"),
    CharsetAliasEntry("cp1251",                   "windows-1251"),
    CharsetAliasEntry("cp1252",                   "windows-1252"),
    CharsetAliasEntry("cp1253",                   "windows-1253"),
    CharsetAliasEntry("cp1254",                   "windows-1254"),
    CharsetAliasEntry("cp1255",                   "windows-1255"),
    CharsetAliasEntry("cp1256",                   "windows-1256"),
    CharsetAliasEntry("cp1257",                   "windows-1257"),
    CharsetAliasEntry("cp1258",                   "windows-1258"),

    CharsetAliasEntry("x-gbk",                    "gbk"),
    CharsetAliasEntry("windows-936",              "gbk"),
    CharsetAliasEntry("ansi-1251",                "windows-1251"),

};                                

void charset::setAliasCharset()
{
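    // Table keys are lower-case, so compare against the lower-cased name
    const string cset = utility::stringUtils::toLower(m_name);
    const size_t count = sizeof(g_charsetAliasMap) / sizeof(g_charsetAliasMap[0]);

    for (size_t i = 0 ; i < count ; ++i)
    {
        // Exact match: a substring test would turn "iso-8859-15" into ISO-8859-1
        if (cset == g_charsetAliasMap[i].charset)
        {
            m_name = g_charsetAliasMap[i].alias;
            break;
        }
    }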
}
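The result of a lookup over a table like this depends on how names are compared. The sketch below uses a tiny hypothetical table (`kAliases`, `resolveBySubstring` and `resolveExact` are illustrative names, not VMime code) to contrast substring matching with exact matching on already lower-cased input.

```cpp
#include <cstddef>
#include <string>

struct AliasEntry
{
    const char* alias;      // lower-case name as it may appear in a message
    const char* canonical;  // name to use in its place
};

// Hypothetical four-entry table; order matters for the substring variant.
static const AliasEntry kAliases[] =
{
    { "646",             "us-ascii"    },
    { "iso-8859-1",      "ISO-8859-1"  },
    { "iso-8859-15",     "ISO-8859-15" },
    { "iso-10646-ucs-2", "UTF-16BE"    },
};

static const std::size_t kAliasCount = sizeof(kAliases) / sizeof(kAliases[0]);

// Returns the first entry whose alias occurs anywhere inside `name`.
std::string resolveBySubstring(const std::string& name)
{
    for (std::size_t i = 0 ; i < kAliasCount ; ++i)
    {
        if (name.find(kAliases[i].alias) != std::string::npos)
            return kAliases[i].canonical;
    }

    return name;
}

// Returns the entry whose alias equals `name` exactly.
std::string resolveExact(const std::string& name)
{
    for (std::size_t i = 0 ; i < kAliasCount ; ++i)
    {
        if (name == kAliases[i].alias)
            return kAliases[i].canonical;
    }

    return name;
}
```

With substring matching, "iso-8859-15" hits the "iso-8859-1" entry first and "iso-10646-ucs-2" hits "646", so both resolve to the wrong charset. Exact comparison avoids this and also makes the result independent of table order.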

I then call setAliasCharset() in place of the existing utf-7 check:

charset::charset(const string& name)
    : m_name(name)
{
    setAliasCharset();
}

void charset::parseImpl(
    const parsingContext& /* ctx */, const string& buffer, const size_t position,
    const size_t end, size_t* newPosition)
{
    m_name = utility::stringUtils::trim(
        string(buffer.begin() + position, buffer.begin() + end));

    setAliasCharset();

    setParsedBounds(position, end);

    if (newPosition)
        *newPosition = end;
}

charset.hpp:

private:
    void setAliasCharset();

    string m_name;

I hope you find it useful...

vincent-richard commented 5 years ago

Hello! I need to check license issues caused by incorporating a MPL-covered file (or any other file, or even data coming from these files) into VMime (which is dual-licensed, including GPL).

jengelh commented 5 years ago

No code import is needed methinks. The IANA mapping list is https://www.iana.org/assignments/character-sets/character-sets.xhtml from which alias calls can be derived/written by oneself.
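As a rough sketch of deriving the table from the registry rather than importing it: each IANA record has one name plus a list of aliases, so it can be flattened into an alias-to-name map. `RegistryRecord` and `buildAliasMap` are hypothetical names, and any records fed to it would need to be transcribed from the registry itself.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// One record as it might be transcribed by hand from the IANA list:
// a registered name plus its aliases. Hypothetical type, not VMime code.
struct RegistryRecord
{
    std::string name;
    std::vector<std::string> aliases;
};

static std::string lowerCase(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Flatten records into a lower-cased "alias -> registered name" map.
// The registered name is also inserted as a key for itself.
std::map<std::string, std::string> buildAliasMap(const std::vector<RegistryRecord>& records)
{
    std::map<std::string, std::string> out;

    for (const RegistryRecord& rec : records)
    {
        out[lowerCase(rec.name)] = rec.name;

        for (const std::string& alias : rec.aliases)
            out[lowerCase(alias)] = rec.name;
    }

    return out;
}
```

One caveat, to be checked against the registry: as far as I know, IANA lists KS_C_5601-1987 as a registered name of its own rather than as an alias of EUC-KR. If so, the mapping to EUC-KR suggested in this thread would be an extra override applied on top of the derived map, not something the registry provides directly.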

jstedfast commented 5 years ago

FWIW, I can confirm that this maps to EUC-KR. I've had this mapping in use by all of the MIME libraries I've written over the past 2 decades (Evolution, MimeKit, and GMime).

Hope that helps.

BTW, I always recommend VMime to anyone who asks me about MIME libraries for C++ :-)