jhillyerd / enmime

MIME mail encoding and decoding package for Go

Declared header character encoding cannot always be trusted #338

Open zdiff opened 4 months ago

zdiff commented 4 months ago

I have been receiving emails whose headers declare the GB2312 character encoding. However, these headers only decode correctly when they are treated as GB18030. Is there a way to force the header decoder to use GB18030 instead of GB2312?

What I did:

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/jhillyerd/enmime"
)

func main() {
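    // The encoded-word is labeled GB2312, but its payload starts with the bytes
    // 0x83 0x30 0xFE 0x30, a four-byte sequence that only exists in GB18030.
    header := "Subject: =?GB2312?B?gzD+MIMziTmDMu4zIIMy4zGDNfozoaRIVEMggzW+OIM2jzGDNoU2gzOXNYMw2zEggjfoMII30jM=?="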
    parser := enmime.NewParser()
    env, err := parser.ReadEnvelope(strings.NewReader(header))
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(env.GetHeader("Subject"))
}

What I expected: 법원에 애플·HTC 특허합의문 공개

What I got: �0�0�3�9�2�3 �2�1�5�3·HTC �5�8�6�1�6�6�3�5�0�1 �7�0�7�3

Release or branch I am using: v1.2.0

jhillyerd commented 4 months ago

We don't currently have a way to configure that. The quickest fix for you would be to fork the repository and modify the relevant rule(s) in https://github.com/jhillyerd/enmime/blob/main/internal/coding/charsets.go

I think adding a parser option to specify custom mappings would make sense, so we can consider this a feature request.