mnako / letters

Letters, or how to parse emails in Go
MIT License

decodeHeader panics on label = windows-874 #49

Closed nikonov1101 closed 1 year ago

nikonov1101 commented 1 year ago

Hello!

I've got a panic here. CharsetReader receives "windows-874" as the label, rewrites it to "cp874", and looks that up. But the correct name for this encoding is "windows-874" (according to the "htmlindex" package, at least). So the lookup fails, enc is nil, and the call to enc.NewDecoder() panics.

Suggestions for a fix:

  1. call charset.Lookup with the strings.Replace-d version of the label; if no encoding is found (enc == nil), try again with the original, non-replaced label, and check for a nil encoding after that as well (flexible solution);
  2. just check for enc == nil and return a "cannot find MIME-word-encoded for label" error (strict solution);
  3. add a special case for the windows-874 encoding (bad solution).
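
To make option 1 concrete, here is a minimal sketch of the fallback logic. It is not the library's actual code: `lookupFunc`, `toyLookup`, and `resolveCharset` are hypothetical names, the "windows-" to "cp" rewrite is an assumption based on the behaviour described above, and the toy table stands in for the real charset lookup so the example runs with the standard library only.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc is a hypothetical stand-in for a charset lookup such as
// charset.Lookup. It reports ok == false for an unknown label, which is
// the case where the real lookup would hand back a nil encoding.
type lookupFunc func(label string) (name string, ok bool)

// toyLookup is an illustrative table, not real library data. It mimics
// the situation from this issue: "cp874" is unknown, "windows-874" is known.
func toyLookup(label string) (string, bool) {
	known := map[string]string{
		"windows-874": "windows-874",
		"cp1252":      "windows-1252",
	}
	name, ok := known[label]
	return name, ok
}

// resolveCharset tries the rewritten label first, then the original one,
// and returns an error instead of letting a nil encoding reach the caller.
func resolveCharset(lookup lookupFunc, label string) (string, error) {
	normalized := strings.ToLower(strings.TrimSpace(label))
	// Assumed rewrite rule: "windows-874" becomes "cp874".
	rewritten := strings.Replace(normalized, "windows-", "cp", 1)

	for _, candidate := range []string{rewritten, normalized} {
		if name, ok := lookup(candidate); ok {
			return name, nil
		}
	}
	return "", fmt.Errorf("cannot find encoding for label %q", label)
}
```

With this shape, `resolveCharset(toyLookup, "windows-874")` misses on "cp874", succeeds on the original label, and an unknown label yields an error rather than a panic. Option 2 is the same function with only the rewritten candidate in the loop.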

If we agree on the solution, I could do a PR with fixes.

Thanks!

mnako commented 1 year ago

@nikonov1101, thank you for opening this issue, what an exciting find!

I will look into this very soon. A quick search suggests that tis-620 and iso-8859-11 are two Thai character encoding standards, while windows-874, ibm874, x-mac-thai, and tactis are some non-standard encodings referring to code page 874.

I will first make sure that my understanding is correct and add some test cases for Thai text so that we can add proper support, if possible, and definitely better error handling.

mnako commented 1 year ago

This definitely does not count as "I will look into this very soon", but I got busy at work; apologies for that. My pace of open-source contributions should be back to normal now.

Fixed in https://github.com/mnako/letters/pull/54 and released in https://github.com/mnako/letters/releases/tag/v0.2.1.