mnako / letters

Letters, or how to parse emails in Go
MIT License

[ISSUE-49] Fix panic on `windows-874` (and other correct `windows-`) labels #54

Closed mnako closed 1 year ago

mnako commented 1 year ago

Description

As reported by @nikonov1101 in https://github.com/mnako/letters/issues/49, `CharsetReader` did not look up labels correctly when they legitimately start with `windows-`, such as `windows-874`.

This PR uses the Thai language and its encodings to illustrate the problem, adds test cases, and fixes the bug by first attempting a lookup of the original label, then (if not found) attempting a lookup of the normalised label, and (if still not found) raising an informative error. Assuming that most labels are correct and do not need the string replacement, this should be the fastest approach.

Commits:

  1. Tests (expected to fail at this commit): add test cases for the `iso-8859-11`, `windows-874`, and `tis-620` encodings, using Thai as an example;
  2. Fix: modify `decoders.decodeHeader.CharsetReader` to look up the original label, replace `windows-` with `cp` only if the label is not found, and raise an informative error if the second lookup also fails; all test cases pass again.
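
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `knownCharsets`, `lookup`, and `resolveCharset` are hypothetical names, and the map is an invented stand-in for whatever charset index the library actually consults. Its contents are chosen only so that both the direct path and the fallback path are exercised.

```go
package main

import (
	"fmt"
	"strings"
)

// knownCharsets is a hypothetical stand-in for a real charset index.
// The entries are invented for illustration only.
var knownCharsets = map[string]string{
	"windows-874": "windows-874",
	"iso-8859-11": "iso-8859-11",
	"tis-620":     "tis-620",
	"cp1252":      "windows-1252", // reachable only through the fallback below
}

// lookup performs a case-insensitive lookup in the stand-in index.
func lookup(label string) (string, bool) {
	name, ok := knownCharsets[strings.ToLower(label)]
	return name, ok
}

// resolveCharset tries the original label first and only then the
// normalised label ("windows-" replaced with "cp"). If both lookups
// fail, it returns an error that names both labels that were tried.
func resolveCharset(label string) (string, error) {
	// Fast path: most labels are assumed to be correct as given.
	if name, ok := lookup(label); ok {
		return name, nil
	}

	// Fallback: pay for the string replacement only when needed.
	normalised := strings.Replace(strings.ToLower(label), "windows-", "cp", 1)
	if name, ok := lookup(normalised); ok {
		return name, nil
	}

	return "", fmt.Errorf("unknown charset %q (also tried %q)", label, normalised)
}

func main() {
	for _, label := range []string{"windows-874", "windows-1252", "windows-9999"} {
		name, err := resolveCharset(label)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s resolved to %s\n", label, name)
	}
}
```

In this sketch a correct label such as `windows-874` is resolved by the first lookup without any string manipulation, and an unknown label yields an error instead of an unusable result that a caller might dereference.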