nikonov1101 closed this issue 1 year ago.
@nikonov1101, thank you for opening this issue. What an exciting find!
I will look into this very soon. A quick search suggests that `tis-620` and `iso-8859-11` are two Thai character encoding standards, while `windows-874`, `ibm874`, `x-mac-thai`, and `tactis` are some non-standard encodings referring to code page 874.
I will first make sure that my understanding is correct and add some test cases for Thai text so that we can add proper support, if possible, and definitely better error handling.
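For comparison, the WHATWG Encoding Standard (which Go's `htmlindex` package follows) treats several of these names as labels for one encoding named `windows-874`. The sketch below is a minimal, standard-library-only illustration of that label table. The `thaiLabels` map and the `canonicalThaiLabel` helper are hypothetical names, and the label list reflects the WHATWG table as recalled here, so it should be checked against the current spec before relying on it. Note that `cp874`, the label produced in this issue, is not among them.

```go
package main

import (
	"fmt"
	"strings"
)

// thaiLabels lists the WHATWG labels assumed to resolve to "windows-874".
// "cp874" is deliberately absent, which is why looking it up fails.
var thaiLabels = map[string]bool{
	"dos-874":     true,
	"iso-8859-11": true,
	"iso8859-11":  true,
	"iso885911":   true,
	"tis-620":     true,
	"windows-874": true,
}

// canonicalThaiLabel is a hypothetical helper: it returns "windows-874" and
// true when label is one of the labels above, matching case-insensitively
// and ignoring surrounding whitespace.
func canonicalThaiLabel(label string) (string, bool) {
	if thaiLabels[strings.ToLower(strings.TrimSpace(label))] {
		return "windows-874", true
	}
	return "", false
}

func main() {
	for _, label := range []string{"TIS-620", " iso-8859-11 ", "windows-874", "cp874"} {
		name, ok := canonicalThaiLabel(label)
		fmt.Printf("%q -> %q (found: %v)\n", label, name, ok)
	}
}
```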
This definitely does not count as "I will look into this very soon", but I got busy at work. Apologies for that! My pace of open-source contributions should be back to normal now.
Fixed in https://github.com/mnako/letters/pull/54 and released in https://github.com/mnako/letters/releases/tag/v0.2.1.
Hello!
I've got a panic here.
`CharsetReader` gets `"windows-874"` as the label, turns it into `"cp874"`, and looks it up. But the correct name for this encoding is `"windows-874"` (according to the `htmlindex` package, at least). So `enc` is nil, and the call to `enc.NewDecoder()` panics.
Suggestions for a fix:
1. `charset.Lookup` the `strings.Replace`-d version of the label; if no encoding is found (`enc == nil`), try again with the non-replaced version of the label, and also add a check for a nil encoding (flexible solution);
2. `windows-874` encoding (bad solution).

If we agree on a solution, I could open a PR with the fixes.
Thanks!