saintfish / chardet

Charset detector library for golang derived from ICU

ISO-8859-2, GB-18030 and ISO-2022 #7

Open Aoi-hosizora opened 3 years ago

Aoi-hosizora commented 3 years ago
  1. The newRecognizer_8859_2_xx functions should return newRecognizer_8859_2(xx), which uses ISO-8859-2, rather than newRecognizer_8859_1(xx), which uses ISO-8859-1.

https://github.com/saintfish/chardet/blob/3af4cd4741ca4f3eb0c407c034571a6fb0ea529c/single_byte.go#L325-L336
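
The change proposed in this item could look roughly like the sketch below. It is not the library's actual code: the `recognizer` type and the constructor names are simplified, hypothetical stand-ins, and Czech (`cs`) is picked only as an example language. The point is just that the language-specific constructor delegates to the ISO-8859-2 constructor.

```go
package main

import "fmt"

// recognizer is a hypothetical, simplified stand-in for the library's
// single-byte recognizer; only the fields relevant to this issue are kept.
type recognizer struct {
	charset  string
	language string
}

// newRecognizer8859_1 builds a recognizer that reports ISO-8859-1.
func newRecognizer8859_1(language string) *recognizer {
	return &recognizer{charset: "ISO-8859-1", language: language}
}

// newRecognizer8859_2 builds a recognizer that reports ISO-8859-2.
func newRecognizer8859_2(language string) *recognizer {
	return &recognizer{charset: "ISO-8859-2", language: language}
}

// newRecognizer8859_2_cs is the example language-specific constructor.
// The proposed fix: delegate to the ISO-8859-2 constructor rather than
// the ISO-8859-1 one.
func newRecognizer8859_2_cs() *recognizer {
	return newRecognizer8859_2("cs")
}

func main() {
	r := newRecognizer8859_2_cs()
	fmt.Println(r.charset, r.language)
}
```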

  2. As #2 and #3 say, the charset name GB-18030 should be GB18030.

https://github.com/saintfish/chardet/blob/3af4cd4741ca4f3eb0c407c034571a6fb0ea529c/multi_byte.go#L340
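
Until the name is corrected in the library, a caller could normalize it on their side. The following is a minimal sketch of that workaround, not part of chardet: `charsetAliases` and `canonicalCharset` are hypothetical helper names, and the only assumption is that the detector reports `GB-18030` where the IANA-registered name is `GB18030`.

```go
package main

import "fmt"

// charsetAliases maps non-standard charset names to their IANA-registered
// names. Only the alias discussed in this issue is listed.
var charsetAliases = map[string]string{
	"GB-18030": "GB18030",
}

// canonicalCharset returns the registered name for a known alias and
// leaves every other name unchanged.
func canonicalCharset(name string) string {
	if canonical, ok := charsetAliases[name]; ok {
		return canonical
	}
	return name
}

func main() {
	fmt.Println(canonicalCharset("GB-18030"))
	fmt.Println(canonicalCharset("UTF-8"))
}
```

A caller would pass the detected charset name through `canonicalCharset` before looking up a decoder by name.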

  3. The language for the ISO-2022-XX charsets should be set to ja, ko and cn respectively.

https://github.com/saintfish/chardet/blob/3af4cd4741ca4f3eb0c407c034571a6fb0ea529c/2022.go#L83-L101
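
One way to express the mapping suggested in this item is sketched below. The `result` type, `iso2022Languages` and `newISO2022Result` are hypothetical names used for illustration and are not the library's API. The language values follow the issue text as written; note that the ISO 639-1 code for Chinese is `zh`, so a maintainer may prefer that over `cn`.

```go
package main

import "fmt"

// result is a hypothetical stand-in for a detection result.
type result struct {
	Charset  string
	Language string
}

// iso2022Languages maps each ISO-2022 charset to the language suggested
// in the issue. Charsets not listed here get an empty language.
var iso2022Languages = map[string]string{
	"ISO-2022-JP": "ja",
	"ISO-2022-KR": "ko",
	"ISO-2022-CN": "cn", // value as proposed in the issue
}

// newISO2022Result fills in the language alongside the charset name.
func newISO2022Result(charset string) result {
	return result{Charset: charset, Language: iso2022Languages[charset]}
}

func main() {
	for _, cs := range []string{"ISO-2022-JP", "ISO-2022-KR", "ISO-2022-CN"} {
		r := newISO2022Result(cs)
		fmt.Println(r.Charset, r.Language)
	}
}
```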