smikitky closed this issue 5 years ago
Okay, it turned out that EUC-KR and ISO-2022-KR are almost identical, except that the former implicitly places multibyte characters in the G1 area, whereas the latter requires you to designate them explicitly using the four-byte sequence ESC $ ) C. This means we can simply remove the escape sequence and treat the rest as a string in EUC-KR.
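A minimal sketch of that idea, not the extension's actual code: the helper names `stripHangulEscapes` and `decodeKorean` are made up for illustration, and the sketch assumes a runtime whose `TextDecoder` accepts the `euc-kr` label (browsers do; Node.js needs a build with full ICU).

```javascript
// Sketch only: stripHangulEscapes and decodeKorean are hypothetical helpers.
// ESC $ ) C as raw bytes: 0x1B 0x24 0x29 0x43
const HANGUL_ESCAPE = [0x1b, 0x24, 0x29, 0x43];

// Return a copy of `bytes` with every ESC $ ) C sequence removed.
function stripHangulEscapes(bytes) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    const isEscape = HANGUL_ESCAPE.every((b, k) => bytes[i + k] === b);
    if (isEscape) {
      i += HANGUL_ESCAPE.length; // skip the four escape bytes
    } else {
      out.push(bytes[i]);
      i += 1;
    }
  }
  return Uint8Array.from(out);
}

// Decode the remaining bytes as EUC-KR.
// Assumption: TextDecoder supports the 'euc-kr' label in this runtime.
function decodeKorean(bytes) {
  return new TextDecoder('euc-kr').decode(stripHangulEscapes(bytes));
}

// Usage: ESC $ ) C followed by the EUC-KR bytes for "한글".
const sample = Uint8Array.from([0x1b, 0x24, 0x29, 0x43, 0xc7, 0xd1, 0xb1, 0xdb]);
console.log(decodeKorean(sample));
```

Working on bytes rather than on an already-decoded string avoids depending on how a decoder treats the ESC byte. The sketch only handles this one escape sequence; other DICOM character sets would need their own handling.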
Fixed in 1.2.0
Currently this extension cannot handle Korean characters (Hangul). What the DICOM standard says is as follows:
ESC 02/04 02/09 04/03,
01/11 02/04 02/09 04/03,
or ESC $ ) C
(i.e., "Use Hangul from here")

The problem is that ISO-2022-KR is a very rare encoding and I cannot find a pure-JS decoder for it. EUC-KR appears to be very similar, and CP949 is a superset of EUC-KR. A hacky solution would be to just decode the text as CP949 and remove the 4-byte escape sequence with a regex. This works at least in this example, but I don't know if it's the right approach.
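For reference, the column/row notation quoted from the standard maps to byte values as column × 16 + row, so 01/11 is 0x1B (ESC) and 02/04 02/09 04/03 spells out `$)C`. A small sketch of that conversion, where `columnRowToByte` is a hypothetical helper name:

```javascript
// Hypothetical helper: convert ISO 2022 "column/row" notation to a byte value.
function columnRowToByte(token) {
  const [column, row] = token.split('/').map(Number);
  return column * 16 + row;
}

const escapeBytes = '01/11 02/04 02/09 04/03'.split(' ').map(columnRowToByte);
// escapeBytes is [0x1b, 0x24, 0x29, 0x43]

// Everything after the ESC byte is printable ASCII.
const printable = String.fromCharCode(...escapeBytes.slice(1));
console.log(printable); // $)C
```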