Open intr3p1d opened 4 months ago
Bug
Cause
clp-ffi-java internally uses StandardCharsets.ISO_8859_1 in EncodedMessage.getLogTypeAsString() (and also in getDictionaryVarsAsStrings):
https://github.com/y-scope/clp-ffi-java/blob/c4a74dbdeb09bd4e7e3d119826dddbe5005ccf53/src/main/java/com/yscope/clp/compressorfrontend/EncodedMessage.java#L30-L36
Effect
https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-plugins/pinot-input-format/pinot-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java#L151-L154
This corrupts any characters that ISO_8859_1 cannot represent. For example,
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: /u0011 이상이어야 합니다
becomes
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: ì´ìì´ì´ì¼ í©ëë¤
The message is correct again after going through the decode function, but the corrupted strings are a problem when working with the logtype on its own (LIKE searches, etc.).
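To illustrate why searches on the logtype fail, here is a minimal sketch that does not use clp-ffi-java at all. It assumes the underlying logtype bytes are UTF-8 and are simply reinterpreted as ISO_8859_1, which is consistent with the corrupted output shown above. The class name LogtypeSearchSketch and the helper asLatin1View are hypothetical names made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class LogtypeSearchSketch {
    // Reinterpret the UTF-8 bytes of a string as ISO-8859-1.
    // Assumption: this mirrors what happens to the logtype in the report above.
    static String asLatin1View(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // First two Hangul syllables of the example message, written as
        // Unicode escapes so the file compiles under any source encoding.
        String term = "\uC774\uC0C1";
        String storedLogtype = asLatin1View("getAgentsList.from: " + term);

        // A search with the original term no longer matches the stored logtype.
        System.out.println(storedLogtype.contains(term)); // false

        // Passing the search term through the same reinterpretation matches again.
        System.out.println(storedLogtype.contains(asLatin1View(term))); // true
    }
}
```

Transforming the search term the same way is only a possible workaround under the assumption above; it does not make the stored logtype readable.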
clp-ffi version
0.4.4
Environment
Linux, Java https://github.com/apache/pinot/blob/1d490c1ac3268103a16d77ddfa70f8f8602f9e96/pom.xml#L160
Reproduction steps
Encode a message containing characters that are not supported by ISO_8859_1:
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: /u0011 이상이어야 합니다
Then get the logtype:
Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: ì´ìì´ì´ì¼ í©ëë¤
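For anyone checking a corrupted logtype by hand, the following sketch reverses the reinterpretation in plain Java. It assumes the string was produced by decoding UTF-8 bytes as ISO_8859_1; LogtypeRepairSketch and repair are hypothetical names, not part of clp-ffi-java.

```java
import java.nio.charset.StandardCharsets;

final class LogtypeRepairSketch {
    // Recover the original text from a string that was produced by decoding
    // UTF-8 bytes as ISO-8859-1. Only meaningful if every char is <= 0xFF;
    // anything else is replaced with '?' by getBytes and cannot be recovered.
    static String repair(String latin1View) {
        byte[] raw = latin1View.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Hangul syllables written as Unicode escapes (encoding-independent source).
        String original = "getAgentsList.from: \uC774\uC0C1";

        // Simulate the corruption: UTF-8 bytes read back as ISO-8859-1.
        String corrupted = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(corrupted.equals(original));         // false
        System.out.println(repair(corrupted).equals(original)); // true
    }
}
```

The round trip is lossless because ISO_8859_1 maps each byte value to exactly one character, so no information is lost by the wrong decode.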