The root case is the inconsistent decoding for GBK encoded data when converting it to utf-8. The hive UDF calls JDK lib to do the decoding with utf-8 decoder. In our columnar UDF implementation, the decoding is GBK decoding like. It is implementation-dependent in decoding non-utf-8 data.
The root case is the inconsistent decoding for GBK encoded data when converting it to utf-8. The hive UDF calls JDK lib to do the decoding with utf-8 decoder. In our columnar UDF implementation, the decoding is GBK decoding like. It is implementation-dependent in decoding non-utf-8 data.