yjhong79 opened this issue 1 week ago
Just to check: while `narrow` is active, the ODBC standard suggests that the client-side encoding is used. On Windows this is usually something like Latin-1. On Linux most drivers seem to ignore the standard and just use UTF-8. On the other hand, many Linux drivers struggle to implement UTF-16 correctly.
I wonder if this time the Oracle driver sticks to the standard. Could you test whether the system locale suggests using UTF-8? Is there something you can pass in the connection string to set the client encoding?
I am sympathetic to your cause, though, and will investigate how best to forward the decision to the user while keeping the defaults intact.
I have tested several SQL tools, including DBeaver, unixODBC's isql and iusql, Rust's odbc-api, C++'s nanodbc, and others.
Based on my test results, it appears that Oracle does not perform character encoding conversion when creating tables or fetching column names that contain characters outside the ASCII range.
When I create a column named "컬럼" (which means "column" in Korean) in DBeaver and execute `SELECT "컬럼" FROM TEST_TABLE`, it works correctly. However, other tools fail to recognize "컬럼". DBeaver uses JDBC with UTF-16 encoding.
Conversely, if I create a column with UTF-8 encoding using other tools, DBeaver cannot recognize "컬럼".
The hexadecimal representation of the column name in each encoding is as follows:
UTF-8: `ec bb ac eb 9f bc`
UTF-16: `ceec b7fc`
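These values can be double-checked in plain Rust, independent of any driver:

```rust
fn main() {
    let name = "컬럼";
    println!("{:x?}", name.as_bytes()); // [ec, bb, ac, eb, 9f, bc]
    println!("{:x?}", name.encode_utf16().collect::<Vec<u16>>()); // [ceec, b7fc]
}
```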
For the table created in DBeaver, the results of fetching the column description vary depending on the odbc-api feature used:
With odbc-api and the `narrow` feature enabled, only part of the UTF-16 data is retrieved:
Column ColumnDescription { name: [ec, fc], data_type: Varchar { length: Some(64) }, nullability: Nullable }
With odbc-api and without the `narrow` feature, the full UTF-16 data is retrieved, and the string is successfully constructed:
Column ColumnDescription { name: [ceec, b7fc], data_type: Varchar { length: Some(64) }, nullability: Nullable }
When creating a table with UTF-8 encoding using iusql, the execution results are (first with the `narrow` feature, then without it):
Column ColumnDescription { name: [ec, bb, ac, eb, 9f, bc], data_type: Varchar { length: Some(64) }, nullability: Nullable }
Column ColumnDescription { name: [ffec, ffbb, ffac, ffeb, ff9f, ffbc], data_type: Varchar { length: Some(64) }, nullability: Nullable }
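The `ffxx` values in the second result look like each stored UTF-8 byte sign-extended to a 16-bit code unit, which is what you would expect if the driver widened the stored bytes naively instead of converting between encodings. A small illustration of that interpretation (mine, not actual driver code):

```rust
fn main() {
    // UTF-8 bytes of "컬럼" as stored via iusql.
    let utf8 = [0xecu8, 0xbb, 0xac, 0xeb, 0x9f, 0xbc];
    // Sign-extend each byte to 16 bits, as a naive byte-to-wide widening would.
    let widened: Vec<u16> = utf8.iter().map(|&b| b as i8 as u16).collect();
    println!("{:x?}", widened); // prints [ffec, ffbb, ffac, ffeb, ff9f, ffbc]
}
```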
This error occurs only when retrieving column names via SQLDescribeCol. When Unicode data is included in table rows, it is fetched correctly.
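For reference, here is a minimal sketch of how I obtain the column descriptions with odbc-api; the DSN and credentials are placeholders, and the exact `connect_with_connection_string` signature assumes a reasonably recent odbc-api release:

```rust
use odbc_api::{ColumnDescription, ConnectionOptions, Environment, ResultSetMetadata};

fn main() -> Result<(), odbc_api::Error> {
    let env = Environment::new()?;
    // Placeholder connection string.
    let conn = env.connect_with_connection_string(
        "DSN=Oracle;UID=user;PWD=secret;",
        ConnectionOptions::default(),
    )?;
    if let Some(mut cursor) = conn.execute("SELECT * FROM TEST_TABLE", ())? {
        // describe_col calls SQLDescribeCol (SQLDescribeColW with the wide flavour).
        let mut desc = ColumnDescription::default();
        cursor.describe_col(1, &mut desc)?;
        // With narrow, `name` holds u8 code units; with wide, u16.
        println!("Column {desc:?}");
    }
    Ok(())
}
```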
Since tools using JDBC are widely used and they handle UTF-16-encoded column names (like those in Korean), I believe it would be beneficial to have the ability to enable or disable the `narrow` feature in odbc-api to correctly retrieve such column names.
Hello @yjhong79,
arrow-odbc 14.0.0 is released. It supports the `wide` feature flag, which can be used on non-Windows platforms to use the wide UTF-16 version of the ODBC function calls.
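Enabling it should look something like this in your Cargo.toml (a sketch; depending on how the default features interact on your platform, you may also need `default-features = false`):

```toml
[dependencies]
arrow-odbc = { version = "14.0.0", features = ["wide"] }
```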
Let me know if it works for you.
Best, Markus
When using arrow-odbc, we are unable to retrieve column names containing Unicode characters from Oracle databases if the `narrow` feature is enabled in the odbc-api crate (which arrow-odbc depends on). This is because the `narrow` feature in odbc-api uses UTF-8 encoding, but Oracle's ODBC driver expects UTF-16 encoding for proper Unicode support.
Request: Please add a `narrow` feature flag to arrow-odbc that allows users to enable or disable the `narrow` feature in the odbc-api dependency. This will give users control over the encoding used and ensure that Unicode column names are retrieved correctly from databases like Oracle that require UTF-16 encoding.