Can't insert/select NVARCHAR2 column with russian and chinese strings

AndreaPuccia commented 1 year ago

Can't insert and select NVARCHAR2 column with russian and chinese strings.

I tried to use python, encoding/nencoding parameters but i can't insert/read NVARCHAR2 column with russian and chinese strings.

These are my settings:

Platform: Linux-5.4.0-131-generic-x86_64-with-glibc2.28
Python version: 3.9.9
Maxsize > 2**32: True
Cx-Oracle version: 8.2.1
Oracle client version: (19, 11, 0, 0, 0)
Oracle db version: 11.2.0.1.0

DB

CREATE TABLE "DI_TEST_TABLE"
   (    
   "LAYER" NVARCHAR2(255), 
   "LINGUA" VARCHAR2(255), 
   "DES" NUMBER(*,0)
   )

SELECT * FROM NLS_DATABASE_PARAMETERS
WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               WE8MSWIN1252
NLS_NCHAR_CHARACTERSET         AL16UTF16

INSERT INTO DI_TEST_TABLE VALUES (n'консэквюат', 'russian', 1)
INSERT INTO DI_TEST_TABLE VALUES (n'面料', 'chinese', 2)

SELECT * FROM DI_TEST_TABLE

LAYER          LINGUA     DES
-------------- ---------- -------
консэквюат     russian    1
面料            chinese    2

SELECT * FROM DI_TEST_TABLE WHERE LAYER = n'консэквюат'

LAYER          LINGUA     DES
-------------- ---------- -------
консэквюат     russian    1

PYTHON

import cx_Oracle

conn_config =  {'user': 'my_user', 'password': 'my_password', 'dsn': 'my_dsn', 'encoding': 'UTF-8', 'nencoding': 'UTF-8'}

connection = cx_Oracle.connect(**conn_config)
cursor = connection.cursor()

# INSERT INTO DB
insert_query = "INSERT INTO DI_TEST_TABLE VALUES (:1, :2, :3)"
chunk = [('консэквюат', 'russian', 3),('面料', 'chinese', 4)]

cursor.executemany(insert_query, chunk)
connection.commit()

select_query = "SELECT * FROM DI_TEST_TABLE"
cursor.execute(select_query)
result = cursor.fetchall()
print(result)

# READ 
select_query = "SELECT * FROM DI_TEST_TABLE WHERE LAYER = 'консэквюат'"
cursor.execute(select_query)
result = cursor.fetchall()
print(result)

select_query = "SELECT * FROM DI_TEST_TABLE WHERE LAYER = n'консэквюат'"
cursor.execute(select_query)
result = cursor.fetchall()
print(result)

connection.close()

This is python script output. When I insert the records with python on the db the question marks (¿) are inserted. Similarly, when I execute the select, the correct records are not retrieved but those with question marks (¿).

[('консэквюат', 'russian', 1), ('面料', 'chinese', 2), ('¿¿¿¿¿¿¿¿¿¿', 'russian', 3), ('¿¿', 'chinese', 4)]
[('¿¿¿¿¿¿¿¿¿¿', 'russian', 3)]
[('¿¿¿¿¿¿¿¿¿¿', 'russian', 3)]

I tried to use binding params and calling cursor.setinputsizes(), but I need to execute queries without binding params because queries are read from file and chinese/russian strings are embedded into queries. Is there a way to execute these queries correctly?

anthony-tuininga commented 1 year ago

What did you use for cursor.setinputsizes()? You have to do this if you want to bind NCHAR or NVARCHAR data as Oracle performs a conversion to the database character set (WE8MSWIN1252) before converting it a second time to the national character set (AL16UTF16):

# INSERT INTO DB
insert_query = "INSERT INTO DI_TEST_TABLE VALUES (:1, :2, :3)"
chunk = [('консэквюат', 'russian', 3),('面料', 'chinese', 4)]

cursor.setinputsizes(cx_Oracle.DB_TYPE_NVARCHAR)
cursor.executemany(insert_query, chunk)
connection.commit()

cjbj commented 1 year ago

I've had a ToDo item for a while to improve doc on this.

AndreaPuccia commented 1 year ago

I use cursor.setinputsizes() with cx_Oracle.DB_TYPE_NVARCHAR and I can insert and query correctly.

insert_query = "INSERT INTO DI_TEST_TABLE VALUES (:1, :2, :3)"
chunk = [('консэквюат', 'russian', 3),('面料', 'chinese', 4)]

# INSERT INTO DB
cursor.setinputsizes(cx_Oracle.DB_TYPE_NVARCHAR, cx_Oracle.DB_TYPE_VARCHAR, cx_Oracle.DB_TYPE_NUMBER)
cursor.executemany(insert_query, chunk)
connection.commit()

# READ DB
select_query_param = "SELECT * FROM DI_TEST_TABLE WHERE LAYER = :1"
param = 'консэквюат'
cursor.setinputsizes(cx_Oracle.DB_TYPE_NVARCHAR)
cursor.execute(select_query_param, (param,))
result = cursor.fetchall()
print(result)

But I have to execute queries which are contained within a txt file so I can't use param binding because chinese/russian characters are embedded within the queries.

Here are some examples of queries that I can find inside the txt file.

INSERT INTO DI_TEST_TABLE VALUES (n'консэквюат', 'russian', 1)
SELECT * FROM DI_TEST_TABLE WHERE LAYER = n'консэквюат'

anthony-tuininga commented 1 year ago

The only way you can run those sorts of queries is if you change the database character set to be one that is capable of encoding that data; otherwise, you have no choice but to use bind variables and call cursor.setinputsizes() as shown. It is better to use bind variables anyway -- for performance, safety, etc.

NOTE: it is unknown (to me anyway) whether running the queries you mentioned in your text file will succeed against your database. Can you try them and let us know?

swolicki commented 1 year ago

If you need to enter your data using N'' literals, you must enable NCHAR literal replacement functionality in the underlying Oracle Database Client by setting the environment variable ORA_NCHAR_LITERAL_REPLACE to TRUE. Otherwise, the N'' literals, like the whole SQL statement, are processed in the database character set, losing any characters that the database character set does not support. The NCHAR literal replacement feature encodes all non-ASCII characters from N'' literals as Unicode escapes (\xxxx) in undocumented U'' literals, which are equivalent to UNISTR() function calls. Being encoded as ASCII backslash and ASCII hex digits, the characters are not corrupted when converted to the database character set, and the SQL engine can decode the original data from them.

AndreaPuccia commented 1 year ago

I ran queries mentioned above using the DBeaver Community Edition client without any issues, the queries behave fine. Is there a way to set the ORA_NCHAR_LITERAL_REPLACE variable like @swolicki suggests?

swolicki commented 1 year ago

If your client is Linux, then before starting your Python application, run:

if in sh: ORA_NCHAR_LITERAL_REPLACE=TRUE; export ORA_NCHAR_LITERAL_REPLACE if in csh: setenv ORA_NCHAR_LITERAL_REPLACE TRUE

If the application is run from a script, you can add the above statement into the script. The variable should be inherited by the Oracle Client library used by the Python program.

swolicki commented 1 year ago

Also note that JDBC does NCHAR literal replacement by default. Hence, any program that uses JDBC, like DBeaver or SQL Developer, should work fine out of the box, unless it explicitly disables the replacement.

swolicki commented 1 year ago

It is a possible enhancement to the Oracle's Python drivers to allow the feature to be enabled from inside the application with a connection property.

stale[bot] commented 1 year ago

This issue has been automatically marked as inactive because it has not been updated recently. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

This issue has been automatically closed because it has not been updated for a month.

oracle / python-cx_Oracle

Can't insert/select NVARCHAR2 column with russian and chinese strings #644

DB

PYTHON