Closed shaik-jafar6 closed 9 months ago
Check https://python-oracledb.readthedocs.io/en/latest/user_guide/sql_execution.html#querying-corrupt-data
Thank you for this. The issue still persists after adding the output handler because below is my data sample. Chinese characters are retrieving but there are some special characters in the name
FIRST_NAME 佳濱 �AIRUI �
I ran few queries in db below are the values NLS_CHARACTERSET = AL32UTF8 NLS_NCHAR_CHARACTERSET = AL16UTF16
The problem is that the Chinese characters in your database are not stored in the UTF-8 encoding (in Oracle parlance that is AL32UTF8). How did they get inserted? If you use python-oracledb to insert the data it will then be able to retrieved properly, too. Try setting the environment variable NLS_LANG
as follows:
set NLS_LANG=.AL32UTF8
Then run SQL*Plus and run that query. You should see similar "bad" results. Whoever is inserting the data is not inserting them correctly. You will need to find out what encoding they are actually using and then decode the data that way. You can use this approach to bypass the decode and do it yourself. Once the corrupted data is fixed and the source of the corrupt data is also fixed you can fetch normally again.
@anthony-tuininga The issue is not with chinese character i guess. It's due to some special character. To set the context I am querying data from oracle using script and writing it into a s3 bucket. Please guide me how to handle it from the code side to get as it is from database and dump into s3 as a file. As you can see in the below image i am able to read the data in the first line without any issues and the issue is with the row 2 and 3 which has square box character.
You need to find out first how the data is being inserted into the database. If the environment variable NLS_LANG
is set to .AL32UTF8
the database doesn't check the data that is beng inserted but accepts it as is. So if the client isn't actually encoding in UTF-8 but in some other encoding you will see the issues you are seeing. The solution is to find out what that encoding is so you can use the bypass_encoding
parameter and then decode it yourself using the encoding it was actually encoded with! You can also fix the corrupted data in the database (once you fix the source of the corruption).
The data is being inserted into oracle from SAP application. I think the encoding set at database level is .AL32UTF8. so did tried the below code suggested by you but it still raises the same error.
`os.environ["NLS_LANG"] = ".AL32UTF8" oracledb.init_oracle_client() conn = oracledb.connect(user="*", password="", dsn="****") cursor = conn.cursor()
def return_strings_as_bytes(cursor, name, default_type, size, precision, scale): if default_type == oracledb.DB_TYPE_VARCHAR: return cursor.var(str, arraysize=cursor.arraysize, bypass_decode=True)
with conn.cursor() as cursor: cursor.outputtypehandler = return_strings_as_bytes cursor.execute("SELECT first_name,last_name FROM ***.NAMAD ORDER BY FIRST_NAME DESC FETCH FIRST 3 ROWS ONLY") data = cursor.fetchall() print(data) print(data[0][1].decode("UTF-16"))`
Print data is the below one.
[(b'\xe4\xbd\xb3\xe6\xbf\xb1', b'\xe6\xb1\xa4'), (b'\xbfAIRUI', b'SONG'), (b'\xbf', b'CUI')]
Setting the value of NLS_LANG
with python-oracledb will not change anything. I was talking of doing so before running SQL*Plus (which will pay attention to the character set in that environment variable). That will show the same issues you are seeing in python-oracledb.
The data that is initially returned is the raw data stored in the database. You need to find out what the SAP application is doing. What encoding is it using when storing the data in the database? That is the encoding you will need to use to decode. Clearly it is not UTF-16! I'm not familiar enough with Chinese characters to be able to provide a reasonable guess as to the encoding.
Closing - no further information provided
I trying to query the table from oracle database. My table contains some chines language and german language characters in the data. when I am trying to read the data using python-oracledb i am getting below error.
Error: File "C:\Users\J\PycharmProjects\pythonProject\Static_Data.py", line 57, in
data = cursor.fetchall()
^^^^^^^^^^^^^^^^^
File "C:\Users\J\AppData\Roaming\Python\Python311\site-packages\oracledb\cursor.py", line 457, in fetchall
row = fetch_next_row(self)
^^^^^^^^^^^^^^^^^^^^
File "src\oracledb\impl/base/cursor.pyx", line 399, in oracledb.base_impl.BaseCursorImpl.fetch_next_row
File "src\oracledb\impl/base/cursor.pyx", line 210, in oracledb.base_impl.BaseCursorImpl._create_row
File "src\oracledb\impl/thick/var.pyx", line 134, in oracledb.thick_impl.ThickVarImpl._get_scalar_value
File "src\oracledb\impl/thick/var.pyx", line 272, in oracledb.thick_impl.ThickVarImpl._transform_element_to_python
File "src\oracledb\impl/thick/utils.pyx", line 284, in oracledb.thick_impl._convert_to_python
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte
Code: import oracledb oracledb.init_oracle_client() conn = oracledb.connect(user="####", password="####", dsn="####") cursor = conn.cursor() cursor.execute("SELECT * FROM Table")
data = cursor.fetchall()
data = [row.decode('utf-8'') for row in data] df = pd.DataFrame(data) print(df.head())
cursor.close() conn.close()