UnicodeDecodeError: 'utf-8' codec can't decode byte (outputtypehandler encoding_errors="replace" not being honored)

anthony-tuininga commented 5 months ago

Discussed in https://github.com/oracle/python-oracledb/discussions/272

^{Originally posted by **rh8056** December 20, 2023} Hello, I posted this question on StackExchange yesterday as well, but figured it would make sense to ask here as well. I'm struggling to deal with what I think is corrupt data stored in an Oracle database when reading it in a python script. I have the following: ``` def output_type_handler(cursor, metadata): if metadata.type_code is oracledb.DB_TYPE_VARCHAR: return cursor.var(metadata.type_code, arraysize=cursor.arraysize, encoding_errors="replace") mydb, mycursor = connectOracle() mycursor.outputtypehandler = output_type_handler mycursor.execute('''select file_id, md5_value, case when file_content is not null then utl_raw.cast_to_varchar2(dbms_lob.substr(file_content)) else '' end as FILE_CONTENT from archive_file where archive_id = 123 and file_name = 'file_name.txt' ''') row = mycursor.fetchone() print(row) ``` When I run this, I'm getting the following: ```UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 647: invalid continuation byte``` If I run this directly in SQL Developer, I get an output, with � in the file content. My SQL Developer settings are set to ```UTF-8```, and my Oracle database ```NLS_NCHAR_CHARACTERSET``` is set to ```AL32UTF8```. My understanding with the ```outputtypehandler``` change is that my python script would also output a � in this instance. I added some ```print()``` commands inside of the ```def output_type_handler()``` to verify that it is indeed being called, and I saw output, so it appears that it is, but it also seems like the ```encoding_errors="replace"``` is being ignored. What am I missing here? Thanks!

anthony-tuininga commented 5 months ago

I confirmed that this was indeed a regression from cx_Oracle and have added a test case to confirm that it works correctly now. If you are able to build from source you can verify that it works for you, too.

anthony-tuininga commented 5 months ago

The patch has been included in version 2.0.1 which was just released.

oracle / python-oracledb

UnicodeDecodeError: 'utf-8' codec can't decode byte (outputtypehandler encoding_errors="replace" not being honored) #279

Discussed in https://github.com/oracle/python-oracledb/discussions/272