Closed iamdbychkov closed 3 years ago
The only way that Oracle Database will return replacements for characters is when the database character set contains characters that cannot be represented in the client character set. Since your client character set is UTF-8, that will never happen because UTF-8 is capable of representing all characters (it is a universal character set).
What is your database character set? You can find out by running the query mentioned in the documentation. Note that if the client and database indicate they are using the same character set that the Oracle Database doesn't perform any checking of the data that is put into the database. This can allow invalid data to be stored in the database.
You can use the outputtypehandler
option with cursor.callproc()
and cursor.callfunc()
calls that return a REF cursor. You can do this:
cursor = conn.cursor()
ref_cursor = conn.cursor()
ref_cursor.outputtypehandler = output_type_handler
cursor.callproc("some_procedure_with_ref_cursors", [ref_cursor])
rows = ref_cursor.fetchall()
You can also do this:
conn.outputtypehandler = output_type_handler
ref_cursor = conn.cursor()
cursor.callproc("some_procedure_with_ref_cursors", [ref_cursor])
rows = ref_cursor.fetchall()
If you put the output type handler on the connection, all cursors that were created by that connection will use it. Since you are returning a cursor, this is going to be the only approach you can use -- unless you use a variable and define it as the return type (more complex).
When building the variable you can use the encoding_errors
option to tell Python what to do when it encounters corrupt data. See the documentation for an example.
Finally, if you use cx_Oracle 8.2 instead of cx_Oracle 7.3 you have another option available to you. When building the variable you can use the new parameter bypass_decode
which will tell cx_Oracle to not bother decoding the text but return it directly to you as bytes -- the same as with Python 2.7. You can see an example in the documentation.
Thank you very much, Anthony!
Database and client's charsets are AL32UTF8.
Setting outputtypehandler
on connection object indeed worked.
bypass_decode
is also an interesting option which I'm yet to try.
Besided solving some random person problem you also provided useful information about client-database encoding behaviour. And in general I can see you're a very active participant in responding to people's questions and resolving their issues. Role model. Thank you once again.
You're very welcome and I'm happy I could be of help!
Currently in a process of migration from python2.7 to python3.6 and cx_Oracle 7.3.0.
Today, while testing, I stumbled upon an UnicodeDecodeError while fetching data:
To fetch data from database I have to use
cursor.callfunc
method, which return type is cx_Oracle.CURSOR, and.fetchmany()
to load in batches.In my research I've found out that there are some invalid UTF-8 string in database.
The source of invalid strings is another application, which loads data from parsed files, which, in turn, may contain anything.
Since my application main purpose is to send fetched data over interner i've never been bothered by decoding bytes while app was running using python2.7.
But now it became a problem.
I've wanted to try to use
outputtypehandler
for VARCHAR column, but the return values ofcallfunc
are not affected by this. I've also seen this article in documentation: https://cx-oracle.readthedocs.io/en/7.3/user_guide/globalization.html#character-set-example which says:But for some reason, I've got no replacements, my guess it's also not an option for
cursor.callfunc
?Client and database encoding is AMERICAN_AMERICA.AL32UTF8
How one can solve such issue? Is there a way to solve an issue on the client side, without enforcement of utf8 encoding for every string which enters database? (fixing all clients which can write to database)