oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb

Can't decode dbobject fields from database with CL8MSWIN1251 character set #371

Open golubovai opened 3 months ago

golubovai commented 3 months ago
  1. What versions are you using?
     Oracle Database 19c (NLS_CHARACTERSET = CL8MSWIN1251)
     platform.platform: Windows-11-10.0.22631-SP0
     sys.maxsize > 2**32: True
     platform.python_version: 3.12.4
     oracledb.version: 2.3.0b1 (commit: c5c6b4f21443b599a458afff933bc4f54734d68f)

  2. Is it an error or a hang or a crash?
     It's an error.

  3. What error(s) or behavior you are seeing?
     File "src\oracledb\impl/thin/dbobject.pyx", line 489, in oracledb.thin_impl.ThinDbObjectImpl.get_attr_value
     File "src\oracledb\impl/thin/dbobject.pyx", line 192, in oracledb.thin_impl.ThinDbObjectImpl._ensure_unpacked
     File "src\oracledb\impl/thin/dbobject.pyx", line 308, in oracledb.thin_impl.ThinDbObjectImpl._unpack_data
     File "src\oracledb\impl/thin/dbobject.pyx", line 346, in oracledb.thin_impl.ThinDbObjectImpl._unpack_data_from_buf
     File "src\oracledb\impl/thin/dbobject.pyx", line 377, in oracledb.thin_impl.ThinDbObjectImpl._unpack_value
     File "src\oracledb\impl/base/buffer.pyx", line 746, in oracledb.base_impl.Buffer.read_str
     UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdf in position 0: unexpected end of data

  4. Does your application call init_oracle_client()?
     No, Thin mode.

golubovai commented 3 months ago

xmltype has the same error:

import oracledb

def main():
    oracledb.defaults.config_dir = ""
    with oracledb.connect(user='', password='', dsn='', mode=oracledb.AUTH_MODE_SYSDBA) as c:
        with c.cursor() as cur:
            sql = "select xmltype('<a>Я</a>') from dual"
            for v in cur.execute(sql):
                print(v[0])

if __name__ == "__main__":
    main()

anthony-tuininga commented 3 months ago

Thanks, can you share the packets containing the cursor execution for one of these two scenarios? That might be helpful. I can compare with the case when the database character set is AL32UTF8 and see what is going on. I'll try to get a database set up with that character set to see if I can replicate.

golubovai commented 3 months ago

packets.txt

anthony-tuininga commented 3 months ago

Thanks, that was helpful. I can see that in your output the string inside the object is encoded in windows-1251 (0xDF) while in my output the object is encoded in utf-8 (0xD0 0xAF). It looks like the conversion is not occurring in the server -- which suggests that this is a database bug. I'll ask internally and get back to you.
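
For anyone following along, the byte difference described above can be reproduced with Python's built-in codecs alone, without a database. This is only a sketch of the symptom. It assumes the value being decoded is the single character Я from the xmltype example, and it says nothing about how the thin driver or the server actually performs the conversion.

```python
# Illustrative only: compares how one character is represented in the two
# character sets mentioned in this thread, using Python's standard codecs.
char = "Я"  # the character from the xmltype example above

cp1251_bytes = char.encode("cp1251")  # "cp1251" is Python's name for windows-1251
utf8_bytes = char.encode("utf-8")

print(cp1251_bytes.hex())  # df
print(utf8_bytes.hex())    # d0af

# Decoding the windows-1251 byte as UTF-8 fails: 0xDF is the lead byte of a
# two-byte UTF-8 sequence, and no continuation byte follows it.
try:
    cp1251_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = str(exc)

print(decode_error)
```

The message printed at the end matches the UnicodeDecodeError in the original report, which is consistent with the client receiving unconverted windows-1251 bytes and trying to read them as UTF-8.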

golubovai commented 3 months ago

Ok, thank you, I'll wait for the problem to be resolved, because we are using such databases with object types.