Open dbrownems opened 1 month ago
Thanks for the report. This was likely introduced with #50048, when the nullable keyword (since renamed to dtype_backend) was added to the SQL read functions.
The comment at https://github.com/pandas-dev/pandas/pull/50048#discussion_r1184376583 points to where the code assumes the column holds string data. cc @phofl
Full Traceback
Exception has occurred: UnicodeDecodeError
'utf-8' codec can't decode byte 0x89 in position 4: invalid start byte
  File "/home/asishm/pandas-asishm/pandas/core/arrays/string_.py", line 412, in _from_sequence
    result = lib.ensure_string_array(scalars, na_value=libmissing.NA, copy=copy)
  File "/home/asishm/pandas-asishm/pandas/core/internals/construction.py", line 972, in convert
    arr = arr_cls._from_sequence(arr, dtype=new_dtype)
  File "/home/asishm/pandas-asishm/pandas/core/internals/construction.py", line 993, in <listcomp>
    arrays = [convert(arr) for arr in content]
  File "/home/asishm/pandas-asishm/pandas/core/internals/construction.py", line 993, in convert_object_array
    arrays = [convert(arr) for arr in content]
  File "/home/asishm/pandas-asishm/pandas/io/sql.py", line 161, in _convert_arrays_to_dataframe
    arrays = convert_object_array(
  File "/home/asishm/pandas-asishm/pandas/io/sql.py", line 198, in _wrap_result
    frame = _convert_arrays_to_dataframe(data, columns, coerce_float, dtype_backend)
  File "/home/asishm/pandas-asishm/pandas/io/sql.py", line 2738, in read_query
    frame = _wrap_result(
  File "/home/asishm/pandas-asishm/pandas/io/sql.py", line 691, in read_sql
    return pandas_sql.read_query(
  File "/home/asishm/pd-issues/59242.py", line 12, in <module>
    df = pd.read_sql(query, db, dtype_backend='pyarrow')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 4: invalid start byte
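The error message itself can be reproduced without pandas. The following is a minimal sketch, assuming the string conversion ends up decoding each bytes value as UTF-8 (which is what the message suggests, not something confirmed here). The payload and the try_decode helper are hypothetical; the payload simply places the byte 0x89 at index 4.

```python
# Hypothetical payload: four ASCII bytes followed by 0x89, which is not a
# valid first byte of any UTF-8 sequence.
payload = b"abcd\x89"


def try_decode(data: bytes) -> str:
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)


message = try_decode(payload)
print(message)
# 'utf-8' codec can't decode byte 0x89 in position 4: invalid start byte
```

Arbitrary binary data is not guaranteed to be valid UTF-8, so any code path that treats a bytes column as text can fail this way.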
take
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Calling pd.read_sql with dtype_backend='pyarrow' on a query that returns a binary column fails with
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 4: invalid start byte
The repro uses sqlite, but the same error occurs with SQLAlchemy and pyodbc connections.
read_sql_table also fails with the same error.
Expected Behavior
The call should succeed and return a DataFrame with a binary column, as it does with the default backend.
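For context on what a binary column looks like before pandas touches it, here is a minimal sketch using only the standard-library sqlite3 module. The table name, column names, and payload are hypothetical and are not taken from the original repro. It shows that the driver returns BLOB values as Python bytes objects, which is the data one would expect to be preserved in the result.

```python
import sqlite3

# In-memory database with one BLOB column; the schema is illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, data BLOB)")
con.execute("INSERT INTO files (data) VALUES (?)", (b"abcd\x89",))

rows = con.execute("SELECT id, data FROM files").fetchall()
con.close()

# The BLOB comes back as bytes, not as text.
print(rows)
# [(1, b'abcd\x89')]
```

A query like this one, passed to pd.read_sql with dtype_backend='pyarrow', is presumably the kind of input that reaches the conversion step shown in the traceback.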
Installed Versions