pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.19k stars 17.77k forks source link

BUG: Lost precision of NUMBER in read_sql_table(). #34988

Open andrewkisel opened 4 years ago

andrewkisel commented 4 years ago

Problem description

When using pd.read_sql_table() with Oracle DB precision of data type sqlalchemy.dialects.oracle.NUMBER is lost.

Expected Output

Precision should not be lost as this datatype can be either integer or float.

The problem

Looks like the root cause for this is in _harmonize_columns(), particularly in _get_dtype() that is used there. The SQLAlchemy type sqlalchemy.dialects.oracle.NUMBER is based on both sqlalchemy.types.Numeric and sqlalchemy.types.Integer. As there is no support for SQLAlchemy Numeric type in _get_dtype() this Oracle type is being treated as integer and precision is dropped when using astype() in _harmonize_columns() itself.

kirthash commented 4 years ago

Hello

This bug has been reproduced in:

INSTALLED VERSIONS

commit : None python : 3.8.3.final.0 python-bits : 64 OS : Windows OS-release : 10 machine : AMD64 processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : es_ES.cp1252

pandas : 1.0.4 numpy : 1.18.5 pytz : 2020.1 dateutil : 2.8.1 pip : 20.1.1 setuptools : 41.2.0 jinja2 : 2.11.2 IPython : 7.15.0 sqlalchemy : 1.3.18

df3 have the correct dtype.

image

jowage58 commented 2 years ago

I also ran into this problem with an Oracle column defined as NUMBER(7,6). As mentioned since there's no handling of Numeric types in the method pandas.io.sql.SQLTable._get_dtype and since the sqlalchemy.dialects.oracle.base.NUMBER(sqltypes.Numeric, sqltypes.Integer) falls into the Integer test and it returns int64 for the type.

https://github.com/pandas-dev/pandas/blob/66e3805b8cabe977f40c05259cc3fcf7ead5687d/pandas/io/sql.py#L1241-L1255

As a work-around I patched the _get_dtype method and added a check before calling the original method.

if isinstance(sqltype, sqlalchemy.types.Numeric):
    return sqltype.python_type
JamieAllen1 commented 1 year ago

take