Could you post an ODBC trace and more information about your environment?
Also being discussed on Stack Overflow here.
I am unable to reproduce your issue using ODBC Driver 17 for SQL Server:
import pandas as pd

# engine: a SQLAlchemy engine connected through ODBC Driver 17 for SQL Server
df = pd.read_sql_query("SELECT CAST(1 AS int) AS foo", engine)
print(df.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   foo     1 non-null      int64
dtypes: int64(1)
memory usage: 136.0 bytes
None
"""
What ODBC driver are you using? Can you supply a minimal reproducible example?
@gordthompson
import pandas as pd

li = [{'rowid': 976, 'INSTRPERMID': 8590928689},
      {'rowid': 952, 'INSTRPERMID': None}]
print(pd.DataFrame.from_dict(li))
Gives
   rowid   INSTRPERMID
0    976  8.590929e+09
1    952           NaN
You have yet to mention any details about your environment. What DBMS, OS, ODBC driver, Python version, etc.? If you don't supply that information we will be unable to assist.
@v-chojas
DBMS: MySQL
OS: Ubuntu
ODBC driver: pyodbc
Python: >3.7
See https://github.com/mkleehammer/pyodbc/wiki/Troubleshooting-%E2%80%93-Generating-an-ODBC-trace-log and attach an ODBC trace.
I am able to reproduce the issue, but it is not a problem with pyodbc per se; it is a pandas issue. I get the same result if I use mysqlclient, which is SQLAlchemy's preferred DBAPI layer for working with MySQL:
import pandas as pd
import sqlalchemy as sa

connection_uri = 'mysql+mysqldb://scott:tiger@192.168.0.199/test'
engine = sa.create_engine(connection_uri, echo=True)
sql = """\
SELECT 1 AS foo
UNION ALL
SELECT NULL AS foo
"""
df = pd.read_sql_query(sql, engine)
print(df)
"""
foo
0 1.0
1 NaN
"""
print(df.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   foo     1 non-null      float64
dtypes: float64(1)
memory usage: 144.0 bytes
None
"""
This is not a bug but normal pandas behavior. Pandas represents missing data as np.nan, which is itself of dtype float64; that is why pandas casts an integer series containing missing values to float64 (link).
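If integral values are needed, one common workaround is pandas' nullable integer extension dtype, "Int64" (capital I), available since pandas 0.24. A minimal sketch, continuing the example above:

df['foo'] = df['foo'].astype('Int64')
print(df)
"""
    foo
0     1
1  <NA>
"""
print(df.dtypes)
"""
foo    Int64
dtype: object
"""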
I am trying to read the result set of a stored procedure with the pandas.read_sql method in Python, using pyodbc. The issue I have is that when pandas reads the data, it converts int columns that contain NULL values to float.
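For reference, a minimal sketch of that pattern with the nullable-dtype workaround applied, reusing the connection details from the mysqlclient example above. get_instruments is a hypothetical stored procedure returning an INT column INSTRPERMID that may contain NULLs, and whether CALL works directly through read_sql can depend on the driver:

import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine('mysql+mysqldb://scott:tiger@192.168.0.199/test')

# get_instruments is hypothetical; any procedure returning a NULL-able
# INT column shows the same float64 conversion
df = pd.read_sql("CALL get_instruments()", engine)

# read_sql returns the NULL-able INT column as float64; convert it back
df['INSTRPERMID'] = df['INSTRPERMID'].astype('Int64')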