pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.27k stars 17.8k forks source link

BUG: #53028

Open SkeptiCali opened 1 year ago

SkeptiCali commented 1 year ago

Pandas version checks

Reproducible Example

import pandas as pd
import pymysql
import openpyxl
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font, PatternFill
from openpyxl.utils.dataframe import dataframe_to_rows

import imports_local.mysql_connector_ygrene as mc

conn = pymysql.connect(
        host='localhost',
        user='user1', 
        password = "password1",
        db='cali',
        cursorclass=pymysql.cursors.DictCursor
        )

df=pd.read_sql_query('SELECT * FROM sample_data LIMIT 2',conn)

print(df)

Issue Description

When using DictCursor the data returned from read_sql_query is just a dataframe of column names. If you change the cursorclass to cursor all is well. I also validated that the issues was not there in 1.5.3. The issue was first noticed in v. 2.0.0 and is there in 2.0.1

Example data output:

id value state_id county_id district_id zipcode plus4 0 id value state_id county_id district_id zipcode plus4 1 id value state_id county_id district_id zipcode plus4

Expected Behavior

id   value  state_id  county_id  district_id zipcode plus4

0 117 ABCDEF 5 12 12 95823 6312 1 163 DEFRDC 5 12 12 95814 4713

Installed Versions

INSTALLED VERSIONS ------------------ commit : 37ea63d540fd27274cad6585082c91b1283f963d python : 3.11.3.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.22621 machine : AMD64 processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252 pandas : 2.0.1 numpy : 1.24.2 pytz : 2023.3 dateutil : 2.8.2 setuptools : 65.5.0 pip : 23.1.2 Cython : None pytest : None hypothesis : None ...
asishm commented 1 year ago

Can you try with a sqlalchemy connectable as mentioned in the docs? Using a DBAPI2 connection for anything other than sqlite is not supported untested per the docs. Your code should also throw a UserWarning

SkeptiCali commented 1 year ago

I did and it worked, and it was against SQL Server. If memory serves me correct it didn't have an equivalent dictionary cursor though.

I believe I got that error message with 2.0.1 and wanted to be sure that the error was due to a deprecation of functionality.

The fix is to not use the dictionary cursor, and to be honest I don't know why the original author of the offending script used it and not just a regular cursor.

Based on what you said I'm good with your answer. Wanted to be sure I wasn't missing something.

Thanks for the very prompt reply It is greatly appreciated.

Matthew

Matthew


From: Asish Mahapatra @.> Sent: Monday, May 1, 2023 6:40:23 PM To: pandas-dev/pandas @.> Cc: SkeptiCali @.>; Author @.> Subject: Re: [pandas-dev/pandas] BUG: (Issue #53028)

Can you try with a sqlalchemy connectable as mentioned in the docs? Using a DBAPI2 connection for anything other than sqlite is not supported per the docs. Your code should also throw a UserWarning

— Reply to this email directly, view it on GitHubhttps://github.com/pandas-dev/pandas/issues/53028#issuecomment-1530754691, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ALXO6Z7AU7GDQF3GQ5MPWBDXEBQYPANCNFSM6AAAAAAXSI7QKE. You are receiving this because you authored the thread.Message ID: @.***>

asishm commented 1 year ago

So this reproduces with a sqlite3 conn as well (which doesn't have a dictcursor built in - so using https://stackoverflow.com/questions/3300464/how-can-i-get-dict-from-sqlite-query)

reproducer:

import sqlite3
conn = sqlite3.connect(':memory:')

df = pd.DataFrame({'a': [1,2], 'b': [2,3]})
df.to_sql('foo', conn, index=False)

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

conn.row_factory = dict_factory

out = pd.read_sql('select * from foo', conn)

output:

In [27]: out
Out[27]:
   a  b
0  a  b
1  a  b

Issue stems from #50048 which tries to call

In [32]: lib.to_object_array_tuples([{'a': 1, 'b': 2}, {'a': 2, 'b': 3}])
Out[32]:
array([['a', 'b'],
       ['a', 'b']], dtype=object)
SkeptiCali commented 1 year ago

So based on your results should I just consider this deprecated behavior or a defect that needs to be fixed? Regardless, I have "work-arounds" so either way I am good.

asishm-wk commented 1 year ago

This is a regression