pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.22k stars 17.78k forks source link

MySQLdb driver returns tuple instead of list and Pandas isn't happy about it #1819

Closed dandye closed 12 years ago

dandye commented 12 years ago

The MySQLdb driver returns rows as tuples instead of lists and Pandas isn't happy about it:

sql.read_frame("select token,status FROM purchases WHERE purchases.created_at > '2012-08-26 00:00:00' ", conn) Traceback (most recent call last): File "", line 1, in File "C:\Python27\lib\site-packages\pandas\io\sql.py", line 140, in read_frame coerce_float=coerce_float) File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 714, in from_records coerce_float=coerce_float) File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 4078, in _to_sdict return _list_to_sdict(data, columns, coerce_float=coerce_float) File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 4091, in _list_to_sdict content = list(lib.to_object_array_tuples(data).T) File "inference.pyx", line 511, in pandas._tseries.to_object_array_tuples (pandas\src\tseries.c:79688) TypeError: Argument 'rows' has incorrect type (expected list, got tuple)

I confirmed that Listifying the rows fixes the issue:

cursor, conn = connex.get_connex()
cursor.execute("""
    SELECT token,status
    FROM purchases
    WHERE purchases.created_at > '2012-08-26 00:00:00' """)

rows = cursor.fetchall()
lrows = []
for row in rows:
    lrows.append(list(row))

colnames = tuple([desc[0] for desc in cursor.description])
df = DataFrame(lrows,columns=colnames)  # this works

Also, calling the DataFrame constructor with the Tuple rows raises an error that isn't very informative:

>>> df = DataFrame(rows,columns=colnames) Traceback (most recent call last): File "", line 1, in File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 337, in init raise PandasError('DataFrame constructor not properly called!') pandas.core.common.PandasError: DataFrame constructor not properly called!

lodagro commented 12 years ago

Creating a DataFrame from a list of tuples should work fine. See example below. Could it be that rows is not a list holding tuples but maybe a tuple holding tuples? I am not an expert on db-api`s, i would expect rows to be a list, bit it does not seem to be the case based on the traceback you get.

In [14]: pandas.__version__
Out[14]: '0.7.3'  # my estimated guess on your pandas version, cfr. last traceback.

In [15]: df = pandas.DataFrame([(0, 'foo', 10.0), (1, 'bar')])

In [16]: df
Out[16]: 
   0    1   2
0  0  foo  10
1  1  bar NaN
dandye commented 12 years ago

Iodagro, You are correct: the problem isn't the list of tuples needing to be a list of lists but rather with a tuple of tuples:

>>> rows = cursor.fetchall()
>>> type(rows)
<type 'tuple'>
>>> pandas.__version__
'0.7.3'
lodagro commented 12 years ago

Had a look in the code and this has been fixed. In the latest release sql.read_frame() can handle db-api that returns tuple instead of a list for cursor.fetchall().