Closed JasonCrowe closed 11 years ago
to make sure that you don't have one row that is longer or shorter than the rest try the following (git master (v0.11.1 soon-to-be-released) assigns None
or NaN
depending on the type when this happens, I can't remember if that was different in 0.11.0)
import numpy as np
np.all(np.array(map(len, rows)) == len(rows[0]))
what is the output?
if you get True
, then this might be a bug that has been fixed: I can't replicate with what I think your data look like. I'm assuming a list of tuples of strings or a list of lists of strings. Is that correct?
The output was true. I am returning a record set from SQL Server that is made up of two string fields.
I have downgraded to 0.10.1 and the problem goes away.
hmm ok i will take a look
@JasonCrowe Okay, here's what i'm doing to try and reproduce this, being that you probably cannot give me the data. Let me know if the sql that creates your tables is different, e.g., did I use the correct types?
import sqlite3 as sql
from pandas.util.testing import rands # random strings
nameq = 'SELECT cust_num, name from custaddr'
name_create = 'CREATE TABLE custaddr (cust_num text, name text)'
conn = sql.connect(':memory:')
cursor = conn.cursor()
cursor.execute(name_create)
conn.commit()
strlen = 10
numrecs = 1000
data = [(rands(strlen), rands(strlen)) for _ in xrange(numrecs)]
cursor.executemany('INSERT INTO custaddr VALUES (?,?)', data)
conn.commit()
result = conn.execute(nameq)
rows = result.fetchall()
cols = 'cust_num', 'name'
df = DataFrame(rows, columns=cols)
This works as expected on v0.10.1
.
@cpcloud of course may be sql server vs other flavors issue
oh ok. i have never used sql (beyond examples) before...so i'm basically a n00b when it comes to pitfalls...should i keep going?
That looks right as far as types.
Here is a clip of the actual data: C001505, @@STOKES GENERAL STORE C002815, 3R DISTRIBUTING LLC C000032, 4 C'S EXPERT SHARPENING CO C000033, 422 SHOES C000034, 5 L ENTERPRISES
@cpcloud fyi....try pd.read_sql(....)
as well, this does some type conversions
Using the following code it works fine in 0.11.0
name_q = "SELECT cust_num, name from custaddr " name_df = sql.read_frame(name_q, CONN)
@JasonCrowe is rows (when it comes out of fetchall) list-like? an actual list? what if you do
DataFrame(list(rows),columns=COLUMNS)
?
was just about to ask that...
If I limit the query to the top 10 and print rows I get:
[(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')]
I will try DataFrame(list(rows),columns=COLUMNS)
Result:
ValueError: Shape of passed values is (1, 10), indices imply (2, 10)
this is with exactly the same data u show above?
This works, must be something else
In [2]: l
Out[2]:
[(u'C000001', u'WAL-MART CORPORATE'),
(u'C000002', u'CANADIAN TIRE CORPORATE'),
(u'C000003', u'WAL-MART NON ASN'),
(u'C000004', u'CANADIAN TIRE'),
(u'C000005', u'HOME DEPOT'),
(u'C000006', u'MEIJER'),
(u'C000007', u'SAMS'),
(u'C000008', u'TARGET'),
(u'C000009', u'WAL-MART'),
(u'C000010', u'WAL-MART CANADA')]
In [3]: DataFrame(l,columns=list('AB'))
Out[3]:
A B
0 C000001 WAL-MART CORPORATE
1 C000002 CANADIAN TIRE CORPORATE
2 C000003 WAL-MART NON ASN
3 C000004 CANADIAN TIRE
4 C000005 HOME DEPOT
5 C000006 MEIJER
6 C000007 SAMS
7 C000008 TARGET
8 C000009 WAL-MART
9 C000010 WAL-MART CANADA
yep that's what i get 2
Yes, they both use the same query:
SELECT top 10 cust_num, name from custaddr WHERE name is not null and cust_seq = 0 order by cust_num
what is cust_seq
? column of ints?
Yes, but the result is the same with or without the where sql.
you should programmatically check the type of each tuple element of the list and then each element of that tuple (even though looks like they are all unicode
objects) to make sure that there's no weirdness in there. fyi using print
can hide things sometimes, e.g., print []
and print str([])
look the same in the console but they are different objects...
@JasonCrowe u can do what @jreback just showed correct? let's get a baseline here...
DataFrame(l,columns=list('AB')) shows the same results as he posted.
ok that's good. rows[:10] == l
should return True
...does it?
maybe wrap it in sorted
as needed...
try this:
while(True):
r = rows[0:100]
try:
DataFrame(r,columns=list('AB'))
except:
print r
rows = rows[100:]
see where it breaks
Returns False
Will try while loop next.
at least one element of the list, then, is not a tuple
...
yep....weird
try type(rows[list(map(lambda x: isinstance(x, tuple), rows)).index(False)])
to get the type of the first non tuple
in rows
.
while:
[(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')] [] [] [] [] [] []
In [14]:type(rows[list(map(lambda x: isinstance(x, tuple), rows)).index(False)])
Out[14]:pyodbc.Row
there you go
i knew someone had put a sneaky repr
in somewhere...
So.... Is this a bug in pyodbc?
read_sql handles this correctly
@cpcloud maybe we should make a little function in utils somewhere to 'check' input and report where it is bad (not part of the construction code) but can be called manually
don't think so, but i might argue that they shouldn't repr
to a Python builtin so u need to convert those objects to tuple
s or list
s. a quick glance of the Row
docs for pyodbc
suggests that Row
implements the collections.Sequence
protocol (__getitem__
, __setitem__
, __len__
, and __contains__
, IIRC) which means you can probably do map(tuple, rows)
until we do what @jreback suggests
@JasonCrowe fundamentally the issue is that pandas doesn't coerce list like objects eg you could actually store a pyodbc row object in a frame (of course the utility is limited in that case)
@cpcloud maybe we could deal with coercing collections.Sequence though
@jreback hm why didn't list(rows)
work? that's essentially what _safe_fetch
is doing...
it's not recursively deep just a top level conversipn
oh i c, it's from_records
that makes it work
no that's not it...nvm
wait! it looks like it is: _list_to_arrays
is doing it i believe...
@cpcloud a bit deeper...I know where it is....see the refernces issue
ok. cool. i'm having a helluva time trying to connect to a database to test this...is there no way to do this with pyodbc?
closing this....enhancement issue in #3783
Code
name_q = "SELECT cust_num, name from custaddr" result = CONNECTION.execute(name_q) rows = result.fetchall()
COLUMNS = ['cust_num', 'name'] name_df = DataFrame(rows, columns=COLUMNS)
Code end
This code worked without fail on 0.10, but started failing as soon as I updated to 0.11.0. It gives me the following error.
ValueError Traceback (most recent call last)