kay1793 opened 10 years ago
Just to make sure you're aware:
In [3]: df.query('cat > 0 & count > 0')
Out[3]:
cat count
1 1 20
And for comparison:
In [15]: df[df.cat]
Out[15]:
cat count
0 0 10
1 1 20
which gives the same as df.query('cat')
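For reference, with the default parser='pandas' the & in a query has the precedence of "and", so the comparisons in the first query bind first and it behaves like an explicit boolean mask. A minimal sketch, assuming the two-row frame implied by the outputs above (the original construction code isn't shown in this thread):

```python
import pandas as pd

# Assumed frame, reconstructed from the outputs shown above.
df = pd.DataFrame([[0, 10], [1, 20]], columns=["cat", "count"])

# parser='pandas' (the default) parses this as (cat > 0) & (count > 0).
via_query = df.query("cat > 0 & count > 0")

# The same filter as an explicit boolean mask. df["count"] is used instead of
# df.count because DataFrame.count is a method, not the column.
mask = (df["cat"] > 0) & (df["count"] > 0)
via_mask = df[mask]

print(via_query.equals(via_mask))
print(via_query)
```

Inside the query string the name count resolves to the column, which is why the query form doesn't hit the df.count method clash.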
I think there have been previous issues about how to handle these cases (though not with respect to query specifically).
Thanks Tom, yeah, I got the "fix".
The truthiness might be tricky, as you said, but I think you got lucky: it's not doing what you think:
In [8]: df=pd.DataFrame([[1,10],[0,20]],columns=['cat','count'])
...: df[df.cat]
Out[8]:
count cat
0 10 1
1 20 0
In [9]: df.query('cat')
Out[9]:
cat count
1 0 20
0 1 10
Or, similarly:
In [12]: df=pd.DataFrame([[0,10],[1,20],[2,np.nan]],columns=['cat','count'])
...: df[df.cat]
IndexError: indices are out-of-bounds
In [13]: df.query('cat')
Out[13]:
cat count
0 0 10
1 1 20
2 2 NaN
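The outputs above are consistent with the integer values of cat being used as indexers rather than as a truthiness mask: df[df.cat] reorders the columns, and df.query('cat') reorders the rows. The same distinction in plain NumPy, as an illustration only (this is not pandas' internal code):

```python
import numpy as np

values = np.array([10, 20])
cat = np.array([1, 0])

# An integer array is a list of positions: take element 1, then element 0.
taken = values[cat]

# A boolean array is a mask: keep only the elements where it is True.
masked = values[cat.astype(bool)]

print(taken)   # [20 10]
print(masked)  # [10]
```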
Anyway, was matching df[expr] and df.query(expr) an explicit goal or promise? Not sure; it doesn't look like it.
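Whatever the answer, the two spellings agree once the mask is explicitly boolean. A minimal sketch using the frame from In [12], assuming the intent is to keep the rows where cat is nonzero:

```python
import numpy as np
import pandas as pd

# Frame from the In [12] example above.
df = pd.DataFrame([[0, 10], [1, 20], [2, np.nan]], columns=["cat", "count"])

# Explicit truthiness: nonzero values of cat become True.
by_astype = df[df["cat"].astype(bool)]

# The same filter written as a boolean query expression.
by_query = df.query("cat != 0")

print(by_astype.equals(by_query))  # True
print(by_astype.index.tolist())    # [1, 2]
```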
I'm more convinced that the AST exception should not happen.
@kay1793 the error handling could be improved here. Pull request?
The docstring says query expects a boolean expression, yet it doesn't complain about, or document, what happens when the result of the expression is not boolean, and it doesn't coerce the result into a boolean array either.
Sorry jreback, I have too much on my plate currently to dive into the query code.
@kay1793 @cpcloud and I discussed this yesterday: your expression should raise, as it doesn't result in a boolean. So we'll fix this.
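Until query itself raises, the agreed behavior can be approximated in user code. A minimal sketch; strict_query is a hypothetical helper, not a pandas API. It evaluates the expression with DataFrame.eval and rejects any result that is not a boolean Series:

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def strict_query(df: pd.DataFrame, expr: str) -> pd.DataFrame:
    """Hypothetical helper: filter like df.query, but raise on non-boolean results."""
    # engine="python" avoids depending on numexpr being installed.
    result = df.eval(expr, engine="python")
    if not isinstance(result, pd.Series) or not is_bool_dtype(result.dtype):
        kind = getattr(result, "dtype", type(result).__name__)
        raise ValueError(
            f"query expression {expr!r} must evaluate to a boolean Series, got {kind}"
        )
    return df[result]


df = pd.DataFrame([[0, 10], [1, 20]], columns=["cat", "count"])

# A boolean expression filters as usual.
filtered = strict_query(df, "cat > 0 & count > 0")

# An integer-valued expression is rejected instead of being used as an indexer.
try:
    strict_query(df, "cat")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Raising, rather than silently coercing with astype(bool), matches the position in this thread that a non-boolean expression is an error.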
Thanks @jreback! I also found #8568; sorry I can't help with the fixes now.
I just tripped over this. Glad this thread came up in Google :)
@jreback are you still looking for someone to help with this?
@JonHannah sure! any open issues that are interesting need help :>
Great - I'll try and take a look sometime in the next week
@JonHannah do you want to have a look at this?
Sorry - I don't have time at the moment 😞
Started looking at this issue, and it looks like df.query()'s behavior has changed since this issue was created. Is this the intended behavior now?
In [1]: import pandas as pd
...: print(pd.__version__)
...: df=pd.DataFrame([[0,10],[1,20]],columns=['cat','count'])
...: display(df)
1.1.0.dev0+765.g7fa8ee728
cat count
0 0 10
1 1 20
In [2]: display(df.query('cat & count > 10'))
cat count
1 1 20
In [3]: display(df.query('cat > 0 & count > 10'))
cat count
1 1 20
@a-y-khan I don't believe the behavior is currently correct. Your example does not touch upon the issue here; if you change the query to count >= 10, I think you would see two rows, whereas the correct behavior is to include only one.
Update: even changing the query to count >= 10 surprisingly returns only a single row. In any case, I misread the issue; the agreed-upon correct behavior here is to raise, since cat is not boolean.
take
Expected the first row, where cat == 0, to be dropped, since 0 is falsy.
This unhelpful exception is how I stumbled over this.