has2k1 / plydata

A grammar for data manipulation in Python
https://plydata.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

query function is not able to access variables. (UndefinedVariableError: name 'y' is not defined) #5

neo-anderson closed this issue 6 years ago

neo-anderson commented 6 years ago

Hi!

I'm not sure if this is a technical limitation, but the query function cannot access the values of variables defined outside the data frame. The define function does not have this problem.

This works.

import pandas as pd
from plydata import query, define

df = pd.DataFrame({'x': [1, 2, 3]})
df >> query('x > 2')

This doesn't.

y = 2
df = pd.DataFrame({'x': [1, 2, 3]})
df >> query('x > y')  # UndefinedVariableError: name 'y' is not defined

For now, I'm doing this instead:

df >> query('x > {}'.format(y))

In the case of define, it works as expected.

y = 2
df = pd.DataFrame({'x': [1, 2, 3]})
df >> define(sum='x + y')

has2k1 commented 6 years ago

query just uses pandas.DataFrame.query. I think changing its behavior would come as a surprise to pandas users. Plus, it can make use of numexpr if available. So you have to prefix the variables with @.

df >> query('x > @y')
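
For anyone who wants to check the same behavior without plydata, here is a minimal sketch in plain pandas. It assumes, as stated above, that the expression string ends up in pandas.DataFrame.query unchanged. The names result and bare_name_failed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})
y = 2

# Inside the query string, bare names are looked up among the frame's
# columns (and index); the '@' prefix marks a Python variable instead.
result = df.query('x > @y')
print(result)

# Without the prefix, 'y' is treated as a column name and is not found.
# The issue title reports this as UndefinedVariableError.
try:
    df.query('x > y')
    bare_name_failed = False
except Exception as err:
    bare_name_failed = True
    print(type(err).__name__, err)
```

The string-formatting workaround from the first comment also works for simple numbers, but the @ prefix passes the variable itself rather than its text representation, so it avoids quoting problems with strings and similar values.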

neo-anderson commented 6 years ago

This worked perfectly. Thanks!