vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.28k stars 589 forks

[BUG-REPORT] Trying to filter a DataFrame on a column with spaces in its name raises NameError #1219

Open vmussa opened 3 years ago

vmussa commented 3 years ago

Description

Similar to #1217 and #1218. Trying to filter a DataFrame on a column with spaces in its name raises the following error: NameError: name 'column_name' is not defined.

Steps to reproduce

import vaex as vx
df = vx.open('dataset.hdf5')

boolean_mask = df['column name'] > 0  # alternatively: boolean_mask = df.column_name > 0
df[boolean_mask]  # this line raises the NameError

Software information

Additional information

I'm trying to filter a specific dataset whose column names contain spaces, which I only noticed after I had already converted the CSV to HDF5. I will convert it again, this time renaming the columns first to remove the spaces. What made me report the issue is this curious behavior: producing the boolean mask works, but using it to filter the DataFrame raises the error, regardless of which way the column is referenced (['column name'] or .column_name).
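For anyone hitting the same problem, the renaming workaround described above could be sketched roughly like this, using only the Python standard library. The helper names `sanitize_header` and `rewrite_csv_header` and the sample data are hypothetical and only for illustration; this is not part of vaex, and the cleaned CSV would still have to be converted to HDF5 afterwards.

```python
import csv
import io


def sanitize_header(names):
    """Replace each run of whitespace in a column name with one underscore."""
    return ["_".join(name.split()) for name in names]


def rewrite_csv_header(src, dst):
    """Copy CSV rows from src to dst, sanitizing only the header row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(sanitize_header(header))
    writer.writerows(reader)  # data rows are copied unchanged


# Illustrative in-memory data; a real run would use open(...) on files.
raw = io.StringIO("column name,other\n1,2\n-1,3\n")
cleaned = io.StringIO()
rewrite_csv_header(raw, cleaned)
print(cleaned.getvalue())
```

Only the header row is touched, so the data itself is untouched by the rename.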

foooooooooooooooobar commented 3 years ago

I see I'm not the only one: https://github.com/vaexio/vaex/issues/1229

Nicholas-Schaub commented 2 years ago

It looks like this has to do with how vaex uses the ast module to parse expressions. Whenever an expression is evaluated and one of the columns it references has a space in its name, an error is thrown. We are running into the same error.
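To illustrate the general mechanism (not vaex's actual internals, which are not shown in this thread), here is a minimal standard-library sketch. It assumes an expression engine that parses expression strings with `ast` and evaluates them against a namespace of columns; the variable names and the alias `column_name` are assumptions for illustration only.

```python
import ast

original = "column name"  # column name as stored in the file
alias = "column_name"     # assumed identifier-safe alias

# A name containing a space is not a valid Python identifier,
# so an expression string that uses it verbatim cannot be parsed.
is_identifier = original.isidentifier()
try:
    ast.parse(f"{original} > 0", mode="eval")
    parses = True
except SyntaxError:
    parses = False

# The alias parses fine, but evaluation only succeeds when the
# namespace used for evaluation knows the column under that alias.
code = compile(ast.parse(f"{alias} > 0", mode="eval"), "<expr>", "eval")

try:
    eval(code, {"__builtins__": {}}, {original: 1})  # keyed by original name
    error = None
except NameError as exc:
    error = str(exc)

result = eval(code, {"__builtins__": {}}, {alias: 1})  # keyed by alias

print(is_identifier, parses, error, result)
```

If the alias and the stored name get out of sync anywhere between building the expression and evaluating it, the lookup fails with a NameError of the same shape as the one reported above.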

maartenbreddels commented 2 years ago

Hi,

I cannot reproduce this; the following works fine for me:

import vaex

df = vaex.from_arrays(x=[1, 2])
df['column name'] = df.x - 1
df.export('spacetest.hdf5')
df = vaex.open('spacetest.hdf5')
boolean_mask = df['column name'] > 0  # alternatively: boolean_mask = df.column_name > 0
assert df[boolean_mask].sum('x') == 2

Please provide a reproducible example that I can copy and paste so I can fix this.

Regards,

Maarten