vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.23k stars 590 forks

[BUG-REPORT] materialize does not support column names longer than one character #2247

Closed: cgjosephlee closed this issue 1 year ago

cgjosephlee commented 1 year ago

Description

Good (single-character column name):

import numpy as np
import vaex as vx

x = np.arange(1,4)
y = np.arange(2,5)
df = vx.from_arrays(x=x, y=y)
df['r'] = (df.x**2 + df.y**2)**0.5 # 'r' is a virtual column (computed on the fly)
df = df.materialize('r')  # now 'r' is a 'real' column (i.e. a numpy array)

Fail (multi-character column name):

import numpy as np
import vaex as vx

x = np.arange(1,4)
y = np.arange(2,5)
df = vx.from_arrays(x=x, y=y)
df['pqr'] = (df.x**2 + df.y**2)**0.5 # 'pqr' is a virtual column (computed on the fly)
df = df.materialize('pqr')  # expected: 'pqr' becomes a 'real' column; instead this raises:

# NameError: p is not a column or virtual column
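Judging by the error message, the string 'pqr' appears to be iterated character by character, so 'p' is looked up as a column name. The pure-Python sketch below (no vaex required) reproduces that failure mode under this assumption; `materialize_naive` is a hypothetical stand-in, not vaex's actual implementation.

```python
def materialize_naive(columns, known_columns):
    """Hypothetical stand-in: iterates over `columns` and checks each item is a known column."""
    result = []
    for name in columns:  # a plain str is iterated one character at a time
        if name not in known_columns:
            raise NameError(f"{name} is not a column or virtual column")
        result.append(name)
    return result


known = {"x", "y", "r", "pqr"}

# Works only because 'r' is a single character, so iterating it yields 'r' itself.
print(materialize_naive("r", known))  # ['r']

# A list is iterated element by element, so 'pqr' stays intact.
print(materialize_naive(["pqr"], known))  # ['pqr']

# A multi-character string is split into 'p', 'q', 'r'; 'p' is not a column.
try:
    materialize_naive("pqr", known)
except NameError as e:
    print(e)  # p is not a column or virtual column
```

This is why a single-letter name like 'r' masks the problem: iterating a one-character string yields that same string.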

Software information

Additional information

JovanVeljanoski commented 1 year ago

Thanks for the report. I thought we had improved this, but maybe I misremember. Anyway, I just opened a PR with the fix. As a workaround, pass a list of columns/expressions to materialize. Your example then becomes df = df.materialize(['pqr']), and this should work just fine.
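The PR itself is not shown in this thread, so the following is only a guess at the general shape of such a fix: wrap a bare string into a one-element list before iterating, so a single column name is never split into characters. `ensure_list` and `materialize_fixed` are hypothetical names used for illustration, not part of vaex's API.

```python
from typing import Iterable, List, Union


def ensure_list(columns: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: treat a bare str as one column name, not a sequence of characters."""
    if isinstance(columns, str):
        return [columns]
    return list(columns)


def materialize_fixed(columns, known_columns):
    """Hypothetical stand-in for a patched materialize: normalize first, then validate names."""
    names = ensure_list(columns)
    for name in names:
        if name not in known_columns:
            raise NameError(f"{name} is not a column or virtual column")
    return names


known = {"x", "y", "pqr"}
print(materialize_fixed("pqr", known))    # ['pqr']
print(materialize_fixed(["pqr"], known))  # ['pqr']
```

With this normalization, the string form and the list-based workaround take the same path, which is why passing ['pqr'] already sidesteps the bug.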

cgjosephlee commented 1 year ago

Thanks. Putting it into a list does solve the problem for now.