vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] SyntaxError when representing result of a basic operation involving Expression and numpy array, with the array on the right of the operator #2405

Open athob opened 7 months ago

athob commented 7 months ago

Description

Vaex raises a SyntaxError when computing the repr of the result of an operation between a vaex DataFrame Expression (a column) and a numpy array of the same length, when the Expression is the left operand and the numpy array is the right operand. The error does not occur when the operands are swapped (array on the left, Expression on the right). A simplified example is given in the additional information section.

Software information

Additional information

Given the following setup:

import vaex as vx
import numpy as np
n_cols = 3
n_rows = 100
ds = vx.from_dict({f"a{i}": np.random.rand(n_rows) for i in range(n_cols)})

the following line generates a SyntaxError:

repr(ds.a0 * np.random.rand(n_rows))

while the following lines do not break:

a = ds.a0 * np.random.rand(n_rows)
repr(np.random.rand(n_rows) * ds.a0)

The same SyntaxError occurs with the other basic operators: +, -, /, //, %, **, >, <, ==, !=, >=, <=, &, | and ^.
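To check the operators one by one rather than by hand, a small helper along these lines could be used. This is only a sketch: `OPERATORS` and `failing_reprs` are names made up for illustration, not part of vaex or numpy, and the helper simply records which operators raise when the result is built or passed to `repr`.

```python
import operator

# The operators from the report (plus *), mapped to their standard-library functions.
OPERATORS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv, "//": operator.floordiv, "%": operator.mod,
    "**": operator.pow, ">": operator.gt, "<": operator.lt,
    "==": operator.eq, "!=": operator.ne, ">=": operator.ge,
    "<=": operator.le, "&": operator.and_, "|": operator.or_,
    "^": operator.xor,
}


def failing_reprs(left, right):
    """Return {operator symbol: exception type name} for each operator
    where building the result or calling repr() on it raises."""
    failures = {}
    for symbol, func in OPERATORS.items():
        try:
            repr(func(left, right))
        except Exception as exc:  # SyntaxError is a subclass of Exception
            failures[symbol] = type(exc).__name__
    return failures


# Hypothetical usage with the objects defined above (requires vaex and numpy):
# print(failing_reprs(ds.a0, np.random.rand(n_rows)))   # Expression on the left
# print(failing_reprs(np.random.rand(n_rows), ds.a0))   # array on the left
```

Comparing the two dictionaries should show directly which operand order and which operators are affected.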

edited to correct wrong variable usage in code snippets

athob commented 7 months ago

To add to this, the following line:

np.random.rand(n_rows) * ds.a0

actually produces a numpy array of n_rows Expression objects, where the i-th Expression is the i-th element of the random array multiplied by the ds.a0 Expression. So this operand order avoids the SyntaxError, but it does not return a single Expression either. This doesn't feel very intuitive to me, but I may be mistaken about what to expect from this line of code.
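For what it's worth, this looks like generic numpy behaviour rather than something specific to vaex. When the right operand is an object numpy does not recognise as array-like, the array's operator treats it as a scalar and applies the operation element by element, which ends up calling the object's reflected method once per element. The sketch below reproduces that with a stand-in class. `LazyExpr` and `DeferringExpr` are hypothetical and are not vaex's Expression. Setting `__array_ufunc__ = None` is a documented numpy mechanism for making array operators defer to the other operand; whether it would be a suitable fix inside vaex is an assumption I have not verified.

```python
import numpy as np


class LazyExpr:
    """Hypothetical stand-in for a lazy expression (NOT vaex's Expression)."""

    def __init__(self, text):
        self.text = text

    def __mul__(self, other):
        return type(self)(f"({self.text} * {other!r})")

    def __rmul__(self, other):
        return type(self)(f"({other!r} * {self.text})")


class DeferringExpr(LazyExpr):
    # Opt out of numpy's ufunc handling: ndarray.__mul__ then returns
    # NotImplemented and Python falls back to DeferringExpr.__rmul__.
    __array_ufunc__ = None


arr = np.array([1.0, 2.0, 3.0])

# numpy treats the unknown object as a scalar and multiplies element by
# element, giving an object array with one LazyExpr per element.
elementwise = arr * LazyExpr("a0")
print(type(elementwise).__name__, elementwise.dtype, elementwise.shape)

# With __array_ufunc__ = None the whole array reaches __rmul__ in one call,
# so the result is a single expression object.
deferred = arr * DeferringExpr("a0")
print(type(deferred).__name__)
```

With the first class the result is an array of per-element objects, as in the line above. With the second it is one object that wraps the whole array.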