vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.27k stars, 590 forks

[BUG-REPORT] AssertionError while performing math operation on shifted columns #2383

Open msat59 opened 1 year ago

msat59 commented 1 year ago

Description

Let's say I want to calculate something from a column and its shifted values (lags or leads). A basic example is df.A - 2 * df.A_shifted. This is easy in pandas: df.A - 2 * df.A.shift(1). Vaex, however, raises an AssertionError with no message. Below is the code I used:

import pandas as pd
import vaex
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [4, 5, 6]})
dfv = vaex.from_pandas(df)

Pandas:

print(df.A - 2 * df.A.shift(1))

output:
0    NaN
1    0.0
2   -1.0
Name: A, dtype: float64

Vaex:

print((dfv.A - 2 * dfv.shift(1, 'A', fill_value=0).A).to_pandas_series())

output:
AssertionError:  
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[34], line 1
----> 1 (dfv.A - 2 * dfv.shift(1, 'A', fill_value=0).A).to_pandas_series()

File ~\miniconda3\envs\py38\lib\site-packages\vaex\expression.py:139, in Meta.__new__.<locals>.wrap.<locals>.f(a, b)
    137 else:
    138     if isinstance(b, Expression):
--> 139         assert b.ds == a.ds
    140         b = b.expression
    141     elif isinstance(b, (np.timedelta64)):

AssertionError: 

Initially, I thought Vaex could not operate on NaN values, so I passed fill_value=0 to rule that out. Something else must be wrong, though, because the same kind of calculation works when I use columns A and B of the same DataFrame:

print((dfv.A - dfv.B).to_pandas_series())

output:
0   -3
1   -3
2   -3
dtype: int64
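
The traceback stops at the check assert b.ds == a.ds, which suggests that both operands of a binary operation are expected to belong to the same DataFrame. The sketch below is a toy model of that guard in plain Python, not Vaex's actual implementation: ToyExpression, original and shifted are hypothetical names, and the plain objects only stand in for dfv and for whatever dfv.shift(...) returns. It illustrates why A - B can succeed while A - 2 * (A from another parent) fails.

```python
class ToyExpression:
    """Hypothetical stand-in for an expression bound to one parent DataFrame."""

    def __init__(self, ds, expression):
        self.ds = ds                  # the parent object (plays the role of a DataFrame)
        self.expression = expression  # textual form of the expression

    def __sub__(self, other):
        if isinstance(other, ToyExpression):
            # Same kind of guard as `assert b.ds == a.ds` in the traceback
            assert other.ds == self.ds
            other = other.expression
        return ToyExpression(self.ds, f"({self.expression} - {other})")

    def __rmul__(self, scalar):
        # Handles `2 * expr`; the result keeps the same parent
        return ToyExpression(self.ds, f"({scalar} * {self.expression})")


original = object()  # stands in for dfv
shifted = object()   # stands in for a separate DataFrame, e.g. the result of a shift

a = ToyExpression(original, "A")
b = ToyExpression(original, "B")
a_shifted = ToyExpression(shifted, "A")

# Both operands share one parent, so the guard passes
same_parent = a - b
print(same_parent.expression)  # (A - B)

# Operands come from different parents, so the guard raises a bare AssertionError
try:
    a - 2 * a_shifted
    mixed_parents_failed = False
except AssertionError:
    mixed_parents_failed = True
print(mixed_parents_failed)  # True
```

If the real check behaves like this toy one, the failure comes from mixing expressions from two different DataFrame objects, not from the NaN or fill_value handling.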

Software information