vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

Cannot retrieve the groups on datetime column after binning #1303

Open svittapu opened 3 years ago

svittapu commented 3 years ago

The following code gives an error in versions 4.0.x and 4.1.x.

import vaex
import pandas as pd
import numpy as np

dt_array = pd.bdate_range(start='2015-01-01', end='2015-05-25', freq='5min')
dt = np.array(dt_array)
x = np.random.normal(size=len(dt))
df = vaex.from_arrays(dt=dt, x=x)

dbg = df.groupby(by=vaex.BinnerTime(df.dt, resolution='M'))
print(dbg)
for group, dff in dbg:
    print(group, dff)

Error:

  File "/home/svittapu/dev/vaex/packages/vaex-core/vaex/dataframe.py", line 2606, in _evaluate_selection_mask
    mask = scope.evaluate(name)
  File "/home/svittapu/dev/vaex/packages/vaex-core/vaex/scopes.py", line 181, in evaluate
    result = eval(expression, expression_namespace, self)
  File "<string>", line 1
    (dt == 2015-01-01)
                 ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

maartenbreddels commented 3 years ago

Thanks for the report.

@JovanVeljanoski I wonder whether this is caused by a date comparison issue, or by dff not reporting the column as a date type.

JovanVeljanoski commented 2 years ago

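Indeed, the issue arises because somewhere in the process the group variable in the example above is turned into a Python datetime.date, which causes comparison problems: when it is formatted into the selection expression it becomes the bare text 2015-01-01, which is not valid Python.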
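
The failure can be reproduced without vaex. This is a minimal sketch, assuming the selection expression is built by string-formatting the group value, as the traceback suggests. The names group, bad_expression and compiles are illustrative only and are not part of vaex. The repr-based variant at the end only shows that a quoted or constructed date parses cleanly; it is not the fix used by the library.

```python
import datetime

# The group key as described above: a plain datetime.date (assumption for this sketch).
group = datetime.date(2015, 1, 1)

# str() of a date is "2015-01-01", so the expression contains bare integer tokens.
bad_expression = f"(dt == {group})"


def compiles(expression):
    """Return True if the text parses as a Python expression."""
    try:
        compile(expression, "<string>", "eval")
        return True
    except SyntaxError:
        return False


# "01" is a decimal literal with a leading zero, which Python 3 rejects.
bad_parses = compiles(bad_expression)

# Using repr() keeps the value a real date object inside the expression.
good_expression = f"(dt == {group!r})"
good_parses = compiles(good_expression)

# Evaluate with a namespace that supplies both the module and the value of dt.
namespace = {"datetime": datetime, "dt": datetime.date(2015, 1, 1)}
result = eval(good_expression, namespace)

print(bad_expression, bad_parses)
print(good_expression, good_parses, result)
```

Until this is fixed in the library, converting the group key to a numpy.datetime64 before comparing it with the dt column may serve as a workaround, though that has not been verified in this thread.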