vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] Got unexpected argument type <class 'numpy.datetime64'> for compute function when using scalar_datetime #2317

Open shriharimundada opened 1 year ago

shriharimundada commented 1 year ago

After upgrading to vaex 4.16.0, I am unable to use scalar_timedelta in vaex.expression.Expression. The code snippet is below.

Open a parquet file in vaex that has a birthDate column.

df = vaex.open("path to parquet file")

df['birthDate'] = df['birthDate'].astype('datetime64')
filter_expression_string = '(notna(birthDate))'
then_expression_string = "(scalar_datetime('2022-11-01'))"
else_expression_string = "(birthDate + scalar_timedelta(75, 'D'))"
filter_expression = vaex.expression.Expression(df, "{0}".format(filter_expression_string))
then_expression = vaex.expression.Expression(df, "{0}".format(then_expression_string))
else_expression = vaex.expression.Expression(df, "{0}".format(else_expression_string))
df['testDate'] = df.func.where(filter_expression, then_expression, else_expression)

This used to work on 4.9.1. My use case is to set a column to a constant date value.
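For reference, the result the expressions above are meant to produce can be sketched in plain NumPy, without vaex or pyarrow. This is only an illustration of the intended semantics, not a vaex workaround: the sample birth dates and the helper name `expected_test_date` are made up for the example.

```python
import numpy as np


def expected_test_date(birth_date):
    """Mirror where(notna(birthDate), scalar_datetime(...), birthDate + 75 days).

    Hypothetical helper for illustration only; it is not part of vaex.
    """
    not_missing = ~np.isnat(birth_date)            # notna(birthDate)
    constant = np.datetime64('2022-11-01')         # scalar_datetime('2022-11-01')
    shifted = birth_date + np.timedelta64(75, 'D')  # birthDate + scalar_timedelta(75, 'D')
    return np.where(not_missing, constant, shifted)


# Made-up sample data: two valid dates and one missing value (NaT).
birth_date = np.array(['1990-05-17', 'NaT', '2001-01-01'], dtype='datetime64[D]')
test_date = expected_test_date(birth_date)
print(test_date)
```

Rows with a valid birthDate receive the constant date; rows where birthDate is missing stay missing, because adding a timedelta to NaT yields NaT.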

Vaex library details are as follows: {'vaex': '4.16.0', 'vaex-core': '4.16.1', 'vaex-viz': '0.5.4', 'vaex-hdf5': '0.14.1', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.3', 'vaex-jupyter': '0.8.1', 'vaex-ml': '0.18.1'}

Exception:

    return func.call(args, None, memory_pool)
  File "pyarrow/_compute.pyx", line 335, in pyarrow._compute.Function.call
  File "pyarrow/_compute.pyx", line 460, in pyarrow._compute._pack_compute_args
TypeError: Got unexpected argument type <class 'numpy.datetime64'> for compute function

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/scopes.py", line 113, in evaluate
    result = self[expression]
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/scopes.py", line 198, in __getitem__
    raise KeyError("Unknown variables or column: %r" % (variable,))
KeyError: 'Unknown variables or column: "where((notna(birthDate)), (scalar_datetime(\'2022-11-01\')), (birthDate + scalar_timedelta(75, \'D\')))"'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/dataframe.py", line 4098, in table_part
    values[name] = df.evaluate(name)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/dataframe.py", line 3095, in evaluate
    return self._evaluate_implementation(expression, i1=i1, i2=i2, out=out, selection=selection, filtered=filtered, array_type=array_type, parallel=parallel, chunk_size=chunk_size, progress=progress)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/dataframe.py", line 6445, in _evaluate_implementation
    dtypes[expression] = dtype = df.data_type(expression).internal
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/dataframe.py", line 2275, in data_type
    data = self.evaluate(expression, 0, 1, filtered=True, array_type=array_type, parallel=False)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/dataframe.py", line 3095, in evaluate
    return self._evaluate_implementation(expression, i1=i1, i2=i2, out=out, selection=selection, filtered=filtered, array_type=array_type, parallel=parallel, chunk_size=chunk_size, progress=progress)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/dataframe.py", line 6562, in _evaluate_implementation
    value = block_scope.evaluate(expression)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/scopes.py", line 113, in evaluate
    result = self[expression]
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/scopes.py", line 188, in __getitem__
    values = self.evaluate(expression)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/scopes.py", line 119, in evaluate
    result = eval(expression, expression_namespace, self)
  File "<string>", line 1, in <module>
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/arrow/numpy_dispatch.py", line 136, in wrapper
    result = f(*args, **kwargs)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/vaex/functions.py", line 2694, in where
    return pa.compute.if_else(condition, x, y) if dtype is None else pa.compute.if_else(condition, x, y).cast(dtype)
  File "/Users/shrihari/opt/anaconda3/lib/python3.8/site-packages/pyarrow/compute.py", line 233, in wrapper
    return func.call(args, None, memory_pool)
  File "pyarrow/_compute.pyx", line 335, in pyarrow._compute.Function.call
  File "pyarrow/_compute.pyx", line 460, in pyarrow._compute._pack_compute_args
TypeError: Got unexpected argument type <class 'numpy.datetime64'> for compute function