Hey there. I'm experimenting with vaex for processing large datasets in Python, and I ran into unexpected behavior when applying a custom function with vaex.apply. Specifically, printing the result inside the function shows the correct output, but the value returned by apply does not match what was printed. Here's a simplified version of my code:

import numpy as np
import pandas as pd
import vaex
from scipy.stats import gamma

Creating a DataFrame

d = {'A': [i for i in range(1000000)]}
df = pd.DataFrame(data=d)
a, b = 0.09717545806463647, 407034.13749400195

Setting up random seed

np.random.seed(1234)

Defining a custom function

def my_func(A):
    f = np.random.poisson(lam=100)
    sim = np.random.uniform(low=0, high=1, size=f)
    lossx1 = np.sum(gamma.ppf(sim, a, scale=b))
    print(lossx1)  # Printing the loss value for debugging
    return np.array(lossx1)

Converting DataFrame to vaex DataFrame

df_vaex = vaex.from_pandas(df)

Applying the function using vaex

df_result = df_vaex.apply(my_func, arguments=[df_vaex["A"]], vectorize=True, multiprocessing=False).values
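For comparison, here is a minimal NumPy-only sketch of a variant that returns one simulated loss per input row. It assumes that vectorize=True hands the function a whole array of A values rather than a single scalar; I have not confirmed that this is what causes the discrepancy. The names my_func_per_row and rng are hypothetical, and the sketch draws gamma variates directly with rng.gamma instead of pushing uniforms through gamma.ppf, which should sample the same distribution but will not reproduce the same numbers.

```python
import numpy as np

# Same gamma parameters as in the report above.
a, b = 0.09717545806463647, 407034.13749400195


def my_func_per_row(A, rng):
    """Hypothetical variant: return one simulated loss per element of A."""
    A = np.atleast_1d(A)
    n_rows = A.shape[0]
    # One Poisson count per row instead of a single count for the whole call.
    counts = rng.poisson(lam=100, size=n_rows)
    # Draw every gamma variate in one call (replaces gamma.ppf on uniforms).
    severities = rng.gamma(shape=a, scale=b, size=counts.sum())
    # Label each variate with its row, then sum the variates per row.
    row_ids = np.repeat(np.arange(n_rows), counts)
    return np.bincount(row_ids, weights=severities, minlength=n_rows)


# An explicit generator keeps the draws reproducible for a given seed.
rng = np.random.default_rng(1234)
losses = my_func_per_row(np.arange(5), rng)
print(losses.shape)  # (5,)
```

If the assumption holds, a function shaped like this returns an array with the same length as its input, so each row gets its own value instead of sharing a single scalar.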

Software information

Vaex version (import vaex; vaex.__version__):
Vaex was installed via: pip install vaex==4.16.0
OS: mac pro m1

vaexio / vaex

Unexpected Discrepancy Between Printed Values and Returned Results in vaex.apply() Function #2420
