Hey there.
I'm experimenting with vaex for processing large datasets in Python, and I ran into unexpected behavior when applying a custom function with the vaex DataFrame's apply method. Printing the result inside the function shows the correct value, but the value that apply returns appears to be incorrect. Here's a simplified version of my code:
import numpy as np
import pandas as pd
import vaex
from scipy.stats import gamma

# Creating a DataFrame
d = {'A': [i for i in range(1000000)]}
df = pd.DataFrame(data=d)
a, b = 0.09717545806463647, 407034.13749400195

# Setting up random seed
np.random.seed(1234)

# Defining a custom function
def my_func(A):
    f = np.random.poisson(lam=100)
    sim = np.random.uniform(low=0, high=1, size=f)
    lossx1 = np.sum(gamma.ppf(sim, a, scale=b))
    print(lossx1)  # Printing the loss value for debugging
    return np.array(lossx1)
# Converting DataFrame to vaex DataFrame
df_vaex = vaex.from_pandas(df)

# Applying the function using vaex
df_result = df_vaex.apply(my_func, arguments=[df_vaex["A"]], vectorize=True, multiprocessing=False).values
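For comparison, a minimal plain-NumPy sketch of the output shape I would expect, one loss per row. It assumes that with vectorize=True the function is handed a whole array of A values at once rather than a single scalar, which is my reading of that option and not something confirmed here. The helper name my_func_per_row and its rng argument are hypothetical, and rng.gamma is swapped in for gamma.ppf on uniform draws, which should give the same distribution.

```python
import numpy as np

# Gamma shape and scale, copied from the code above.
a, b = 0.09717545806463647, 407034.13749400195

def my_func_per_row(A, rng):
    """Hypothetical helper: return one simulated aggregate loss per element of A."""
    A = np.atleast_1d(A)
    # Draw one Poisson count per row instead of a single count per call.
    counts = rng.poisson(lam=100, size=A.shape[0])
    # For each row, sum `n` gamma draws (stand-in for gamma.ppf on uniforms).
    return np.array([rng.gamma(shape=a, scale=b, size=n).sum() for n in counts])

rng = np.random.default_rng(1234)
chunk = np.arange(5)  # stands in for a chunk of column A (assumption)
losses = my_func_per_row(chunk, rng)
print(losses.shape)  # same length as the input chunk
```

The original my_func, by contrast, returns a single 0-dimensional array no matter how many values of A it receives, which may be related to the mismatch between the printed and returned values.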
Software information
Vaex version (import vaex; vaex.__version__):