Closed Ydv-aakash closed 2 years ago
That happens because each pass of the loop creates a new temp_df dataframe from
the original df, which discards the changes made in the previous pass.
I modified your code a bit to make it work:
import vaex # do not import vaex as vx please
import numpy as np
a = [1, 2, 3, np.nan, 5, 6]
b = ['a', 'b', 'b', None, None, 'u']
c = ['michael', 'dwight', 'jim', 'pam', None, 'stanley']
df = vaex.from_arrays(x=a, y=b, z=c)
temp_df = df.copy() # This is a shallow copy, no memory is used
for col in temp_df.get_column_names():
    if temp_df[col].is_string():
        temp_df = temp_df.fillna(value='dontknow', column_names=[col])
    else:
        mode = df[col].value_counts(dropna=True).index[0] # This will run out of core, but you need to check for ties yourself
        temp_df = temp_df.fillna(value=mode, column_names=[col])
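On the ties remark in the comment above: here is a minimal sketch of one way to check, assuming you first turn the value_counts result into a plain dict of value -> count (for example with .to_dict(), if the result is a pandas Series as the .index[0] access suggests). top_values is a hypothetical helper, not part of vaex, and picking the smallest tied value is just one arbitrary tie-break.

```python
def top_values(counts):
    """Return every value that shares the highest count.

    `counts` is a plain dict mapping value -> number of occurrences.
    Hypothetical helper for illustration; not a vaex function.
    """
    if not counts:
        return []
    best = max(counts.values())
    return [value for value, n in counts.items() if n == best]


# Stand-in for the counts of column x above once NaN is dropped:
# every remaining value occurs exactly once, so they are all tied.
x_counts = {1.0: 1, 2.0: 1, 3.0: 1, 5.0: 1, 6.0: 1}

tied = top_values(x_counts)
if len(tied) > 1:
    # Arbitrary but deterministic tie-break: take the smallest value.
    mode = min(tied)
else:
    mode = tied[0]

print(mode, tied)
```

With the example data, column x has no single mode, so .index[0] alone would return whichever tied value happens to come first. An explicit tie-break makes the fill value reproducible.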
I hope this helps!
I am using the above code to replace missing/NaN values with the mode for numerical features and with a fixed string for categorical features. But after running this code, only the last column is modified.