haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

What is an efficient way to fill null values in a DataFrame column with an arbitrary string? #766

Open jamalromero opened 2 months ago

jamalromero commented 2 months ago

As we know, many data sources have missing values. After reading a data source (a CSV file, for example), is there a way to fill the missing entries in the DataFrame with an arbitrary value? For comparison, with a Python pandas DataFrame we can just call dataframe['some_column_name'].fillna('Missing'). Is that possible? Also, is there a forum or user group where we can post questions like these? Thanks
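For readers less familiar with pandas, the operation being asked about can be sketched in plain Java, without Smile. This is a standalone illustration, not Smile's API: the class FillNull and its method fillNull are hypothetical names, and the column is modeled as a simple String[] in which a missing entry is null.

```java
import java.util.Arrays;

/**
 * Standalone sketch (not Smile's API): replace null entries of a
 * String column with a fixed value, similar in spirit to pandas'
 * df['col'].fillna('Missing').
 */
public class FillNull {

    // Returns a new array; the input column is left unchanged.
    public static String[] fillNull(String[] column, String value) {
        String[] result = Arrays.copyOf(column, column.length);
        for (int i = 0; i < result.length; i++) {
            if (result[i] == null) {
                result[i] = value;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] city = {"Paris", null, "Tokyo", null};
        String[] filled = fillNull(city, "Missing");
        System.out.println(Arrays.toString(filled));
        // prints [Paris, Missing, Tokyo, Missing]
    }
}
```

Returning a copy rather than mutating the input mirrors the default behavior of pandas' fillna, which returns a new object unless told to work in place.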

haifengl commented 2 months ago

There are several algorithms for handling missing values in the package smile.feature.imputation. SimpleImputer may be used to fill in a fixed value. I would suggest trying the other, more advanced algorithms in the package too.
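Imputation generally goes beyond filling a constant: the fill value can be estimated from the data itself. The sketch below shows mean imputation, a common generic technique. It is not Smile's SimpleImputer or any other class in smile.feature.imputation. It assumes missing values in a numeric column are encoded as NaN, and MeanImpute/meanImpute are hypothetical names.

```java
import java.util.Arrays;

/**
 * Standalone sketch of mean imputation (a generic technique, not
 * Smile's SimpleImputer): missing values in a numeric column are
 * assumed to be encoded as NaN and are replaced by the mean of the
 * observed values.
 */
public class MeanImpute {

    public static double[] meanImpute(double[] column) {
        // First pass: mean of the non-missing entries.
        double sum = 0.0;
        int count = 0;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        // If every entry is missing there is no mean to use; return an unchanged copy.
        if (count == 0) {
            return Arrays.copyOf(column, column.length);
        }
        double mean = sum / count;

        // Second pass: substitute the mean for each missing entry.
        double[] result = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            result[i] = Double.isNaN(column[i]) ? mean : column[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] age = {20.0, Double.NaN, 40.0, Double.NaN};
        System.out.println(Arrays.toString(meanImpute(age)));
        // prints [20.0, 30.0, 40.0, 30.0]
    }
}
```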

For simplicity, I will add some methods like fillna to the Vector classes.

haifengl commented 2 months ago

Feel free to ask questions by creating tickets.

haifengl commented 2 months ago

I added DataFrame.fillna(), which applies to FloatVector and DoubleVector.
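What a fillna restricted to float and double columns might do can be sketched over plain primitive arrays. The sketch assumes missing entries in such columns are stored as NaN, because a primitive array cannot hold null. That encoding is an assumption here, and the sketch is not Smile's DataFrame.fillna() implementation. FillNaN and its overloads are hypothetical names.

```java
import java.util.Arrays;

/**
 * Standalone sketch of a fillna-style operation on primitive columns.
 * Assumption: missing entries in double/float columns are stored as NaN.
 * This is not Smile's DataFrame.fillna() implementation.
 */
public class FillNaN {

    // Replace NaN entries of a double column with the given value.
    public static double[] fillna(double[] column, double value) {
        double[] result = Arrays.copyOf(column, column.length);
        for (int i = 0; i < result.length; i++) {
            if (Double.isNaN(result[i])) {
                result[i] = value;
            }
        }
        return result;
    }

    // Same operation for a float column.
    public static float[] fillna(float[] column, float value) {
        float[] result = Arrays.copyOf(column, column.length);
        for (int i = 0; i < result.length; i++) {
            if (Float.isNaN(result[i])) {
                result[i] = value;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] price = {1.5, Double.NaN, 3.0};
        float[] score = {Float.NaN, 0.5f};
        System.out.println(Arrays.toString(fillna(price, 0.0)));
        System.out.println(Arrays.toString(fillna(score, -1.0f)));
        // prints [1.5, 0.0, 3.0]
        // prints [-1.0, 0.5]
    }
}
```

Note that NaN never compares equal to itself, so the check must use Double.isNaN / Float.isNaN rather than ==.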