Open PeterPirog opened 2 years ago
Nice suggestion! I could see this being very useful to run on a sample of the dataset in the automl
package to prune unhelpful features from the config.
@tgaddair I use the mlxtend tool and it's very useful for me. It works with many kinds of models: neural nets, XGBoost and other tree models, and sklearn models. The results can be saved to an xlsx file as a report. There are two main ways to find the best feature sets, forward search and reverse (backward) search; I typically use both.
I have some experience with feature engineering and finding the best model parameters, but my development skills are weaker, so I think Ludwig will be very useful for me. Right now I'm trying to add a fill_with_median
option for numerical values as a missing_value_strategy,
because the mean isn't a good option for numerical features when there is large skewness (long tails in histograms).
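A small sketch of why skewness matters when filling missing numerical values, comparing the mean and the median on a long-tailed sample. The sample values and the `fill_missing` helper are hypothetical and use only the Python standard library; this is not Ludwig's implementation of any `missing_value_strategy`.

```python
import statistics

# Long-tailed sample: one extreme value (100) and two missing entries (None).
values = [1, 2, 2, 3, None, 3, 3, 4, None, 100]

observed = [v for v in values if v is not None]
mean_value = statistics.mean(observed)      # pulled upward by the outlier
median_value = statistics.median(observed)  # stays near the bulk of the data

def fill_missing(data, fill_value):
    """Hypothetical helper: replace None entries with fill_value."""
    return [fill_value if v is None else v for v in data]

filled_with_mean = fill_missing(values, mean_value)
filled_with_median = fill_missing(values, median_value)

print("mean:", mean_value)      # 14.75
print("median:", median_value)  # 3.0
```

Here the mean fill inserts 14.75, a value larger than every observation except the outlier, while the median fill inserts 3.0, which matches the typical value.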
I'm also trying to understand the dependencies between the framework's modules.
In the future, a feature selection option could be very useful. I use the mlxtend library together with TensorFlow and Ray RLlib, and it works very well.
https://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/