ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0
11.09k stars, 1.19k forks

Feature selection option - mlxtend feature selector #1516

Open: PeterPirog opened this issue 2 years ago

PeterPirog commented 2 years ago

A feature selection option could be very useful in the future. I use the mlxtend library together with TensorFlow and Ray RLlib, and it works very well.

https://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
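To make the idea concrete, here is a minimal from-scratch sketch of sequential forward selection, the greedy search that mlxtend's `SequentialFeatureSelector` performs when `forward=True` and `floating=False`. It is not mlxtend's implementation: the function name `sequential_forward_selection` and the `score` callback are hypothetical, and in practice `score` would train a model on the given subset and return a cross-validated metric (mlxtend handles that through its `scoring` and `cv` parameters).

```python
from __future__ import annotations

from typing import Callable, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[tuple[str, ...]], float],
    k: int,
) -> tuple[tuple[str, ...], float]:
    """Greedily add one feature at a time until k features are selected.

    `score` is a hypothetical callback that evaluates a feature subset
    (higher is better), e.g. the cross-validated accuracy of a model.
    """
    selected: tuple[str, ...] = ()
    best_score = float("-inf")
    remaining = list(features)
    while len(selected) < k and remaining:
        # Try every remaining feature and keep the one whose addition scores highest.
        candidate, candidate_score = max(
            ((f, score(selected + (f,))) for f in remaining),
            key=lambda pair: pair[1],
        )
        selected += (candidate,)
        best_score = candidate_score
        remaining.remove(candidate)
    return selected, best_score


if __name__ == "__main__":
    # Toy scorer: each feature contributes a fixed (made-up) amount to the score.
    weights = {"age": 0.5, "income": 0.3, "zip": 0.01, "noise": -0.1}
    subset, best = sequential_forward_selection(
        list(weights), lambda fs: sum(weights[f] for f in fs), k=2
    )
    print(subset, best)
```

The additive toy scorer only exists to make the search deterministic; with a real model the contribution of a feature depends on which features are already selected, which is exactly why the search re-scores every candidate at each step.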

tgaddair commented 2 years ago

Nice suggestion! I could see this being very useful to run on a sample of the dataset in the automl package to prune unhelpful features from the config.
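A rough sketch of how that could fit an automl flow, assuming a plain-dict Ludwig-style config with `input_features` and `output_features` lists: run the (expensive) selection on a small random sample of rows, then drop the input features it did not keep. Both helper names, `sample_rows` and `prune_input_features`, are hypothetical and not part of Ludwig's API, and the selected set below is a made-up stand-in for a selector's output.

```python
import copy
import random


def sample_rows(rows, fraction, seed=0):
    """Return a reproducible random sample of rows (at least one row)."""
    rng = random.Random(seed)
    n = max(1, int(len(rows) * fraction))
    return rng.sample(rows, n)


def prune_input_features(config, keep):
    """Return a copy of a Ludwig-style config keeping only the named input features."""
    pruned = copy.deepcopy(config)  # leave the caller's config untouched
    pruned["input_features"] = [
        feature for feature in pruned["input_features"] if feature["name"] in keep
    ]
    return pruned


config = {
    "input_features": [
        {"name": "age", "type": "number"},
        {"name": "income", "type": "number"},
        {"name": "zip_code", "type": "category"},
    ],
    "output_features": [{"name": "churn", "type": "binary"}],
}

# Feature selection would run on this 5% sample instead of the full dataset.
sample = sample_rows(list(range(1000)), fraction=0.05)

# Hypothetical result of running a selector on `sample`.
selected = {"age", "income"}
pruned = prune_input_features(config, selected)
```

Output features are deliberately left alone: only inputs are candidates for pruning.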

PeterPirog commented 2 years ago

@tgaddair I use the mlxtend tool and it's very useful for me. It works with many kinds of models: neural nets, XGBoost and other tree-based models, and scikit-learn models. The results can be saved to an xlsx file as a report. There are two main ways to search for the best feature set, forward and backward (reverse) selection; I typically use both.
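The backward (reverse) direction starts from all features and removes one per step. Below is a minimal sketch that also records a per-step report, the kind of table one might export to xlsx. The function `sequential_backward_selection` and its report format are hypothetical illustrations, not mlxtend's code; mlxtend exposes its own per-step results through `get_metric_dict()`.

```python
from __future__ import annotations

from typing import Callable, Sequence


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[tuple[str, ...]], float],
    k: int,
) -> tuple[tuple[str, ...], list[dict]]:
    """Greedily drop one feature at a time until k remain; log every step.

    Assumes feature names are unique. `score` is a hypothetical subset
    evaluator (higher is better).
    """
    selected = tuple(features)
    report = [{"step": 0, "n_features": len(selected),
               "features": selected, "score": score(selected)}]
    while len(selected) > k:
        # Every subset that is missing exactly one currently selected feature.
        candidates = [tuple(f for f in selected if f != drop) for drop in selected]
        # Keep the subset whose removal hurts the score least (or helps most).
        selected = max(candidates, key=score)
        report.append({"step": len(report), "n_features": len(selected),
                       "features": selected, "score": score(selected)})
    return selected, report


if __name__ == "__main__":
    weights = {"age": 0.5, "income": 0.3, "zip": 0.01, "noise": -0.1}
    final, report = sequential_backward_selection(
        list(weights), lambda fs: sum(weights[f] for f in fs), k=2
    )
    for row in report:
        print(row)
```

With pandas and an Excel writer engine such as openpyxl installed, `pandas.DataFrame(report).to_excel("report.xlsx", index=False)` would write the report to an xlsx file. On real data, forward and backward searches can end at different subsets because each greedy step depends on what is already selected, which is one reason to run both and compare.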

I have some experience with feature engineering and finding the best model parameters, but my development skills are weaker, so I think Ludwig will be very useful for me. Right now I'm trying to add a fill_with_median option to missing_value_strategy for numerical features, because the mean isn't a good fill value when a feature is highly skewed (long tails in the histogram).
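A small illustration of why the median is the safer fill value for skewed data: one large outlier drags the mean far from the typical value, while the median stays put. The function `fill_missing` is a hypothetical stand-in for a preprocessing step; `fill_with_mean` mirrors an existing Ludwig `missing_value_strategy` value for numerical features, and `fill_with_median` is the option proposed here, used only as an illustrative name.

```python
import statistics


def fill_missing(values, strategy):
    """Replace None entries with the mean or median of the observed values.

    "fill_with_median" is the strategy proposed in this thread; this
    function is an illustration, not Ludwig's preprocessing code.
    """
    observed = [v for v in values if v is not None]
    if strategy == "fill_with_mean":
        fill = statistics.mean(observed)
    elif strategy == "fill_with_median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# A long right tail: one outlier (100) among otherwise small values.
data = [1, 2, 2, 3, 100, None]
by_mean = fill_missing(data, "fill_with_mean")      # missing value becomes 21.6
by_median = fill_missing(data, "fill_with_median")  # missing value becomes 2
```

The mean-filled value (21.6) is larger than four of the five observed values, so it would look like another outlier; the median fill (2) sits in the bulk of the distribution.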

Now I'm trying to understand the dependencies between specific modules of the framework.