mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License
3k stars, 401 forks

More golden features [enhancement] #224

Closed: mglowacki100 closed this issue 3 years ago

mglowacki100 commented 3 years ago

Please add more golden features like: sum, multiply, min, max, average.
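For concreteness, the request could be sketched as follows. This is a minimal, stdlib-only illustration of applying each requested operator to every pair of features; the `PAIR_OPERATORS` table, the `pairwise_features` helper, and the column naming scheme are all hypothetical and are not taken from mljar-supervised.

```python
from itertools import combinations

# Hypothetical operator table for illustration only; this is NOT
# mljar-supervised's API, just the five operators named in the request.
PAIR_OPERATORS = {
    "sum": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "min": min,
    "max": max,
    "average": lambda a, b: (a + b) / 2,
}


def pairwise_features(columns):
    """Build one new column per (unordered feature pair, operator).

    `columns` maps a feature name to a list of numeric values;
    all lists are assumed to have the same length.
    """
    new_columns = {}
    for left, right in combinations(sorted(columns), 2):
        for op_name, op in PAIR_OPERATORS.items():
            name = f"{left}_{op_name}_{right}"
            new_columns[name] = [
                op(a, b) for a, b in zip(columns[left], columns[right])
            ]
    return new_columns


# Toy data: three features, two rows.
data = {"x1": [1.0, 4.0], "x2": [3.0, 2.0], "x3": [5.0, 0.5]}
features = pairwise_features(data)
```

With three features there are three unordered pairs, so the five operators yield 15 candidate columns. All five operators are symmetric, which is why unordered pairs are enough here.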

pplonski commented 3 years ago

Hi @mglowacki100!

This can be pretty easily added (I think). Would you like to run operators on feature pairs, triplets, or more?

I'm very interested in your feedback about the current release of mljar-supervised.

mglowacki100 commented 3 years ago

Hi Piotr, Yes, binary features are straightforward to add. With triplets there could be a combinatorial explosion, so maybe run this step after feature selection?

mljar looks pretty good 👍 It would be interesting to compare it to autogluon (https://arxiv.org/abs/2003.06505, which has a nice comparison to autosklearn, h2o automl and google tables). As far as I remember, it is similar from an architectural POV (Caruana ensembling, with a little more fancy NN: embedding and skip connection).

I've run mljar on Mercedes-Benz and got r2=0.551 on the private leaderboard, much better than autogluon, so I think there is great potential in it.
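To put rough numbers on the combinatorial explosion concern, here is a small sketch that counts candidate features for pairs versus triplets. The `candidate_count` helper is hypothetical, and it assumes symmetric operators, so each unordered group of features is counted once per operator.

```python
from math import comb


def candidate_count(n_features, group_size, n_operators):
    """Number of candidate features when each operator is applied to
    every unordered group of `group_size` features (symmetric operators
    assumed, so the order within a group does not matter)."""
    return comb(n_features, group_size) * n_operators


# Compare pairs and triplets for a few feature counts, with 5 operators.
for n in (10, 100, 1000):
    pairs = candidate_count(n, 2, 5)
    triplets = candidate_count(n, 3, 5)
    print(f"{n} features: {pairs} pair candidates, {triplets} triplet candidates")
```

Pairs grow quadratically and triplets cubically in the number of features, which is why shrinking the feature set first (for example with feature selection) makes triplets far more tractable.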

pplonski commented 3 years ago

I've added to golden features generation:

I didn't add average because it is similar to sum, and min and max are similar to diff.
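One way to read this remark (an interpretation, not something stated in the thread) is that for a pair of features the average is just a rescaled sum, and min and max can be recovered from the sum and the absolute difference. The helper name below is hypothetical.

```python
def from_sum_and_diff(a, b):
    """Recover average, min and max of two numbers from their sum and
    difference, using the identities:
        average = (a + b) / 2
        min     = ((a + b) - |a - b|) / 2
        max     = ((a + b) + |a - b|) / 2
    """
    s = a + b
    d = a - b
    return {
        "average": s / 2,
        "min": (s - abs(d)) / 2,
        "max": (s + abs(d)) / 2,
    }


result = from_sum_and_diff(3.0, 7.0)
```

Under this reading, a model that already sees the sum and the difference has, up to scaling and an absolute value, the information that average, min and max would carry.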