Is there a description of the automl method?

PGijsbers commented 3 years ago

I was wondering if there is good documentation about the AutoML process behind LAMA. I watched the general overview video, but it's still not clear how exactly optimization works internally. E.g. what optimization methods are used? Are only linear models and LightGBM considered? What kind of feature engineering is used, and is it static (same strategy for each dataset) or dynamic (LAMA explores various approaches to see what works on the data)?

If possible, a written source is best (ideally a paper, but a documentation page works too).

alexmryzhkov commented 3 years ago

Hello, @PGijsbers!

The documentation can be found here (https://lightautoml.readthedocs.io/en/latest/). Right now it is only an API description and tutorials, but we are working on filling it with algorithm descriptions. We already have the written paper, and now we are waiting for a decision on it before publication.

For hyperparameter optimization, we use a hybrid TPE and CMA-ES sampler. Our pipeline uses two model types: linear models and gradient boosting (LightGBM, CatBoost). For gradient boosting, we tune hyperparameters for as long as tuning fits within the user-defined time budget. Feature engineering differs by model type, but for a given ML task the same strategy is applied to every dataset. While executing that strategy, LightAutoML uses metadata and expert rules to pick what it assumes is the best way to convert each feature.
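
To make the "tune as long as it fits in the time budget" idea concrete, here is a minimal sketch. It is not LightAutoML's code: plain random search stands in for the hybrid TPE and CMA-ES sampler, and the names `sample_params`, `tune_within_budget`, and `toy_objective`, as well as the search space itself, are hypothetical and chosen only for illustration.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, float]


def sample_params(rng: random.Random) -> Params:
    """Draw one candidate from a hypothetical gradient boosting search space."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, -1),  # log-uniform in [0.001, 0.1]
        "num_leaves": float(rng.randint(16, 255)),
    }


def tune_within_budget(
    objective: Callable[[Params], float],
    time_budget_s: float,
    seed: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[Params], float, int]:
    """Minimize `objective` with random search until the time budget runs out.

    Random search is an assumption made for brevity; a TPE or CMA-ES sampler
    would replace the `sample_params` call with a model-guided proposal.
    """
    rng = random.Random(seed)
    deadline = clock() + time_budget_s
    best_params: Optional[Params] = None
    best_score = float("inf")
    n_trials = 0
    # The deadline is checked only between trials, so a trial that starts
    # just before the deadline is allowed to finish.
    while clock() < deadline:
        params = sample_params(rng)
        score = objective(params)
        n_trials += 1
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_trials


def toy_objective(params: Params) -> float:
    """Stand-in for a validation loss, smallest near learning_rate=0.03, num_leaves=64."""
    lr_term = (params["learning_rate"] - 0.03) ** 2
    leaves_term = ((params["num_leaves"] - 64) / 255) ** 2
    return lr_term + leaves_term
```

Calling `tune_within_budget(toy_objective, time_budget_s=1.0)` returns the best parameters found, their score, and the number of trials that fit into one second. With a zero budget no trial runs and the best parameters are `None`, so a real pipeline would need a fallback such as default hyperparameters.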

PGijsbers commented 3 years ago

Thanks! Would it be possible to post back here when the paper is available? I'm interested 👀

alexmryzhkov commented 3 years ago

@PGijsbers sure - we are going to post it here and in the README as soon as it becomes available.

PGijsbers commented 3 years ago

Do you have any estimate? Maybe a preprint to look forward to?

alexmryzhkov commented 3 years ago

@PGijsbers now we have an answer for this issue - https://arxiv.org/abs/2109.01528 😎

sberbank-ai-lab / LightAutoML