mindsdb / lightwood

Lightwood is Legos for Machine Learning.
GNU General Public License v3.0

[ENH] Various improvements #1168

Status: Closed (paxcema closed this 1 year ago)

paxcema commented 1 year ago

Faster categorical autoencoder

A new approach based on a SimpleLabelEncoder, which trains faster and uses less memory, is triggered automatically once the number of observed labels passes a certain threshold.

While less accurate, this makes it possible to handle large datasets whose categorical features have very high cardinality.
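To illustrate the trade-off, here is a minimal sketch of a label-style encoder and a cardinality-based switch. It is not Lightwood's code: `ToyLabelEncoder`, `choose_encoder`, and the `CARDINALITY_THRESHOLD` value are hypothetical names and numbers chosen for this example, since the PR does not state the actual cutoff.

```python
from typing import Dict, Hashable, Iterable, List

# Hypothetical cutoff for illustration; the PR does not state the real value.
CARDINALITY_THRESHOLD = 100


class ToyLabelEncoder:
    """Maps each category to an integer index; 0 is reserved for unseen labels."""

    def __init__(self) -> None:
        self.index: Dict[Hashable, int] = {}

    def fit(self, values: Iterable[Hashable]) -> "ToyLabelEncoder":
        # One pass over the data, one dictionary entry per distinct label.
        for value in values:
            if value not in self.index:
                self.index[value] = len(self.index) + 1
        return self

    def encode(self, values: Iterable[Hashable]) -> List[int]:
        return [self.index.get(value, 0) for value in values]


def choose_encoder(values: Iterable[Hashable],
                   threshold: int = CARDINALITY_THRESHOLD) -> str:
    """Returns which (hypothetical) strategy to use for a categorical column."""
    n_labels = len(set(values))
    return "label_encoder" if n_labels > threshold else "autoencoder"


encoder = ToyLabelEncoder().fit(["red", "green", "red", "blue"])
codes = encoder.encode(["red", "blue", "purple"])  # "purple" was never seen
```

Fitting is a single pass that stores one integer per distinct label, which is why this kind of encoder stays cheap as cardinality grows.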

Improved templates for ensembles

Ensembles no longer need to strictly follow the base signature. Instead, JsonAI inspects the subclass's arguments and adds only the relevant ones to the generated code.
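The general idea of filtering arguments by signature can be sketched with the standard library's `inspect` module. This is an assumption about the technique, not JsonAI's actual implementation; `BaseEnsembleSketch`, `SlimEnsembleSketch`, and `relevant_args` are hypothetical names.

```python
import inspect
from typing import Any, Dict


class BaseEnsembleSketch:
    """Hypothetical base class with the full argument list."""

    def __init__(self, target: str, mixers: list, data: Any) -> None:
        self.target, self.mixers, self.data = target, mixers, data


class SlimEnsembleSketch(BaseEnsembleSketch):
    """Hypothetical subclass that deliberately omits `data`."""

    def __init__(self, target: str, mixers: list) -> None:
        self.target, self.mixers = target, mixers


def relevant_args(cls: type, candidates: Dict[str, Any]) -> Dict[str, Any]:
    """Keeps only the candidate arguments that the constructor of `cls` accepts."""
    params = inspect.signature(cls.__init__).parameters
    return {name: value for name, value in candidates.items() if name in params}


candidates = {"target": "price", "mixers": [], "data": None}
kwargs = relevant_args(SlimEnsembleSketch, candidates)
ensemble = SlimEnsembleSketch(**kwargs)  # would raise TypeError if `data` were passed
```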

Identity Ensemble

Introduces a new IdentityEnsemble that performs no additional operations apart from storing and calling mixers. Ideal for high-performance use cases where a single mixer is used.
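As a rough illustration of the pass-through idea, the sketch below stores a list of mixers and forwards each call to one of them unchanged. It is not the actual IdentityEnsemble: the class name, the `active` index, and the callable-based `Mixer` type are assumptions made for this example.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a mixer: any callable from inputs to predictions.
Mixer = Callable[[Sequence[float]], List[float]]


class IdentityEnsembleSketch:
    """Stores mixers and forwards calls to one of them without post-processing."""

    def __init__(self, mixers: List[Mixer], active: int = 0) -> None:
        if not mixers:
            raise ValueError("at least one mixer is required")
        self.mixers = mixers
        self.active = active  # assumed: index of the mixer that serves predictions

    def __call__(self, rows: Sequence[float]) -> List[float]:
        # No averaging, weighting, or stacking: the mixer output is returned as-is.
        return self.mixers[self.active](rows)


def doubling_mixer(rows: Sequence[float]) -> List[float]:
    return [2 * x for x in rows]


ensemble = IdentityEnsembleSketch([doubling_mixer])
predictions = ensemble([1.0, 2.5])
```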

Faster analysis for grouped time series forecasters

This phase now considers the first forecast for every group. As an added bonus, this keeps the number of known target values fairly high, whereas the current approach may report pessimistic metrics.
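One way to picture the selection step is shown below, under the assumption that "first forecast" means the forecast issued at the earliest timestamp within each group. The row layout and the `first_forecast_per_group` helper are hypothetical and not taken from Lightwood.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical row layout: (group key, timestamp, forecast values over the horizon).
Row = Tuple[Hashable, int, List[float]]


def first_forecast_per_group(rows: List[Row]) -> Dict[Hashable, List[float]]:
    """Keeps, for each group, the forecast with the smallest timestamp."""
    earliest: Dict[Hashable, Tuple[int, List[float]]] = {}
    for group, timestamp, forecast in rows:
        if group not in earliest or timestamp < earliest[group][0]:
            earliest[group] = (timestamp, forecast)
    return {group: forecast for group, (_, forecast) in earliest.items()}


rows: List[Row] = [
    ("store_a", 2, [11.0, 12.0]),
    ("store_a", 1, [10.0, 11.0]),
    ("store_b", 5, [7.0, 7.5]),
    ("store_b", 6, [7.5, 8.0]),
]
selected = first_forecast_per_group(rows)
```

Analyzing one forecast per group instead of one per row reduces the work to a single evaluation per series, which is consistent with the speedup the PR describes.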