mindsdb / lightwood

Lightwood is Legos for Machine Learning.
GNU General Public License v3.0

[ENH] Offline batching #1147

Closed by paxcema 1 year ago

paxcema commented 1 year ago

Changelog

  1. EncodedDs.__getitem__ now retrieves rows from an "offline" cache that is populated at the end of the initialization method, instead of encoding each row just-in-time. This lets encoders do batch encoding at a much faster rate. It assumes the encoders are optimized for batch input: most already are, and we are working on the remaining exceptions.
  2. Optimized some encoders with vectorized operations (see point 1): NumericalEncoder, TsNumericEncoder, TsArrayNumericEncoder
  3. Reactivated Neural mixer's early stopping with a patience-based mechanism, and a learning rate search procedure with a standard range of options.
  4. Deactivated the RandomForest mixer's hyperparameter optimization by default, trading a small loss in accuracy for a large speedup in training runtime.
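
To illustrate points 1 and 2, here is a minimal sketch of the difference between just-in-time encoding and an offline cache filled once at the end of initialization. It is not Lightwood's actual code: ToyNumericEncoder, JitDataset and OfflineCachedDataset are hypothetical names, and the encoding itself (a sign flag plus log1p of the absolute value) is an assumption chosen only to have something to compute.

```python
import math
from typing import List, Sequence


class ToyNumericEncoder:
    """Hypothetical stand-in for an encoder; not Lightwood's NumericalEncoder."""

    def __init__(self) -> None:
        self.calls = 0  # number of times encode() has been invoked

    def encode(self, values: Sequence[float]) -> List[List[float]]:
        # One call handles a whole batch: [sign flag, log(1 + |x|)] per value.
        self.calls += 1
        return [[1.0 if v >= 0 else 0.0, math.log1p(abs(v))] for v in values]


class JitDataset:
    """Just-in-time: every row access triggers a one-element encode() call."""

    def __init__(self, column: Sequence[float], encoder: ToyNumericEncoder) -> None:
        self.column = list(column)
        self.encoder = encoder

    def __len__(self) -> int:
        return len(self.column)

    def __getitem__(self, idx: int) -> List[float]:
        return self.encoder.encode([self.column[idx]])[0]


class OfflineCachedDataset:
    """Offline: the whole column is encoded once, at the end of __init__."""

    def __init__(self, column: Sequence[float], encoder: ToyNumericEncoder) -> None:
        self.column = list(column)
        self.encoder = encoder
        # Single batched call; __getitem__ only reads from this cache afterwards.
        self.cache = self.encoder.encode(self.column)

    def __len__(self) -> int:
        return len(self.column)

    def __getitem__(self, idx: int) -> List[float]:
        return self.cache[idx]
```

Both classes return identical rows, but iterating over N rows costs N encoder calls in the just-in-time version and a single call in the cached one. The speedup only materializes if the encoder handles a batch efficiently, which is why the vectorized encoders in point 2 matter. The trade-off in this sketch is that the full encoded column is held in memory.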
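
For point 3, a patience-based early stopping rule can be sketched as follows. This is a generic illustration rather than the Neural mixer's implementation: the function name early_stopping_epoch and the parameters patience and min_delta are assumptions, and the validation losses are passed in as a plain sequence instead of coming from a training loop.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float], patience: int, min_delta: float = 0.0
) -> Tuple[int, float]:
    """Return (index of the last epoch run, best validation loss seen).

    Training stops once `patience` consecutive epochs fail to improve
    the best loss by more than `min_delta`.
    """
    best = float("inf")
    bad_epochs = 0
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best - min_delta:
            # Improvement: record it and reset the patience counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # patience exhausted, stop training here
    return last_epoch, best
```

A larger patience tolerates longer plateaus at the cost of more epochs. In practice the model weights from the best epoch would usually be kept as well, which this sketch omits.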

Effects

Lightwood becomes roughly an order of magnitude faster across all ~50 benchmarked datasets (average speedup is ~7.5x, median is ~16.5x).