EncodedDs.__getitem__ now retrieves rows from an "offline" cache that is populated at the end of the initialization method, instead of encoding just-in-time. This lets encoders perform batch encoding at a much faster rate. (Encoders are assumed to be optimized for this; most are, with a few exceptions we are currently working on.)
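The caching pattern can be sketched as below. `CachedDataset` and the min-max "encoder" are hypothetical stand-ins for illustration, not Lightwood's actual classes:

```python
class CachedDataset:
    """Toy dataset illustrating the pattern: encode every row once at the
    end of __init__ (the "offline" cache), then serve rows from memory."""

    def __init__(self, raw_rows, batch_encoder):
        self.raw_rows = raw_rows
        # One batch-encoding pass replaces per-row, just-in-time encoding.
        self._cache = batch_encoder(raw_rows)

    def __len__(self):
        return len(self.raw_rows)

    def __getitem__(self, idx):
        # __getitem__ is now a cheap cache lookup.
        return self._cache[idx]


# Example "batch encoder": min-max scale a whole column in one pass.
def scale(rows):
    lo, hi = min(rows), max(rows)
    return [(r - lo) / (hi - lo) for r in rows]


ds = CachedDataset([10.0, 20.0, 30.0], scale)
print(ds[2])  # 1.0
```

The key trade-off is memory for speed: the full encoded dataset lives in RAM, but every subsequent row access avoids re-running the encoder.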
Optimized several encoders with vectorized operations (see point 1): NumericalEncoder, TsNumericEncoder, and TsArrayNumericEncoder.
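A minimal sketch of what vectorization buys here, assuming a hypothetical two-column encoding (validity flag plus mean-centered value) rather than any encoder's actual output format:

```python
import numpy as np


# Per-row ("just-in-time") style: one Python-level iteration per value.
def encode_rows(values, mean):
    out = []
    for v in values:
        if np.isnan(v):
            out.append([0.0, 0.0])      # invalid: zero flag, zero value
        else:
            out.append([1.0, v - mean])  # validity flag, centered value
    return np.array(out)


# Vectorized equivalent: a single NumPy pass over the whole column.
def encode_batch(values, mean):
    arr = np.asarray(values, dtype=float)
    valid = ~np.isnan(arr)
    out = np.zeros((len(arr), 2))
    out[:, 0] = valid                    # validity flag column
    out[valid, 1] = arr[valid] - mean    # centered value column
    return out


vals = [1.0, 2.0, float("nan"), 4.0]
assert (encode_rows(vals, 2.0) == encode_batch(vals, 2.0)).all()
```

Both functions produce identical output; the vectorized version replaces the Python loop with array operations, which is where the speedup comes from on large columns.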
Reactivated the Neural mixer's early stopping, now with a patience-based mechanism, and added a learning rate search procedure over a standard range of candidate values.
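Both mechanisms can be sketched as follows; the names (`early_stop_epoch`, `search_lr`) are hypothetical, and the `evaluate` callback stands in for a real short trial-training run:

```python
def early_stop_epoch(val_losses, patience=3):
    """Patience-based early stopping: stop at the first epoch where the
    best validation loss has not improved for `patience` epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # early stop
    return len(val_losses) - 1


def search_lr(evaluate, candidates=(1e-4, 1e-3, 1e-2, 1e-1)):
    """Learning rate search: try each candidate and keep the one with
    the lowest validation loss. `evaluate` maps lr -> loss."""
    return min(candidates, key=evaluate)


losses = [1.0, 0.8, 0.7, 0.7, 0.71, 0.72]
print(early_stop_epoch(losses, patience=2))  # 4
```

In the example, the loss last improves at epoch 2, so with a patience of 2 epochs training stops at epoch 4.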
Deactivated the RandomForest mixer's hyperparameter optimization by default: the small loss in accuracy is outweighed by a large speedup in training runtime.
Effects
Lightwood becomes an order of magnitude faster across all ~50 benchmarked datasets (average speedup ~7.5x, median ~16.5x).