mindsdb / lightwood

Lightwood is Legos for Machine Learning.
GNU General Public License v3.0
442 stars, 93 forks

[ENH] Top level cache for Predictor.featurize() #1145

Closed by paxcema 1 year ago

paxcema commented 1 year ago

Changelog

Effect:

This reduces learn() runtime by anywhere from 15% to 50%, depending on the dataset and predictor properties.
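For readers unfamiliar with the idea, here is a minimal sketch of what a top-level featurization cache could look like. It is not the code from this PR: ToyPredictor, _encode, _cache_key and encode_calls are hypothetical names invented for illustration, and keying the cache on a hash of the input rows is an assumption about one possible design.

```python
import hashlib
import json


class ToyPredictor:
    """Hypothetical stand-in for a predictor; not Lightwood's actual class."""

    def __init__(self):
        self._feature_cache = {}  # cache key -> featurized rows
        self.encode_calls = 0     # counts how often the expensive path runs

    @staticmethod
    def _cache_key(rows):
        # Hash a canonical JSON dump so that equal data yields an equal key.
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def _encode(self, rows):
        # Placeholder for the expensive per-column encoding step (assumption).
        self.encode_calls += 1
        return [[float(len(str(v))) for v in row.values()] for row in rows]

    def featurize(self, rows):
        # Encode only on a cache miss; otherwise reuse the stored result.
        key = self._cache_key(rows)
        if key not in self._feature_cache:
            self._feature_cache[key] = self._encode(rows)
        return self._feature_cache[key]


predictor = ToyPredictor()
data = [{"a": 1, "b": "xy"}, {"a": 22, "b": "z"}]
first = predictor.featurize(data)
second = predictor.featurize(data)  # same data, so no second encoding pass
```

In this sketch the cached object itself is returned, so callers must not mutate it, and a real implementation would also need to invalidate the cache whenever the encoders are refit.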

pafluxa commented 1 year ago

Greetings,

I can confirm that this PR speeds up learning by at least 15% on all test datasets.

The speed-up is less noticeable on larger datasets (>10K rows) and appears to decrease non-linearly with the number of rows. In particular, the gain drops from more than 20% to barely above 10% when the row count grows from 10,000 to 100,000. I take this as an indication that the performance issues with large datasets are not entirely solved by this PR.

Finally, judging by the results of the unit tests (nicely done!), the changes seem to break some of the functionality used for time series analysis/models.

paxcema commented 1 year ago

Great! I'll get the tests passing and merge.