Closed by orchardbirds 2 years ago
It seems to me that part of the reason for the speed issue is that the fit of the BucketingProcess class calculates bucket_tables and summaries twice: once in the call to the fit of each underlying (pre-)bucketer, and again in the BucketingProcess fit itself.
I have added a get_statistics Boolean to the bucketers which can be used to turn off the calculation of the bucket tables and summary in the bucketers. Making use of this feature in a BucketingProcess results in no loss of functionality, but does result in a minor speed increase. A notebook benchmark_stats_feature in docs/discussion compares running Skorecard with and without the bucketer statistics.
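To illustrate the idea, here is a minimal toy sketch of the pattern, not Skorecard's actual code. `ToyBucketer` and `ToyBucketingProcess` are hypothetical stand-ins, and only the `get_statistics` flag name is taken from the change described above. The point it shows: when the outer process builds its own bucket table anyway, skipping the inner one changes nothing in the result.

```python
from collections import Counter


class ToyBucketer:
    """Hypothetical stand-in for a bucketer (not the Skorecard class)."""

    def __init__(self, threshold, get_statistics=True):
        self.threshold = threshold
        self.get_statistics = get_statistics
        self.stats_calls = 0  # counts how often the statistics are computed

    def fit(self, X, y=None):
        # The statistics are optional: the bucketing itself does not need them.
        if self.get_statistics:
            self.stats_calls += 1
            self.bucket_table_ = dict(Counter(self.transform(X)))
        return self

    def transform(self, X):
        # Two buckets: 0 for values below the threshold, 1 otherwise.
        return [int(x >= self.threshold) for x in X]


class ToyBucketingProcess:
    """Hypothetical stand-in for a process wrapping a bucketer."""

    def __init__(self, bucketer):
        self.bucketer = bucketer

    def fit(self, X, y=None):
        self.bucketer.fit(X, y)
        # The process computes its own table regardless of the inner flag,
        # so the inner statistics are redundant work here.
        self.bucket_table_ = dict(Counter(self.bucketer.transform(X)))
        return self


X = [1, 5, 7, 2]
slow = ToyBucketingProcess(ToyBucketer(4, get_statistics=True)).fit(X)
fast = ToyBucketingProcess(ToyBucketer(4, get_statistics=False)).fit(X)
print(slow.bucket_table_ == fast.bucket_table_)  # same table either way
print(slow.bucketer.stats_calls, fast.bucketer.stats_calls)
```

In this toy version the saving is a single skipped pass per bucketer, which is consistent with the speed increase being minor rather than dramatic.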
Thanks @dlaprins.
We could probably optimize the speed further, but let's do that in separate issues should the need arise.
From the benchmarks notebook, Skorecard is now 'very' slow: