ing-bank / skorecard

scikit-learn compatible tools for building credit risk acceptance models
https://ing-bank.github.io/skorecard/
MIT License
85 stars 24 forks

Skorecard() class is slow #74

Closed: orchardbirds closed this issue 2 years ago

orchardbirds commented 2 years ago

From the benchmarks notebook, Skorecard is now 'very' slow:

[Screenshot of the benchmark results, 2021-11-22 at 13:07:15]

dlaprins commented 2 years ago

It seems to me that part of the slowdown comes from the fit of the BucketingProcess class computing the bucket_tables and summaries twice: once in the call to the fit of each underlying (pre-)bucketer, and again in the BucketingProcess fit itself.

I have added a Boolean get_statistics parameter to the bucketers, which can be used to turn off the calculation of the bucket tables and summary in the bucketers. Using it in a BucketingProcess loses no functionality and gives a minor speed increase. The notebook benchmark_stats_feature in docs/discussion compares running Skorecard with and without the bucketer statistics.
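To make the redundancy concrete, here is a minimal, self-contained sketch of the idea. It does not use skorecard: `ToyBucketer`, `ToyBucketingProcess`, `bucket_table`, `count_stats_calls` and the `STATS_CALLS` counter are hypothetical names made up for illustration, and only the `get_statistics` flag is taken from the comment above. The toy process computes its own bucket table at the end of `fit`, so any table computed inside the wrapped bucketers is extra work.

```python
from bisect import bisect_right
from collections import Counter

# Counts how often a bucket table is computed (illustration only).
STATS_CALLS = {"n": 0}


def bucket_table(buckets):
    """Return {bucket index: row count}; stands in for the real statistics."""
    STATS_CALLS["n"] += 1
    return dict(sorted(Counter(buckets).items()))


class ToyBucketer:
    """Hypothetical equal-width bucketer, not skorecard's implementation."""

    def __init__(self, n_bins, get_statistics=True):
        self.n_bins = n_bins
        self.get_statistics = get_statistics

    def fit(self, x):
        lo, hi = min(x), max(x)
        width = (hi - lo) / self.n_bins
        # Interior boundaries splitting [lo, hi] into n_bins equal-width buckets.
        self.boundaries_ = [lo + width * i for i in range(1, self.n_bins)]
        if self.get_statistics:
            # Optional per-bucketer statistics; skipped when the flag is False.
            self.bucket_table_ = bucket_table(self.transform(x))
        return self

    def transform(self, x):
        return [bisect_right(self.boundaries_, v) for v in x]


class ToyBucketingProcess:
    """Hypothetical two-step process: prebucket, then bucket the prebuckets."""

    def __init__(self, prebucketer, bucketer):
        self.prebucketer = prebucketer
        self.bucketer = bucketer

    def fit(self, x):
        prebuckets = self.prebucketer.fit(x).transform(x)
        final = self.bucketer.fit(prebuckets).transform(prebuckets)
        # The process always computes its own table from the final buckets.
        self.bucket_table_ = bucket_table(final)
        return self


def count_stats_calls(get_statistics):
    """Fit the toy process and report (number of table computations, table)."""
    STATS_CALLS["n"] = 0
    process = ToyBucketingProcess(
        ToyBucketer(10, get_statistics=get_statistics),
        ToyBucketer(3, get_statistics=get_statistics),
    )
    process.fit(list(range(100)))
    return STATS_CALLS["n"], process.bucket_table_
```

In this sketch, calling `count_stats_calls(True)` and `count_stats_calls(False)` yields the same final bucket table, but the second variant computes a table only once instead of once per bucketer plus once for the process. How much time this saves in the real library depends on how expensive its statistics are, which this toy does not model.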

timvink commented 2 years ago

Thanks @dlaprins.

We could probably optimize the speed further; let's do that in separate issues should the need arise.