I measured the `score_samples` performance for Isolation Forest (which is in essence the base for everything else) and found the following:
Here `n_samples` is the number of instances in the array passed to `score_samples`. As you can see, there is significant multithreading overhead for data sets smaller than 2048 instances. Please note that our default `n_jobs` is `-1`. This overhead is what makes AAD optimization slow, since AAD calls `score_samples` on the known data subset, which is very small (fewer than 100 instances). The measurements were carried out using Python profiling.
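For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not the script used for the numbers above: it uses plain wall-clock timing rather than the Python profiler, and `benchmark_scoring`, `dummy_score`, and `make_rows` are hypothetical names introduced only for illustration. The scorer is any callable, so a forest's `score_samples` built with different `n_jobs` values could be passed in place of the stand-in.

```python
import statistics
import time


def benchmark_scoring(score_fn, make_data, sizes, repeats=5):
    """Return {n_samples: median seconds per call} for score_fn(make_data(n))."""
    results = {}
    for n in sizes:
        data = make_data(n)
        score_fn(data)  # warm-up call, excluded from the timings
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            score_fn(data)
            timings.append(time.perf_counter() - start)
        results[n] = statistics.median(timings)
    return results


def dummy_score(rows):
    # Hypothetical stand-in scorer: replace with forest.score_samples.
    return [sum(row) for row in rows]


def make_rows(n, n_features=4):
    # Deterministic stand-in data with n rows (assumption: real data
    # would be a 2-D array of shape (n_samples, n_features)).
    return [[float(i + j) for j in range(n_features)] for i in range(n)]


if __name__ == "__main__":
    sizes = [64, 256, 1024, 2048, 8192]
    for n, seconds in benchmark_scoring(dummy_score, make_rows, sizes).items():
        print(f"n_samples={n:5d}  {seconds * 1e6:10.1f} us per call")
```

Running the harness once with a single-threaded forest and once with `n_jobs=-1`, then comparing the two result dictionaries per size, should show where the threading overhead stops dominating.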