oracle / accelerated-data-science

ADS is the Oracle Data Science Cloud Service's Python SDK, supporting model ops (train/eval/deploy) along with running workloads on Jobs and Pipeline resources.
https://accelerated-data-science.readthedocs.io/
Universal Permissive License v1.0
87 stars, 43 forks

Optimised report loading for anomaly operator #897

Closed (govarsha closed this 3 months ago)

govarsha commented 3 months ago

Made the following changes to optimise report loading for the anomaly detector:

  1. If there are more than 1000 non-anomalous data points, they are downsampled to 1000. I chose 1000 as the threshold because a plot with more than 1000 data points is visually too crowded.
  2. All anomalous data points are included.
  3. In the report, we were showing the whole dataset, which leads to a very large file size. Hence, we now show at most 1000 data points there as well.
  4. The optimisation can be turned off by passing optimize_report = False in the spec.
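
The selection logic described in the list above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the function name `select_report_indices`, the use of uniform random sampling, and the fixed `seed` are assumptions; only the 1000-point threshold and the `optimize_report` switch come from the description.

```python
import random

MAX_POINTS = 1000  # threshold from the description above


def select_report_indices(is_anomaly, max_points=MAX_POINTS,
                          optimize_report=True, seed=0):
    """Return the indices of the points to plot, in original order.

    Hypothetical helper: keeps every anomalous point and, when
    optimize_report is True, keeps at most max_points non-anomalous
    points (uniform random sampling is an assumption here).
    """
    anomalous = [i for i, flag in enumerate(is_anomaly) if flag]
    normal = [i for i, flag in enumerate(is_anomaly) if not flag]

    if optimize_report and len(normal) > max_points:
        rng = random.Random(seed)  # fixed seed keeps the report reproducible
        normal = rng.sample(normal, max_points)

    # Restore chronological order so the plot is drawn left to right.
    return sorted(anomalous + normal)


# Made-up flags for a series of 18051 points: every 500th point is
# marked anomalous, which gives 37 anomalies.
flags = [i % 500 == 0 for i in range(18051)]
kept = select_report_indices(flags)
print(len(kept))  # 1037 = 1000 sampled non-anomalous points + 37 anomalies
```

Passing `optimize_report=False` to this sketch returns every index, mirroring the opt-out described in item 4.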

Results: I used cpu_utilization_asg_misconfiguration.csv from NAB (Numenta Anomaly Benchmark), which has 18051 data points.

Report without optimisation: Loading time - 15 seconds, Size - 6.2MB

Screenshot 2024-07-03 at 12 27 33 PM Screenshot 2024-07-03 at 12 27 42 PM

Report with optimisation: Loading time - 2/3 seconds, Size - 411KB

Screenshot 2024-07-03 at 12 27 04 PM Screenshot 2024-07-03 at 12 27 18 PM