numaproj / numalogic

Collection of operational time series ML models and tools
https://numalogic.numaproj.io/
Apache License 2.0
166 stars 28 forks

Support training for latency based anomalies specifically for perf-anomaly #409

Open shashank10456 opened 1 month ago

shashank10456 commented 1 month ago

Explain what this PR does.

This PR adds support for anomaly detection on fields that use valuesDoubleSketches. It adds aggregations and post-aggregations that run natively on Druid; the post-aggregations convert the sketches into numeric values inside Druid. This enables anomaly detection for inputs stored as sketches (https://datasketches.apache.org/), for example latency-based anomalies.
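As a rough illustration of the aggregation and post-aggregation idea described above, the following sketch builds a Druid native timeseries query as a plain Python dictionary. It assumes the field is stored as a DataSketches quantiles doubles sketch and uses the `quantilesDoublesSketch` aggregator and `quantilesDoublesSketchToQuantile` post-aggregator from Druid's DataSketches extension. The function name, datasource, field name, and output names are hypothetical and are not taken from this PR or from Numalogic's connector config.

```python
import json


def build_latency_quantile_query(datasource, sketch_field, fraction, intervals):
    """Build a Druid native query that reduces a quantiles sketch to one number.

    All argument values and the output names below are illustrative assumptions,
    not the configuration used in this PR.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "minute",
        "intervals": intervals,
        # Aggregation: merge the stored sketches within each time bucket.
        "aggregations": [
            {
                "type": "quantilesDoublesSketch",
                "name": "latency_sketch",
                "fieldName": sketch_field,
            }
        ],
        # Post-aggregation: convert the merged sketch into a numeric quantile
        # inside Druid, so the client only receives plain values.
        "postAggregations": [
            {
                "type": "quantilesDoublesSketchToQuantile",
                "name": "latency_quantile",
                "field": {"type": "fieldAccess", "fieldName": "latency_sketch"},
                "fraction": fraction,
            }
        ],
    }


query = build_latency_quantile_query(
    datasource="perf-metrics",          # hypothetical datasource name
    sketch_field="latency",             # hypothetical sketch column
    fraction=0.95,                      # 95th percentile
    intervals=["2024-01-01/2024-01-02"],
)
print(json.dumps(query, indent=2))
```

The resulting time series of `latency_quantile` values is ordinary numeric data, which is the form an anomaly detection model would consume.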

I have also made a few changes to the Dockerfile and added a patch for Numalogic 0.9.1 to avoid CVE issues. This matters for the perf-anomaly team because it lets them avoid moving to Numaflow 1.2.1 and updating all the UDFs and UDSinks; upgrading only the ML vertices would save them a lot of time.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.17%. Comparing base (f29f771) to head (883a32c). Report is 20 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| numalogic/connectors/_config.py | 90.00% | 0 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #409      +/-   ##
==========================================
- Coverage   93.07%   92.17%   -0.90%
==========================================
  Files          97       97
  Lines        4492     4781     +289
  Branches      387      430      +43
==========================================
+ Hits         4181     4407     +226
- Misses        231      276      +45
- Partials       80       98      +18
```


qhuai commented 1 month ago

Please replace "Explain what this PR does." with the real description & purpose of this PR.