Closed Lopa2016 closed 2 months ago
I want to implement Kusto's series_outliers() function in Python and used the following code:

    import pandas as pd
    import numpy as np
    from scipy.stats import norm

    # Load the data into a DataFrame
    data = {
        'series': [67.95675, 58.63898, 33.59188, 4906.018, 5.372538,
                   702.1194, 0.037261, 11161.05, 1.403496, 100.116]
    }
    df = pd.DataFrame(data)

    # Function to calculate the outlier score based on custom percentiles
    def custom_percentile_outliers(series, p_low=10, p_high=90):
        # Calculate custom percentiles
        percentile_low = np.percentile(series, p_low)
        percentile_high = np.percentile(series, p_high)

        # Calculate Z-scores for the percentiles assuming normal distribution
        z_low = norm.ppf(p_low / 100)
        z_high = norm.ppf(p_high / 100)

        # Calculate normalization factor
        normalization_factor = (2 * z_high - z_low) / (2 * z_high - 2.704)

        # Calculate outliers score
        return series.apply(
            lambda x: (x - percentile_high) / (percentile_high - percentile_low) * normalization_factor
            if x > percentile_high
            else ((x - percentile_low) / (percentile_high - percentile_low) * normalization_factor
                  if x < percentile_low else 0)
        )

    # Apply the custom percentile outlier scoring function
    df['outliers'] = custom_percentile_outliers(df['series'], p_low=10, p_high=90)

    # Display the DataFrame with outliers
    print(df)

I get the following results for the series:

             series   outliers
    0     67.956750   0.000000
    1     58.638980   0.000000
    2     33.591880   0.000000
    3   4906.018000   0.000000
    4      5.372538   0.000000
    5    702.119400   0.000000
    6      0.037261   0.006067
    7  11161.050000 -27.776847
    8      1.403496   0.000000
    9    100.116000   0.000000

With the series_outliers() function, however, I get the results below:

[image: results returned by series_outliers()]

I referred to the GitHub issue https://github.com/microsoft/Kusto-Query-Language/issues/136 and also tried implementing and manually calculating the scores with the help of the answer given on Stack Overflow, "How does Kusto series_outliers() calculate anomaly scores?"

I am probably going wrong in the normalization factor calculation. It would be great if someone could help.
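
To make the suspect step easier to check, here is a minimal sketch that prints the intermediate values the function above works with. It is only a diagnostic for my own formula, not a statement of how series_outliers() works internally. As an assumption for portability, it uses `statistics.NormalDist().inv_cdf` from the standard library as a stand-in for `scipy.stats.norm.ppf`; the constant 2.704 is copied unchanged from my code.

```python
from statistics import NormalDist

import numpy as np

# Same input series as in the question.
values = np.array([67.95675, 58.63898, 33.59188, 4906.018, 5.372538,
                   702.1194, 0.037261, 11161.05, 1.403496, 100.116])
p_low, p_high = 10, 90

# Percentiles with NumPy's default (linear) interpolation.
percentile_low = np.percentile(values, p_low)
percentile_high = np.percentile(values, p_high)

# Standard-normal quantiles for the two percentile levels
# (stand-in for scipy.stats.norm.ppf; an assumption of this sketch).
z_low = NormalDist().inv_cdf(p_low / 100)
z_high = NormalDist().inv_cdf(p_high / 100)

# The two halves of the normalization factor, kept separate for inspection.
numerator = 2 * z_high - z_low
denominator = 2 * z_high - 2.704
normalization_factor = numerator / denominator

print("percentile_low      :", percentile_low)
print("percentile_high     :", percentile_high)
print("z_low, z_high       :", z_low, z_high)
print("numerator           :", numerator)
print("denominator         :", denominator)
print("normalization_factor:", normalization_factor)
```

For p_high=90, 2 * z_high is roughly 2.563, which is below 2.704, so the denominator and therefore the whole factor come out negative. That would explain why the largest value (11161.05) gets a negative score and the smallest (0.037261) a positive one in my output, and it points to the normalization factor as the part to fix.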