marrrcin closed this issue 1 year ago.
Thanks for reporting, Marcin. Will look into it.
@sbrugman an update from my side:
It seems like the following lines in the data generator are causing popmon to break:
```python
feature_anomalies = np.random.normal(loc=0.5, scale=0.05, size=num_days)
anomaly_indices = np.random.choice(num_days, num_anomalies, replace=False)
feature_anomalies[anomaly_indices] = np.random.uniform(low=-5, high=0.1, size=num_anomalies)
feature_out_of_range = np.random.uniform(low=0, high=1, size=num_days)
out_of_range_indices = np.random.choice(num_days, num_out_of_range, replace=False)
feature_out_of_range[out_of_range_indices] = np.random.uniform(low=2, high=3, size=num_out_of_range)
```
Initially, I thought it had something to do with memory allocation or array assignment, but it seems the range of values is the problem. If I increase `num_anomalies` to something closer to half of my examples (that is, generating more values that are, e.g., out of range), the code proceeds normally. It should work in both cases, though.
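One way to see why the value range, rather than memory, could matter is to measure how sparse a fixed-width histogram of `feature_anomalies` becomes. This is only an illustration of that idea and makes no claim about popmon's internals: it uses plain NumPy, a hypothetical bin width of 0.05, and hypothetical helpers (`anomaly_feature`, `empty_bin_fraction`), and it draws from `np.random.default_rng` instead of the global seed, so the values differ from the reporter's generator.

```python
import numpy as np


def anomaly_feature(num_days: int, num_anomalies: int, seed: int = 666) -> np.ndarray:
    """Mirror the reporter's feature_anomalies construction (different RNG API)."""
    rng = np.random.default_rng(seed)
    # Bulk of the values: a narrow normal around 0.5.
    values = rng.normal(loc=0.5, scale=0.05, size=num_days)
    # Overwrite a few randomly chosen days with values spread over [-5, 0.1).
    idx = rng.choice(num_days, num_anomalies, replace=False)
    values[idx] = rng.uniform(low=-5, high=0.1, size=num_anomalies)
    return values


def empty_bin_fraction(values: np.ndarray, bin_width: float) -> float:
    """Fraction of fixed-width bins spanning [min, max] that contain no samples."""
    lo, hi = float(values.min()), float(values.max())
    n_bins = max(1, int(np.ceil((hi - lo) / bin_width)))
    counts, _ = np.histogram(values, bins=n_bins, range=(lo, lo + n_bins * bin_width))
    return float(np.mean(counts == 0))


# Same number of days, few vs. many injected anomalies (bin width is an assumption).
few = empty_bin_fraction(anomaly_feature(300, 10), bin_width=0.05)
many = empty_bin_fraction(anomaly_feature(300, 150), bin_width=0.05)
print(f"10 anomalies: {few:.0%} empty bins, 150 anomalies: {many:.0%} empty bins")
```

With only a handful of injected values spread over [-5, 0.1], most bins between the outliers and the bulk near 0.5 stay empty; injecting more anomalies fills them in. Whether this sparsity is exactly what tripped the plotting code is an assumption.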
@marrrcin Could you please provide a minimal reproducible example here as a snippet? Policy doesn't allow us to use Colab...
Absolutely!
```python
import pandas as pd
import popmon
import numpy as np


def generate_mock_data(num_days, num_anomalies, num_out_of_range, random_state=666, start_date='1/1/2022'):
    np.random.seed(random_state)
    time = pd.date_range(start=start_date, periods=num_days, freq='D')
    feature_increasing = np.arange(1, num_days+1)
    feature_decreasing = np.arange(1000000, 1000000-num_days, -1)
    feature_stable = np.random.normal(loc=0.5, scale=0.05, size=num_days)
    feature_unstable = np.random.normal(loc=0.5, scale=2.0, size=num_days)
    feature_anomalies = np.random.normal(loc=0.5, scale=0.05, size=num_days)
    anomaly_indices = np.random.choice(num_days, num_anomalies, replace=False)
    feature_anomalies[anomaly_indices] = np.random.uniform(low=-5, high=0.1, size=num_anomalies)
    feature_out_of_range = np.random.uniform(low=0, high=1, size=num_days)
    out_of_range_indices = np.random.choice(num_days, num_out_of_range, replace=False)
    feature_out_of_range[out_of_range_indices] = np.random.uniform(low=2, high=3, size=num_out_of_range)
    trend_change = np.concatenate([
        np.linspace(0, 3.0, num_days//2+(num_days % 2)),
        np.linspace(3.0, 0, num_days//2),
    ]) + np.random.normal(loc=0, scale=0.01, size=num_days)
    cyclic_feature = np.sin(np.linspace(0, 4*np.pi, num_days)) + np.random.normal(loc=0, scale=0.1, size=num_days)
    data = {
        'time': time,
        'feature_increasing': feature_increasing,
        'feature_decreasing': feature_decreasing,
        'feature_stable': feature_stable,
        'feature_unstable': feature_unstable,
        'feature_anomalies': feature_anomalies,
        'feature_out_of_range': feature_out_of_range,
        'trend_change': trend_change,
        'cyclic_feature': cyclic_feature,
    }
    df = pd.DataFrame(data)
    return df


df = generate_mock_data(num_days=300, num_anomalies=10, num_out_of_range=13)
report = popmon.df_stability_report(
    df,
    time_axis="time",
    time_width="1w",
)
```
I can confirm this is a bug in the histogram plotting when outliers are present; will release a patch soon!
@marrrcin Release is out, feel free to open up another issue if you encounter other problems. Thanks a lot!
Thanks for the quick fix, I can confirm that it works now!
Hi, I'm exploring the use of your library and I've stumbled across an error when working with my data.
Popmon version: 1.4.5
Error (full stack trace):
```
KeyError Traceback (most recent call last)
```
Reproduction steps: https://colab.research.google.com/drive/1N59kn7C9LN6W9AJkfz9SougiZoOMM0bn?usp=sharing
Additional information: I'm using a function to generate synthetic data (see the Colab). When I generate less data, e.g. for 200 days, the code works fine, but beyond some unknown threshold (around 360 days) it breaks. I've also tried changing the `time_width` parameter: sometimes it starts to work with `2w`, sometimes it works with `1d`, but I haven't found any pattern. Also note that it happens both for self-referencing data and for data with a reference set (see the second part of the Colab).
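Since the failure appears only past some unknown number of days, a bisection over that parameter could pin down the threshold with a handful of report runs instead of a full sweep. This is a minimal sketch assuming failures are monotone in the number of days (the reporter saw no clear pattern, so that may not hold); `first_failing_size` and `fails_with_key_error` are hypothetical helpers, not part of popmon.

```python
from typing import Callable, Optional


def fails_with_key_error(run: Callable[[int], object]) -> Callable[[int], bool]:
    """Turn a runner into a predicate that is True when run(n) raises KeyError."""
    def fails(n: int) -> bool:
        try:
            run(n)
        except KeyError:
            return True
        return False
    return fails


def first_failing_size(fails: Callable[[int], bool], lo: int, hi: int) -> Optional[int]:
    """Smallest n in [lo, hi] with fails(n) True, assuming failures are monotone in n."""
    if not fails(hi):
        return None  # no failure at the upper bound, nothing to bisect
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # threshold is at or below mid
        else:
            lo = mid + 1  # threshold is above mid
    return lo


if __name__ == "__main__":
    # Stand-in runner that raises KeyError from 337 upward (a made-up threshold).
    def fake_report(n: int) -> None:
        if n >= 337:
            raise KeyError("simulated")

    print(first_failing_size(fails_with_key_error(fake_report), 200, 360))  # prints 337
```

To apply it to the reproduction, wrap `generate_mock_data(num_days=n, num_anomalies=10, num_out_of_range=13)` followed by `popmon.df_stability_report(df, time_axis="time", time_width="1w")` in a function of `n`, pass that to `fails_with_key_error`, and bisect over a range such as 200 to 360.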
Expected result: Monitoring report generates properly.