zillow / luminaire

Luminaire is a python package that provides ML driven solutions for monitoring time series data.
https://zillow.github.io/luminaire
Apache License 2.0
762 stars, 59 forks

HyperparameterOptimization is None #123

Open webbug2005 opened 1 year ago

webbug2005 commented 1 year ago
df_conc.head(5)
index | raw
2021-07-07 09:00:01 | 0.48
2021-07-07 09:30:06 | 0.46
2021-07-07 10:00:01 | 1.39
2021-07-07 10:30:06 | 0.84
2021-07-07 11:00:02 | 1.01

index    datetime64[ns]
raw             float64
dtype: object

For this data, when I try to run:

from luminaire.optimization.hyperparameter_optimization import HyperparameterOptimization
hopt_obj = HyperparameterOptimization(freq='H', detection_type='OutlierDetection')
print(type(hopt_obj))
opt_config = hopt_obj.run(data=df_conc)
print(opt_config)

<class 'luminaire.optimization.hyperparameter_optimization.HyperparameterOptimization'>
None

My frequency is actually every 30 mins, but every freq= value I try gives me the same None.

Any help is appreciated.

sayanchk commented 1 year ago

@webbug2005 Currently, Luminaire batch models support only certain pandas offsets (link), and a 30-minute frequency is not among them. That said, you can still use the WindowDensityModel, which supports arbitrary time frequencies. The difference from the batch models is that the window-based model focuses mostly on identifying anomalies over time windows rather than individual outlying time points.
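If the batch models are still of interest, one possible workaround (an assumption, not something suggested in this thread) is to aggregate the 30-minute readings to an hourly series with pandas before passing them on. The sketch below uses a hypothetical sample that mirrors the rows from the question; whether a mean, a sum, or some other aggregate is appropriate depends on what the metric represents.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
df_conc = pd.DataFrame(
    {
        "index": pd.to_datetime(
            [
                "2021-07-07 09:00:01",
                "2021-07-07 09:30:06",
                "2021-07-07 10:00:01",
                "2021-07-07 10:30:06",
                "2021-07-07 11:00:02",
            ]
        ),
        "raw": [0.48, 0.46, 1.39, 0.84, 1.01],
    }
)

# Aggregate to hourly buckets. Using the mean is an assumption;
# a sum may suit count-like metrics better.
hourly = df_conc.set_index("index").resample("60min").mean()

print(hourly)
```

The resulting frame has one row per hour, with each timestamp aligned to the start of the hour.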

webbug2005 commented 1 year ago

My end goal is to use streaming data that arrives every 30 mins, so I will skip directly to testing WindowDensity. But does this mean I cannot use optimization for streaming data, since it doesn't support 30 mins (if I keep a window of 3 months)?

sayanchk commented 1 year ago

Optimization is only for the batch models in Luminaire. WindowDensityModel has an embedded optimization process to pick the right parameters, so you don't need to use the optimization module separately.

webbug2005 commented 1 year ago

@sayanchk, I tried with many data sets, each limited to 30 days of data with one row every 30 mins. I also truncated the timestamps to just HH:MM, as the seconds were never exact. But I keep hitting the error below.

None {'success': False, 'ErrorMessage': 'Invalid number of FFT data points (0) specified.'}
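This is not a confirmed fix for the error above. As a sketch of one way to deal with inexact seconds while keeping full datetimes, the timestamps can be floored to a 30-minute grid with pandas and the result checked for missing slots. The sample data and the variable names `regular` and `missing` are hypothetical.

```python
import pandas as pd

# Hypothetical sample: roughly 30-minute spacing, inexact seconds,
# and one missing reading (10:30).
df = pd.DataFrame(
    {
        "index": pd.to_datetime(
            [
                "2021-07-07 09:00:01",
                "2021-07-07 09:30:06",
                "2021-07-07 10:00:01",
                "2021-07-07 11:00:02",
            ]
        ),
        "raw": [0.48, 0.46, 1.39, 1.01],
    }
)

# Snap each timestamp down to the nearest 30-minute boundary,
# keeping the date part intact.
df["index"] = df["index"].dt.floor("30min")

# Drop any duplicate slots, then reindex onto a strict 30-minute grid.
# Slots with no reading appear as NaN instead of silently vanishing.
regular = (
    df.drop_duplicates(subset="index")
    .set_index("index")
    .asfreq("30min")
)

missing = int(regular["raw"].isna().sum())
print(regular)
print("missing slots:", missing)
```

Counting the missing slots before training makes it easier to tell whether a failure comes from gaps in the data or from the model configuration.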

sayanchk commented 1 year ago

@webbug2005, would you be able to share reproducible code with dummy data in Colab? It will be easier for me to debug that way!