zillow / luminaire

Luminaire is a Python package that provides ML-driven solutions for monitoring time series data.
https://zillow.github.io/luminaire
Apache License 2.0

'ErrorMessage': 'Due to a recent data gap, training is waiting for more data to populate' #132

Open 9race opened 2 months ago

9race commented 2 months ago

I tried running the code snippet below on my data, which has 28 data points and no gaps:

    de_obj = DataExploration(freq='D', data_shift_truncate=False, is_log_transformed=False, fill_rate=0.8, sig_level=0.001)
    print(de_obj.min_ts_length)
    imputed_data, pre_prc = de_obj.profile(df)
    print(len(df))

However, pre_prc keeps coming back with the error 'ErrorMessage': 'Due to a recent data gap, training is waiting for more data to populate'. From the source code online it seems that 'min_ts_length' defaults to 21 for daily data, but even if I lower it further I still get the same error message.

I also created dummy data with 208 data points and the error is the same, so I assume this isn't really about the number of data points. Any help?

sayanchk commented 2 months ago

Hello @9race, it would be great if you could share the dummy data so that I can replicate and debug the error on my end. Given that you have daily data, it should not show this specific error unless the proportion of missing values exceeds what the specified fill_rate allows.
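As a quick pre-check on the fill_rate point, the share of calendar days that actually carry an observation can be computed before calling profile. The sketch below is only an approximation under stated assumptions: `daily_fill_rate` is a hypothetical helper, not part of Luminaire, and Luminaire's internal computation may differ. The 0.8 threshold is the fill_rate value from the snippet in the original post.

```python
from datetime import date, timedelta


def daily_fill_rate(observed_dates):
    """Fraction of calendar days between the first and last observation
    that have at least one data point (hypothetical helper, not Luminaire API)."""
    unique_days = set(observed_dates)
    if not unique_days:
        return 0.0
    # Number of days in the inclusive range [first day, last day].
    span_days = (max(unique_days) - min(unique_days)).days + 1
    return len(unique_days) / span_days


start = date(2021, 1, 1)

# 28 consecutive days: nothing missing.
full = [start + timedelta(days=i) for i in range(28)]

# Same range with days 5..14 removed: 18 of 28 days present.
gappy = [d for i, d in enumerate(full) if not 5 <= i <= 14]

FILL_RATE = 0.8  # value passed to DataExploration in the original post

for name, days in (("full", full), ("gappy", gappy)):
    rate = daily_fill_rate(days)
    print(name, round(rate, 3), "ok" if rate >= FILL_RATE else "below fill_rate")
```

If the rate is 1.0 and the error still appears, the cause is more likely the way the timestamps are parsed than the amount of data.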

9race commented 2 months ago

Hi @sayanchk, I've managed to fix the error by adjusting the types of my DataFrame's index and columns. However, I've run into a new issue: anomalies that look visually obvious to me are not being detected. Is there a minimum number of data points required for adequate model performance? I've attached the dummy data, along with a screenshot of the data showing the anomalies that seem obvious, for reference.
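The exact type changes are not spelled out in this thread. For readers hitting the same error, here is a minimal sketch of one plausible normalization, assuming the input should have a sorted DatetimeIndex and a single numeric value column named 'raw'. Both the function `prepare_daily_frame` and the input column names `date` and `value` are hypothetical and used only for illustration.

```python
import pandas as pd


def prepare_daily_frame(df, time_col, value_col):
    """Return a copy with a sorted DatetimeIndex and one float column 'raw'.

    Hypothetical helper; the target layout is an assumption, not something
    confirmed in this thread.
    """
    out = df[[time_col, value_col]].copy()
    # Parse timestamps that may have been read in as plain strings.
    out[time_col] = pd.to_datetime(out[time_col])
    # Coerce values to numbers; unparseable entries become NaN.
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce").astype(float)
    out = out.set_index(time_col).sort_index()
    return out.rename(columns={value_col: "raw"})


# Example input where both columns are strings and rows are out of order.
raw_df = pd.DataFrame(
    {"date": ["2021-01-03", "2021-01-01", "2021-01-02"],
     "value": ["3", "1", "2.5"]}
)
prepared = prepare_daily_frame(raw_df, "date", "value")
print(prepared.dtypes)
print(type(prepared.index).__name__)
```

A string-typed index is a common result of reading a CSV without date parsing, so checking `type(df.index)` and `df.dtypes` first is a cheap diagnostic.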

Attachments: luminaire_testing.csv, Graph (screenshot)