Closed raedbsili1991 closed 1 year ago
So scalecast ports the pandas frequency logic for dates, and that's been pretty reliable for monthly frequencies in my experience. Can you share the array of dates you tried throwing into the Forecaster
object? Sometimes if there is a duplicate or inconsistency, it can confuse the auto-frequency operations.
Thanks,
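For anyone following along, here is a small sketch of how a gap or a duplicate breaks pandas frequency inference. The dates are made up for illustration and are not the data from this thread; it only assumes that `pd.infer_freq` behaves the same way on the real dates.

```python
import pandas as pd

# Hypothetical dates for illustration only (not the dataset from this thread).
complete = pd.date_range("2020-01-01", periods=12, freq="MS")

# Drop April and May to simulate missing months.
with_gap = complete.delete([3, 4])

# Repeat one date to simulate a duplicate entry.
with_dup = complete.insert(3, complete[3])

freq_complete = pd.infer_freq(complete)  # regular month-start spacing is recognized
freq_gap = pd.infer_freq(with_gap)       # irregular month spacing: no frequency inferred
freq_dup = pd.infer_freq(with_dup)       # non-unique dates: no frequency inferred

print(freq_complete, freq_gap, freq_dup)
```

Running the same check on the date column before building the Forecaster object is a quick way to see whether the auto-frequency step is likely to fail.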
Thanks for the rapid response. Sure, it is in the thread above; I attached the dataframe.
I see the issue. The dataset you provided is missing several months: April, May, and November of 2020; February of 2022; and February of 2023. I would suggest adding the missing dates and filling them in with some logical value (are they missing because they are 0, for example?). Or, if you feel you should not do that, feed the data into the Forecaster object with a numerical index in lieu of a date (one that counts from 0 through the length of the time series). This would cause the monthly seasonality to go undetected by the object, but you can force it to detect 12 as a seasonal cycle by using f.add_cycle(12) or f.auto_Xvar_select(irr_cycles=[12]). However, because of the missing dates, the seasonal cycle is not really 12, so that might degrade the accuracy of the model. Filling in the missing dates would really be the best option, in my opinion, if you can.
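A minimal pandas sketch of the "add the missing dates and fill them in" option, using made-up values rather than the attached dataset. The fill value of 0.0 is an assumption; whether 0 is the right value depends on why the months are missing.

```python
import pandas as pd

# Hypothetical monthly series with April and May 2020 missing (values are made up).
dates = pd.to_datetime(
    ["2020-01-01", "2020-02-01", "2020-03-01", "2020-06-01", "2020-07-01"]
)
y = pd.Series([10.0, 12.0, 9.0, 14.0, 11.0], index=dates)

# Build the complete month-start calendar between the first and last observation.
full_index = pd.date_range(dates.min(), dates.max(), freq="MS")

# Which months are absent from the data?
missing = full_index.difference(y.index)

# Reindex onto the full calendar, filling the absent months with 0.0 (an assumption).
filled = y.reindex(full_index, fill_value=0.0)

print(list(missing.strftime("%Y-%m")))
print(filled)
```

After this step the index is evenly spaced, so a month-start frequency can be inferred from `filled.index`.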
Thank you. I think adding a function that adds a "0" (or an optional input such as another value or a NaN) for each missing month could be useful, as this occurs often in time series forecasting.
I tried feeding the data in with a numerical index; however, that significantly degrades the performance.
Yes, assisting missing value imputation has been on my list of to-dos for a while. I will start working on it. If you need anything else related to this issue, please feel free to respond. Otherwise, let me know if I can close it.
Okay, thank you. One last question about the seasonality here: is there any "trick" for manually choosing the right f.add_ar_terms() and f.add_AR_terms() values after visually inspecting the time series plot (or an ACF/PACF plot)?
I'm not sure there is a consensus "best way" to do that. You might try running an auto SARIMA model and seeing what lag order comes from that. But generally, after looking at ACF and PACF plots, you are trying to find places where the graphs "spike". If there are noticeable seasonal spikes, that could be justification for adding seasonal lags using f.add_AR_terms(). Otherwise, add as many lags using f.add_ar_terms() as there are spikes in the plots.
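As a rough illustration of "finding the spikes" without reading a plot, the sketch below computes the sample autocorrelation with NumPy and lists the lags that fall outside an approximate 95% band of 1.96/sqrt(n). The helper name `significant_acf_lags` is hypothetical, the series is synthetic, and this covers only the ACF, not the PACF, so treat it as a heuristic rather than a rule for choosing lag orders.

```python
import numpy as np

def significant_acf_lags(y, max_lag):
    """Return lags whose sample autocorrelation lies outside +/- 1.96/sqrt(n)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    centered = y - y.mean()
    denom = np.sum(centered ** 2)
    band = 1.96 / np.sqrt(n)  # approximate 95% band under a white-noise assumption
    lags = []
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k
        r_k = np.sum(centered[k:] * centered[:-k]) / denom
        if abs(r_k) > band:
            lags.append(k)
    return lags

# Synthetic series with a 12-step cycle, for illustration only.
t = np.arange(120)
y = np.sin(2 * np.pi * t / 12)

lags = significant_acf_lags(y, max_lag=24)
print(lags)
```

If the flagged lags cluster at multiples of 12, that would be the kind of seasonal spike that could justify seasonal lags; short runs of early lags point toward ordinary AR terms.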
The function Forecaster_with_missing_vals() has been added to the library. Here's one way you could use it:
from scalecast.util import Forecaster_with_missing_vals
import pandas as pd
import numpy as np
data = pd.read_excel('df_.xlsx') # the dataset attached to this thread
f = Forecaster_with_missing_vals(
    y = data['Monthly Quantity'],
    current_dates = data['Month Date'],
    desired_frequency = 'MS',
    fill_strategy = 0.0, # fills with 0s, but other options are available
    test_length = .25,
    future_dates = 18,
).round()
That's great. I had already done that with a simple function and was going to post it here. We can close this thread. Thank you again.
When creating the Forecaster object and passing in the data, the object doesn't seem to detect the frequency: it shows Freq = None when displaying the Forecaster object:
f = Forecaster(
    y = data['Original Date'], # required
    current_dates = data['Month Date'], # required
    future_dates = 18,
    cis = False, # choose whether or not to evaluate confidence intervals for all models
    metrics = ['mae','r2','rmse','mape'], # the metrics to evaluate when testing/tuning models
)
df_.xlsx
Whether I use "Month Date" or "Original Date", the frequency is always displayed as None.
Actually, the idea was to transform the Quantity, which is recorded against irregular Original Date values, into a Monthly Quantity by summing over each month, and then forecast 18 months into the future. That's why I set future_dates = 18.
The full code:
` def forecaster_0(f): for m in models:
OUTPUTS/RESULTS:
Although the results on the test set don't look terrible, the future forecasts aren't generated properly. I suspect the undetected frequency again, or maybe there is another reason I am missing?
FORECASTING VALUES (Supposed to be 18 months in future)
| DATE | lasso | gbt | ridge | adaboost | xgboost |
| -- | -- | -- | -- | -- | -- |
| 2023-05-08 | 385.178400 | 660.264735 | 436.787649 | 640.0 | 473.736572 |
| 2023-05-15 | 431.244985 | 637.779493 | 463.456754 | 640.0 | 473.736572 |

As a consequence, f.seasonal_decompose() as well as pipeline_backtest don't work.
Also, find_optimal_transformation wasn't useful and significantly degraded the results.
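For reference, the monthly aggregation described in this post (summing the quantity over each month) could be sketched with pandas as below. The records are made up, and the column names "Original Date" and "Quantity" are assumptions based on the description, not taken from the attached file.

```python
import pandas as pd

# Hypothetical irregular records (values are made up; column names are assumed).
raw = pd.DataFrame(
    {
        "Original Date": pd.to_datetime(
            ["2020-01-05", "2020-01-20", "2020-02-11", "2020-04-02"]
        ),
        "Quantity": [5, 7, 3, 4],
    }
)

# Sum quantities into month-start bins. Months with no records (March here)
# still appear in the result, with a sum of 0.
monthly = (
    raw.set_index("Original Date")["Quantity"]
    .resample("MS")
    .sum()
)

print(monthly)
```

Because `resample` emits every month between the first and last record, the resulting index has no gaps, which is what the frequency detection needs.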