winedarksea / AutoTS

Automated Time Series Forecasting
MIT License

Forecast horizon issue #18

Closed nsankar closed 3 years ago

nsankar commented 3 years ago

trafficTest JupyterNB html.zip

Hi,

My historical training data is a univariate daily time series of website visitor traffic, running from 01-04-2020 (start) to 01-10-2020 (end), with dates written as DD-MM-YYYY.

When I tried forecasting a 30-day horizon by setting forecast_length=30 and frequency='infer', I got a forecast for December 2020 instead of a horizon starting on 02-10-2020. I then changed the frequency setting from 'infer' to '1D' and observed the same result.

Then I tried setting forecast_length=5 and frequency='MS'. This time the forecast started from 2021-01-01, as shown below; November and December 2020 were missing.

2021-01-01  465546.5
2021-02-01  61375.0
2021-03-01  39539.5
2021-04-01  33329.0
2021-05-01  33581.0

The traffic dataset and the Jupyter Notebook run output are attached for reference.

traffic.zip

winedarksea commented 3 years ago

You've got an issue with your datetime conversion from string. The key line below is pd.to_datetime(df.date, format='%d-%m-%Y'). Without an explicit format, ambiguous day-first strings can be read month-first, which swaps day and month. Once I ran it with the corrected df below, using a daily forecast for 30 days, I got the month of October, which is what I believe you were expecting.

import pandas as pd

# Raw string, so backslashes in the path are not treated as escapes (e.g. "\t" as a tab)
location = r"\your\directory\traffic.csv"

# Original approach: the date format is left for pandas to guess
df = pd.read_csv(location, parse_dates=True)
df['date'] = pd.to_datetime(df.date)
# note that the 'Day' below is actually the 'Month' and vice versa if you examine the dataframe
df['Day'] = df['date'].dt.day
df['Month'] = df['date'].dt.month

# Try this instead: state the day-first format explicitly
df = pd.read_csv(location)
df['date'] = pd.to_datetime(df.date, format='%d-%m-%Y')
# now it is right
df['Day'] = df['date'].dt.day
df['Month'] = df['date'].dt.month
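
To see why the swap pushes the forecast into December, here is a minimal sketch that uses only the standard library, so it does not depend on how a particular pandas version treats ambiguous strings. The helper parse_month_first is hypothetical: it assumes a lenient parser that reads each string month-first when that gives a valid date and falls back to day-first otherwise. The date range is the one described in the issue.

```python
from datetime import date, datetime, timedelta


def parse_month_first(text):
    """Hypothetical lenient parser: try MM-DD-YYYY first, fall back to DD-MM-YYYY."""
    try:
        return datetime.strptime(text, "%m-%d-%Y").date()
    except ValueError:
        # Month > 12 is impossible, so the string can only be day-first
        return datetime.strptime(text, "%d-%m-%Y").date()


# Daily dates from 1 April to 1 October 2020, written day-first as in the CSV
start, end = date(2020, 4, 1), date(2020, 10, 1)
true_dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]
raw_strings = [d.strftime("%d-%m-%Y") for d in true_dates]

# Every string whose day is 12 or less is silently read with day and month swapped
misread_dates = [parse_month_first(s) for s in raw_strings]

true_end = max(true_dates)
misread_end = max(misread_dates)

print(true_end)     # 2020-10-01
print(misread_end)  # 2020-12-09, because "12-09-2020" is read as 9 December
```

Under that assumption the series appears to end on 9 December 2020, which would be consistent with a daily forecast landing in December and a month-start ('MS') forecast beginning on 2021-01-01.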

And a simple forecast for 30 days:

from autots import AutoTS
model = AutoTS(forecast_length=30, frequency='infer',
               ensemble=None, model_list='superfast',
               max_generations=5)
model = model.fit(df, date_col='date', value_col='traffic', id_col=None)

# Print the details of the best model
print(model)

prediction = model.predict()
# point forecasts dataframe
forecasts_df = prediction.forecast
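
A quick way to confirm the horizon is to compare the forecast's dates with the dates you expect. The sketch below builds the expected daily index with pandas alone; expected_forecast_index is a hypothetical helper, not part of AutoTS, and the final comparison assumes forecasts_df is indexed by date, as the output earlier in this thread suggests.

```python
import pandas as pd


def expected_forecast_index(last_train_date, forecast_length):
    """Hypothetical helper: daily dates starting the day after the training data ends."""
    first = pd.Timestamp(last_train_date) + pd.Timedelta(days=1)
    return pd.date_range(start=first, periods=forecast_length, freq="D")


# Training data in the issue ends on 1 October 2020
expected = expected_forecast_index("2020-10-01", forecast_length=30)
print(expected[0].date(), expected[-1].date())  # 2020-10-02 2020-10-31

# With a fitted model, the check could look like this (assumes a DatetimeIndex):
# assert forecasts_df.index[0] == expected[0]
```

If the first forecast date is far from the expected one, checking df['date'].max() before fitting usually shows whether the dates were parsed as intended.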

nsankar commented 3 years ago

@winedarksea Thanks for the guidance. It works.