linkedin / greykite

A flexible, intuitive and fast forecasting library

run_forecast_config crash with regressors but not without #101

Status: Closed (bernardoct closed 1 year ago)

bernardoct commented 1 year ago

I'm running into an issue where run_forecast_config crashes when I add regressors but not when I leave them out. The error is caused by an empty fut_df, which I understand to be the data frame of timestamps and regressors against which the forecast is run. fut_df is empty because make_future_dataframe (see the call stack below) returns a zero-length data frame on line 349 of univariate_time_series.py when there are no missing values in the y column. I don't know whether this is by design, but if my assessment is correct, it would be a good idea either to allow the model to run on the training data only or to raise an exception explaining that, for forecasting, an input data frame with regressors must have missing values in y. My call stack and code are below.

Error

Exception has occurred: ValueError
``fut_df`` must be a dataframe of non-zero size.
  File "/data01/users/btrindad/greykite/greykite/algo/forecast/silverkite/forecast_silverkite.py", line 2047, in predict
    raise ValueError("``fut_df`` must be a dataframe of non-zero size.")
  File "/data01/users/btrindad/greykite/greykite/sklearn/estimator/base_silverkite_estimator.py", line 363, in predict
    pred_res = self.silverkite.predict(
  File "/data01/users/btrindad/greykite/greykite/framework/pipeline/utils.py", line 769, in get_forecast
    predicted_df = trained_model.predict(df)
  File "/data01/users/btrindad/greykite/greykite/framework/pipeline/pipeline.py", line 748, in forecast_pipeline
    forecast = get_forecast(
  File "/data01/users/btrindad/greykite/greykite/framework/pipeline/pipeline.py", line 209, in pipeline_wrapper
    return pipeline_function(
  File "/data01/users/btrindad/greykite/greykite/framework/templates/forecaster.py", line 378, in run_forecast_config
    self.forecast_result = forecast_pipeline(**pipeline_parameters)
  File "/xxxx/xxxx/xxxxx.py", line 459, in <module>
    result = forecaster.run_forecast_config(
ValueError: ``fut_df`` must be a dataframe of non-zero size.

Location of code causing fut_df to have zero rows:

make_future_dataframe (\data01\users\btrindad\greykite\greykite\framework\input\univariate_time_series.py:354)
forecast_pipeline (\data01\users\btrindad\greykite\greykite\framework\pipeline\pipeline.py:738)
pipeline_wrapper (\data01\users\btrindad\greykite\greykite\framework\pipeline\pipeline.py:209)
run_forecast_config (\data01\users\btrindad\greykite\greykite\framework\templates\forecaster.py:378)
<module> (\xxxx\xxxx\xxxxx.py:459)

My code


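    from greykite.framework.templates.autogen.forecast_config import (
        ForecastConfig, MetadataParam, ModelComponentsParam)
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum

    # Column names and monthly (month-start) frequency of the input data
    metadata = MetadataParam(
        time_col="MONTH",
        value_col="y",
        freq="MS"
    )

    forecaster = Forecaster()

    regressors = dict(
        regressor_cols=['Temperature_mean', 'Temperature_std', 'Precipitation_sum', 'Precipitation_dayscount']
    )

    model_components = ModelComponentsParam(regressors=regressors)
    fcst_config = ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        # forecast_horizon=12,
        coverage=0.95,
        metadata_param=metadata,
        model_components_param=model_components
    )
    # ms_query is the data frame loaded from the test data below
    result = forecaster.run_forecast_config(
        df=ms_query,
        config=fcst_config
    )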

Test data

,MONTH,y,YEAR,Temperature_mean,Temperature_std,Precipitation_sum,Precipitation_dayscount
0,2009-01-01,706782.2079999995,2009,27.096774193548388,6.399764780623726,21.590000000000007,21.0
1,2009-02-01,894551.8999999987,2009,35.785714285714285,7.955231881329059,1.0500000000000003,17.0
2,2009-03-01,1032500.3039999986,2009,41.45161290322581,8.75914268893716,1.7100000000000002,19.0
3,2009-04-01,748408.4079999992,2009,53.96666666666667,9.506742918557345,4.409999999999999,24.0
4,2009-05-01,846760.6839999994,2009,63.58064516129032,6.097946067042504,4.8699999999999966,22.0
5,2009-06-01,983535.4759999991,2009,69.23333333333333,4.931834192848246,9.96,25.0
6,2009-07-01,1007080.2719999988,2009,71.80645161290323,13.970013970020958,6.859999999999999,21.0
7,2009-08-01,1169687.2439999986,2009,76.7741935483871,4.998064141374174,7.209999999999999,22.0
8,2009-09-01,1049426.047999999,2009,66.46666666666667,5.636906300771563,1.9500000000000002,13.0
9,2009-10-01,1133570.8119999988,2009,55.0,7.429670248402687,5.560000000000001,22.0
10,2009-11-01,987936.7079999987,2009,50.6,4.530357373434736,1.54,17.0
11,2009-12-01,1000694.5959999987,2009,35.064516129032256,8.578793558812988,6.690000000000001,19.0
12,2010-01-01,873615.3799999991,2010,31.967741935483872,8.376490398600685,2.4999999999999996,18.0
13,2010-02-01,736004.3239999993,2010,32.75,3.8163876656879103,4.88,17.0
14,2010-03-01,1042932.6599999986,2010,47.935483870967744,6.12609437227868,8.339999999999998,20.0
15,2010-04-01,790205.151999999,2010,57.96666666666667,6.7542238806457275,3.42,22.0
16,2010-05-01,919653.2839999989,2010,67.03225806451613,8.215772923540598,3.649999999999999,13.0
17,2010-06-01,1154474.4199999983,2010,75.2,6.0537819470203065,2.309999999999999,19.0
18,2010-07-01,1302039.8599999987,2010,80.35483870967742,5.141649457108578,3.84,12.0
19,2010-08-01,1153508.751999999,2010,76.3225806451613,5.368966981795741,3.35,16.0
20,2010-09-01,1474937.8159999987,2010,70.26666666666667,6.416215090842418,4.13,13.0
21,2010-10-01,1410313.607999998,2010,57.32258064516129,6.046415449251419,4.089999999999997,22.0
22,2010-11-01,1051603.4759999989,2010,46.766666666666666,5.721606558406076,2.28,17.0
23,2010-12-01,951533.043999999,2010,32.096774193548384,6.28413260368089,3.3299999999999987,17.0
24,2011-01-01,957420.5519999987,2011,27.548387096774192,6.270771933754351,3.1499999999999995,20.0
25,2011-02-01,957273.1959999986,2011,34.25,8.707701997316748,3.9699999999999998,16.0
26,2011-03-01,1130665.5799999982,2011,42.16129032258065,7.326193983206414,6.869999999999998,18.0
27,2011-04-01,566344.4599999997,2011,54.4,8.75962524784928,6.579999999999998,22.0
28,2011-05-01,1013594.6039999983,2011,65.58064516129032,7.856098240850386,5.949999999999999,19.0
29,2011-06-01,1116950.2519999985,2011,73.23333333333333,5.89379178536156,4.799999999999999,17.0
30,2011-07-01,1208907.1279999986,2011,81.03225806451613,4.84068776771609,2.13,11.0
31,2011-08-01,1438707.6879999982,2011,76.06451612903226,3.974380319588351,17.250000000000004,16.0
32,2011-09-01,1559585.3501219973,2011,70.93333333333334,6.313277379013838,8.52,14.0
33,2011-10-01,889496.167999999,2011,57.12903225806452,8.090084200154616,5.889999999999999,16.0
34,2011-11-01,962743.1139999985,2011,47.63333333333333,14.461562211918718,3.53,15.0
35,2011-12-01,1084642.3968279986,2011,42.483870967741936,7.154816409207322,4.9099999999999975,18.0
36,2012-01-01,887411.4919999989,2012,36.806451612903224,8.158510300451955,2.7399999999999998,15.0
37,2012-02-01,858523.7319999991,2012,39.964285714285715,5.087982521350298,1.79,15.0
38,2012-03-01,1006668.0683259988,2012,51.41935483870968,9.493064814373303,1.2800000000000005,16.0
39,2012-04-01,799649.4658239989,2012,54.6,7.073993165839561,3.7399999999999998,22.0
40,2012-05-01,957590.347999999,2012,66.61290322580645,6.998463733109996,5.1499999999999995,22.0
41,2012-06-01,1159368.2893619984,2012,71.83333333333333,7.120070385807334,4.16,15.0
42,2012-07-01,1118193.427999999,2012,79.80645161290323,4.384969439944524,3.73,17.0
43,2012-08-01,1385984.9079999984,2012,77.25806451612904,3.864088870057063,3.0999999999999996,15.0
44,2012-09-01,1433426.239810998,2012,69.03333333333333,7.029412754858041,2.7499999999999996,15.0
45,2012-10-01,1130910.3828449987,2012,58.58064516129032,7.5840806674172745,3.0599999999999996,19.0
46,2012-11-01,1163294.0879999984,2012,43.0,4.961923987242489,1.23,7.0
47,2012-12-01,822582.9902639993,2012,40.87096774193548,7.003532134972422,5.4399999999999995,21.0
48,2013-01-01,1058544.1679999984,2013,34.645161290322584,9.55527214716837,2.99,17.0
49,2013-02-01,929917.3399999988,2013,33.214285714285715,6.367493505687583,3.3299999999999996,20.0
50,2013-03-01,906869.9130069988,2013,39.70967741935484,4.49587624671837,3.509999999999999,19.0
51,2013-04-01,818724.782977999,2013,52.4,7.708481669899229,1.6100000000000005,16.0
52,2013-05-01,969047.463999999,2013,62.806451612903224,8.54563964775296,7.289999999999998,18.0
53,2013-06-01,1100197.4140309982,2013,72.8,6.671607364615092,8.650000000000002,22.0
54,2013-07-01,1027217.3318089988,2013,79.93548387096774,5.341257553990871,3.0500000000000003,20.0
55,2013-08-01,1390347.172094998,2013,74.25806451612904,3.838122300774031,5.089999999999999,20.0
56,2013-09-01,1495499.832919798,2013,67.4,7.180433182591085,2.21,17.0
57,2013-10-01,1202056.7002189986,2013,57.225806451612904,14.721208459043831,0.8700000000000003,15.0
58,2013-11-01,1026309.1079999987,2013,46.06666666666667,9.577031911290945,3.9000000000000004,21.0
59,2013-12-01,1115877.1909509983,2013,38.29032258064516,9.627022898719682,4.7799999999999985,22.0
60,2014-01-01,1122513.5035099972,2014,28.193548387096776,10.940503811795597,3.489999999999999,25.0
61,2014-02-01,1132019.8702489971,2014,30.857142857142858,7.331890189890581,5.619999999999998,20.0
62,2014-03-01,1287813.9761039948,2014,37.774193548387096,8.95436458724405,4.49,14.0
63,2014-04-01,893307.455440797,2014,53.13333333333333,7.064888086740011,7.050000000000001,16.0
64,2014-05-01,902606.8301447973,2014,66.06451612903226,5.019531743572422,4.41,18.0
65,2014-06-01,1158366.1072273166,2014,74.53333333333333,4.768671710765742,4.6,13.0
66,2014-07-01,1352174.8354107952,2014,78.6774193548387,4.044643344591239,5.979999999999998,19.0
67,2014-08-01,1297130.0931539948,2014,75.35483870967742,4.151693526717136,2.46,14.0
68,2014-09-01,1275825.6739819956,2014,70.03333333333333,7.039216910469773,1.2500000000000002,12.0
69,2014-10-01,1167715.7791739954,2014,59.354838709677416,6.911576699503398,4.18,20.0
70,2014-11-01,859583.3101447974,2014,44.06666666666667,9.522870232827096,4.26,15.0
71,2014-12-01,879553.2791669972,2014,39.516129032258064,6.222919827765395,5.769999999999998,23.0
72,2015-01-01,1063323.5894862772,2015,28.64516129032258,7.157971719683232,5.6499999999999995,16.0
73,2015-02-01,798509.713737997,2015,22.321428571428573,6.594349834172412,2.2800000000000002,18.0
74,2015-03-01,808266.9053209973,2015,36.806451612903224,7.304881266836623,4.799999999999999,21.0
75,2015-04-01,958374.7294579976,2015,53.2,6.738463960853699,2.7599999999999993,12.0
76,2015-05-01,982364.0405999963,2015,68.45161290322581,7.1172968168044415,2.62,11.0
77,2015-06-01,1234824.2636939953,2015,71.8,7.716529504006768,5.289999999999999,21.0
78,2015-07-01,1460496.2762439947,2015,78.45161290322581,4.17004164389613,3.249999999999999,17.0
79,2015-08-01,1258718.012721995,2015,78.0,3.8297084310253515,1.6900000000000002,6.0
80,2015-09-01,1419741.3304079948,2015,73.36666666666666,5.91015101816115,2.66,9.0
81,2015-10-01,1286391.5931559948,2015,56.83870967741935,7.1185053402782446,3.58,10.0
82,2015-11-01,881765.2391909969,2015,51.7,8.28854631404084,1.9300000000000002,11.0
83,2015-12-01,863269.7581369971,2015,49.74193548387097,7.357389672682759,5.27,19.0
84,2016-01-01,956522.0804109968,2016,33.16129032258065,7.312531591697222,4.0299999999999985,16.0
85,2016-02-01,757970.4865639977,2016,37.172413793103445,11.193330174442666,24.37,19.0
86,2016-03-01,901169.3395849967,2016,47.87096774193548,8.597448983986942,1.7200000000000004,18.0
87,2016-04-01,902789.6734709973,2016,52.56666666666667,9.863493587321202,2.37,14.0
88,2016-05-01,960810.2228079966,2016,63.275862068965516,9.891903441430328,3.6399999999999992,21.0
89,2016-06-01,1123411.8534640358,2016,72.93333333333334,4.425111207122633,3.229999999999999,14.0
90,2016-07-01,1466193.8762599945,2016,79.83870967741936,4.662594229207236,5.919999999999999,17.0
91,2016-08-01,1318689.156624995,2016,79.74193548387096,4.411860846517292,1.8800000000000001,14.0
92,2016-09-01,1431721.2309322746,2016,71.8,7.068141247149435,2.84,13.0
93,2016-10-01,1217703.7067973157,2016,58.483870967741936,8.472980458461024,3.78,18.0
94,2016-11-01,911749.4351015567,2016,49.333333333333336,6.5565620371599005,4.809999999999999,11.0
95,2016-12-01,914392.6639489969,2016,37.12903225806452,7.03203116927047,3.0099999999999993,15.0
96,2017-01-01,829880.0841935576,2017,38.096774193548384,9.10441225893496,3.8499999999999996,22.0
97,2017-02-01,833172.231067997,2017,41.464285714285715,9.098156682892144,2.5700000000000003,13.0
98,2017-03-01,781520.8465487568,2017,39.806451612903224,10.669018557920278,5.389999999999999,20.0
99,2017-04-01,900609.8448109966,2017,57.2,8.243701880202558,4.299999999999999,20.0
100,2017-05-01,883323.9966799965,2017,61.096774193548384,7.963897030598682,7.349999999999999,19.0
101,2017-06-01,1058783.5216820769,2017,72.36666666666666,7.734755120985356,3.5899999999999994,24.0
102,2017-07-01,1414651.439849994,2017,69.74193548387096,23.731789849532326,4.43,19.0
103,2017-08-01,1216260.6456439942,2017,74.58064516129032,4.681696227852945,4.4399999999999995,17.0
104,2017-09-01,1190050.9245409954,2017,71.23333333333333,6.626401391577397,2.03,11.0
105,2017-10-01,1205627.7189799948,2017,64.0,7.878240074195929,3.9900000000000007,15.0
106,2017-11-01,947022.9397345162,2017,45.666666666666664,7.438637868140469,1.6800000000000006,17.0
107,2017-12-01,864244.8985367569,2017,34.645161290322584,10.37159064334484,2.1700000000000004,20.0
108,2018-01-01,871633.6928038374,2018,31.06896551724138,12.056210712689472,2.94,12.0
109,2018-02-01,866314.7328563168,2018,41.32142857142857,9.85093931008639,5.729999999999999,20.0
110,2018-03-01,799450.403724757,2018,40.38709677419355,5.181231638358062,5.229999999999996,21.0
111,2018-04-01,807664.7426867572,2018,48.43333333333333,7.775662000127698,6.369999999999999,23.0
112,2018-05-01,914474.3441973563,2018,67.87096774193549,7.126204859455326,3.6099999999999994,23.0
113,2018-06-01,1153006.6231287555,2018,73.06666666666666,6.186377321810895,4.170000000000001,23.0
114,2018-07-01,1276496.0285349952,2018,79.06451612903226,4.049160274023639,7.25,21.0
115,2018-08-01,1316649.6251799935,2018,79.58064516129032,4.602710748739174,6.869999999999999,26.0
116,2018-09-01,1072038.446738115,2018,71.8,7.975825543694415,7.33,23.0
117,2018-10-01,984122.838518516,2018,57.96774193548387,11.37096850453746,3.439999999999999,17.0
118,2018-11-01,955696.6785527564,2018,44.266666666666666,10.418595223770895,7.76,18.0
119,2018-12-01,771724.2218356372,2018,39.774193548387096,6.950825741926183,7.049999999999999,18.0
120,2019-01-01,946870.2490583566,2019,32.38709677419355,8.931880799901883,4.569999999999999,20.0
121,2019-02-01,790956.773396277,2019,36.5,7.885758377301778,4.019999999999999,22.0
122,2019-03-01,763339.7098665568,2019,41.29032258064516,9.015518281966546,4.069999999999999,13.0
123,2019-04-01,825732.9966101167,2019,55.96666666666667,8.079276740134489,4.019999999999999,24.0
124,2019-05-01,792009.2976067576,2019,63.46666666666667,7.185873754053532,5.58,20.0
125,2019-06-01,1152400.3206987947,2019,68.43333333333334,19.34742847268719,4.78,15.0
126,2019-07-01,1183601.035613114,2019,81.09677419354838,4.117886502480591,7.139999999999999,17.0
127,2019-08-01,1222977.264865714,2019,77.09677419354838,4.777411005343638,4.159999999999998,17.0
128,2019-09-01,1199419.983423354,2019,71.3,4.8647217023566025,0.7500000000000002,11.0
129,2019-10-01,1117825.9677267948,2019,60.41935483870968,6.627086808688451,7.62,21.0
130,2019-11-01,829419.6113829974,2019,40.13793103448276,13.27383069354089,2.02,14.0
131,2019-12-01,873384.2534868366,2019,37.645161290322584,6.916397844238353,6.229999999999999,16.0
132,2020-01-01,956806.9592289962,2020,39.16129032258065,7.6293152781165885,2.7700000000000005,16.0
133,2020-02-01,708899.5447279974,2020,38.793103448275865,6.847206470758859,2.839999999999999,18.0
134,2020-03-01,803317.7813419971,2020,46.74193548387097,6.465125633919699,3.9099999999999997,18.0
135,2020-04-01,913181.8066149974,2020,48.93333333333333,5.381535054260065,4.85,21.0
136,2020-05-01,935727.301459997,2020,59.935483870967744,8.686907711688773,2.69,16.0
137,2020-06-01,1201904.4786120355,2020,73.13333333333334,5.757953020692832,2.6100000000000003,10.0
138,2020-07-01,1628679.426178073,2020,80.48387096774194,3.3652237938650806,7.4700000000000015,16.0
139,2020-08-01,1350522.9579889944,2020,76.87096774193549,4.616939357654379,4.680000000000001,16.0
140,2020-09-01,1047882.2154430768,2020,67.76666666666667,6.921646870673101,2.8600000000000003,11.0
141,2020-10-01,1351015.0518259946,2020,56.70967741935484,6.967512939287573,5.759999999999999,18.0
142,2020-11-01,966689.8665413167,2020,49.7,7.520775822049029,4.429999999999999,12.0
143,2020-12-01,992997.4430022766,2020,36.903225806451616,6.734759776511099,4.009999999999999,18.0
144,2021-01-01,793635.907752757,2021,33.25806451612903,5.464843650922895,2.669999999999999,12.0
145,2021-02-01,774153.4181279977,2021,31.964285714285715,6.36822053671454,5.009999999999999,19.0
146,2021-03-01,831616.5672269968,2021,44.25806451612903,10.507038091791882,3.3299999999999996,15.0
147,2021-04-01,936923.1141509968,2021,53.241379310344826,7.438582681887115,2.599999999999999,16.0
148,2021-05-01,881609.163130997,2021,61.45161290322581,8.131579222092187,5.56,14.0
149,2021-06-01,1182188.4935479944,2021,73.63333333333334,7.752567242570338,4.199999999999998,16.0
150,2021-07-01,1287837.5322359935,2021,76.36666666666666,5.182752120276437,9.019999999999998,20.0
151,2021-08-01,1129911.5112989955,2021,77.12903225806451,4.609713913638395,7.47,17.0
152,2021-09-01,1252218.172784994,2021,69.53333333333333,4.804691194162674,9.379999999999997,14.0
153,2021-10-01,1034031.1316429949,2021,62.03225806451613,5.997131931002473,5.829999999999999,15.0
154,2021-11-01,840258.3024629971,2021,44.5,6.750478910213804,1.4700000000000004,16.0
155,2021-12-01,965670.8271239956,2021,42.29032258064516,6.900210375474538,1.2400000000000002,16.0
156,2022-01-01,975325.283224996,2022,28.838709677419356,8.509198059329867,3.92,14.0
157,2022-02-01,686451.2056039971,2022,35.642857142857146,8.7483936545169,3.51,12.0
158,2022-03-01,905996.5155887952,2022,43.74193548387097,9.763768882747014,2.3499999999999996,17.0
159,2022-04-01,708662.4774957974,2022,51.766666666666666,6.526779494874377,5.199999999999998,17.0
160,2022-05-01,765574.3340379971,2022,65.0,7.844318538492259,5.639999999999999,21.0
161,2022-06-01,1173307.509837995,2022,71.33333333333333,4.482404937417915,4.069999999999999,18.0
162,2022-07-01,1368574.7561979922,2022,77.03225806451613,14.704837913575114,5.159999999999998,14.0
163,2022-08-01,1311920.0431679927,2022,79.19354838709677,4.534455901492549,1.86,11.0
164,2022-09-01,1604682.839411593,2022,44.46666666666667,34.73915939400806,2.62,8.0
165,2022-10-01,760745.1719999975,2022,55.74193548387097,5.5855035101918595,5.509999999999999,17.0

sayanpatra commented 1 year ago

Hi @bernardoct, thanks for reaching out. I agree that the exception message could be a lot clearer, and we will take this into consideration for our next release. That said, this is a deliberate design choice. Let me explain.

You have pointed out correctly that the exception comes from the fact that the fut_df is empty. The underlying reason is that you need to provide future regressor values if using regressors. Take a look at the first diagram here for Greykite behavior: https://github.com/linkedin/greykite/blob/master/docs/pages/stepbystep/0300_input.rst

Internally, we compute train_end_date as the last date with a non-null value in the y column. Regressor values need to be available through at least train_end_date + forecast_horizon.
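
To make the rule concrete, here is a minimal standalone sketch of the split described above. It is not Greykite's implementation: the helper names `is_missing` and `split_train_and_future` are hypothetical, the rows are three rounded rows from the test data, and the last y value has been blanked for illustration.

```python
from datetime import date
import math


def is_missing(value):
    """True for None or NaN, the usual markers for a missing y value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_train_and_future(rows, time_col="MONTH", value_col="y"):
    """Split time-ordered rows at the last row with a non-missing value.

    Returns (train_end_date, future_rows). Raises a descriptive error when
    there are no rows left to forecast on.
    """
    observed = [i for i, row in enumerate(rows) if not is_missing(row[value_col])]
    if not observed:
        raise ValueError(f"Column {value_col!r} has no observed values.")
    last = observed[-1]
    future_rows = rows[last + 1:]
    if not future_rows:
        raise ValueError(
            "No rows after train_end_date: with regressors, the input must "
            "include future rows that have regressor values and a missing y."
        )
    return rows[last][time_col], future_rows


# Illustrative rows only; y for 2022-10-01 is blanked on purpose.
rows = [
    {"MONTH": date(2022, 8, 1), "y": 1311920.04, "Temperature_mean": 79.19},
    {"MONTH": date(2022, 9, 1), "y": 1604682.84, "Temperature_mean": 44.47},
    {"MONTH": date(2022, 10, 1), "y": None, "Temperature_mean": 55.74},
]
train_end_date, future_rows = split_train_and_future(rows)
print(train_end_date, len(future_rows))
```

With every y value present, as in the data frame from the original report, the same function raises the descriptive error, because there are no rows left after train_end_date.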

In our library, there are 2 ways to facilitate this:

  1. Specify train_end_date in MetadataParam. All y values after this date are ignored.
  2. Manually mark the last few values in column y as NA.
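
As a sketch of the first option, the configuration from the original report could be adapted along these lines. The cut-off date and the 12-month horizon are assumptions chosen for illustration, and the snippet has not been run against the test data here; it leaves the rows from 2021-11-01 through 2022-10-01 to supply the future regressor values.

```python
import datetime

from greykite.framework.templates.autogen.forecast_config import (
    ForecastConfig,
    MetadataParam,
    ModelComponentsParam,
)
from greykite.framework.templates.model_templates import ModelTemplateEnum

# Assumption: train through 2021-10-01 and treat the remaining 12 rows
# as the forecast period; y values after this date are ignored.
metadata = MetadataParam(
    time_col="MONTH",
    value_col="y",
    freq="MS",
    train_end_date=datetime.datetime(2021, 10, 1),
)

regressors = dict(
    regressor_cols=[
        "Temperature_mean",
        "Temperature_std",
        "Precipitation_sum",
        "Precipitation_dayscount",
    ]
)

config = ForecastConfig(
    model_template=ModelTemplateEnum.SILVERKITE.name,
    forecast_horizon=12,  # illustrative: matches the 12 held-out months
    coverage=0.95,
    metadata_param=metadata,
    model_components_param=ModelComponentsParam(regressors=regressors),
)
```

The second option has the same effect without touching the configuration: set y to NA on the trailing rows of the input data frame before passing it to run_forecast_config.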

We do not allow training and forecasting on the same data as part of the Forecaster. However,

  1. You can extract the backtest from the Forecaster result.
  2. You can extract the trained_model from the Forecaster result, and predict on an arbitrary data frame, provided it has the appropriate columns.
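
A minimal sketch of both extractions, assuming `result` is the object returned by run_forecast_config and that it exposes `backtest` and `model` attributes. The helper names are hypothetical, and the columns that `new_df` must contain are left to the caller.

```python
def extract_backtest_and_model(result):
    """Return the backtest and the fitted model from a forecast result.

    Assumes ``result`` has ``backtest`` and ``model`` attributes, as the
    object returned by ``Forecaster.run_forecast_config`` is expected to.
    """
    return result.backtest, result.model


def predict_with_trained_model(result, new_df):
    """Predict on an arbitrary data frame with the already-fitted model.

    ``new_df`` must contain the time column and every regressor column
    that the model was trained with.
    """
    trained_model = result.model
    return trained_model.predict(new_df)
```

Because prediction reuses the fitted model, no retraining happens; this is the route to take when the forecast has to be produced on a data frame other than the one passed to the Forecaster.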