Input for OWA Calculation - AttributeError

leschaf commented 2 years ago

Hi,

I was searching for OWA implementations so that I can use the measure in one of my projects. I'm starting with the calculation for a single time series. The ESRNN lib provides this function:

final_owa, final_mase, final_smape = evaluate_prediction_owa(y_hat_df, y_train_df, X_test_df, y_test_df, naive2_seasonality=1)

When I look at the code (https://github.com/kdgutier/esrnn_torch/blob/master/ESRNN/utils_evaluation.py#L370-L400), X_test_df is not used at all in the function - is that correct?

Also, I'm not sure about the input format for using the function.

Here is my input for y_train_df, which contains the historical target values:

unique_id | ds | y
-- | -- | --
00010_000030_BS824_2018-01-01 | 2017-12-01 | 1497400.0
00010_000030_BS824_2018-01-01 | 2017-11-01 | 1707420.0
00010_000030_BS824_2018-01-01 | 2017-10-01 | 1989485.0
00010_000030_BS824_2018-01-01 | 2017-09-01 | 1697800.0
00010_000030_BS824_2018-01-01 | 2017-08-01 | 1574400.0
00010_000030_BS824_2018-01-01 | 2017-07-01 | 1260556.0
00010_000030_BS824_2018-01-01 | 2017-06-01 | 1319198.0
00010_000030_BS824_2018-01-01 | 2017-05-01 | 1592793.0
00010_000030_BS824_2018-01-01 | 2017-04-01 | 1575775.0
00010_000030_BS824_2018-01-01 | 2017-03-01 | 1808200.0
00010_000030_BS824_2018-01-01 | 2017-02-01 | 1365519.0
00010_000030_BS824_2018-01-01 | 2017-01-01 | 1904000.0
00010_000030_BS824_2018-01-01 | 2016-12-01 | 1713520.0
00010_000030_BS824_2018-01-01 | 2016-11-01 | 1908281.0
00010_000030_BS824_2018-01-01 | 2016-10-01 | 1737900.0
00010_000030_BS824_2018-01-01 | 2016-09-01 | 2005440.0
00010_000030_BS824_2018-01-01 | 2016-08-01 | 1683500.0
00010_000030_BS824_2018-01-01 | 2016-07-01 | 1179682.0
00010_000030_BS824_2018-01-01 | 2016-06-01 | 1834500.0
00010_000030_BS824_2018-01-01 | 2016-05-01 | 1949500.0
00010_000030_BS824_2018-01-01 | 2016-04-01 | 1811450.0
00010_000030_BS824_2018-01-01 | 2016-03-01 | 2001200.0
00010_000030_BS824_2018-01-01 | 2016-02-01 | 1273837.0

Here is the y_hat_df, which contains my model predictions for the future values:

unique_id | ds | y_hat
-- | -- | --
00010_000030_BS824_2018-01-01 | 2018-01-01 | 1403634.0
00010_000030_BS824_2018-01-01 | 2018-02-01 | 1543464.0
00010_000030_BS824_2018-01-01 | 2018-03-01 | 1751357.0
00010_000030_BS824_2018-01-01 | 2018-04-01 | 1874214.0
00010_000030_BS824_2018-01-01 | 2018-05-01 | 1810092.0
00010_000030_BS824_2018-01-01 | 2018-06-01 | 1811571.0
00010_000030_BS824_2018-01-01 | 2018-07-01 | 1828860.0
00010_000030_BS824_2018-01-01 | 2018-08-01 | 1708163.0
00010_000030_BS824_2018-01-01 | 2018-09-01 | 1672521.0
00010_000030_BS824_2018-01-01 | 2018-10-01 | 1809456.0
00010_000030_BS824_2018-01-01 | 2018-11-01 | 1870753.0
00010_000030_BS824_2018-01-01 | 2018-12-01 | 1596886.0
00010_000030_BS824_2018-01-01 | 2019-01-01 | 1253630.0
00010_000030_BS824_2018-01-01 | 2019-02-01 | 1618861.0
00010_000030_BS824_2018-01-01 | 2019-03-01 | 1466855.0
00010_000030_BS824_2018-01-01 | 2019-04-01 | 1677125.0
00010_000030_BS824_2018-01-01 | 2019-05-01 | 1887335.0
00010_000030_BS824_2018-01-01 | 2019-06-01 | 1576052.0

And finally, here is my y_test_df, which contains the true future values with the same dates as in y_hat_df:

unique_id	ds	y
00010_000030_BS824_2018-01-01	2018-01-01	2237400.0
00010_000030_BS824_2018-01-01	2018-02-01	1967330.0
00010_000030_BS824_2018-01-01	2018-03-01	1886660.0
00010_000030_BS824_2018-01-01	2018-04-01	1818600.0
00010_000030_BS824_2018-01-01	2018-05-01	2060476.0
00010_000030_BS824_2018-01-01	2018-06-01	1928000.0
00010_000030_BS824_2018-01-01	2018-07-01	1506416.0
00010_000030_BS824_2018-01-01	2018-08-01	1705200.0
00010_000030_BS824_2018-01-01	2018-09-01	1602600.0
00010_000030_BS824_2018-01-01	2018-10-01	2002980.0
00010_000030_BS824_2018-01-01	2018-11-01	1829730.0
00010_000030_BS824_2018-01-01	2018-12-01	1385800.0
00010_000030_BS824_2018-01-01	2019-01-01	1923362.0
00010_000030_BS824_2018-01-01	2019-02-01	1849415.0
00010_000030_BS824_2018-01-01	2019-03-01	1921600.0
00010_000030_BS824_2018-01-01	2019-04-01	2143900.0
00010_000030_BS824_2018-01-01	2019-05-01	2014900.0
00010_000030_BS824_2018-01-01	2019-06-01	1832100.0

Calling evaluate_prediction_owa fails on this line: y_hat_id = y_hat_panel[top_row:bottom_row].y_hat.to_numpy() with the error shown in the traceback. Any idea why that happens? What am I missing?

AttributeError                            Traceback (most recent call last)
~/projects/semco/semicon-forecast/src/a4_benchmark.py in <module>
----> 1 evaluate_prediction_owa(y_hat_df, y_train_df,
      2                         None, y_test_df,
      3                         naive2_seasonality=12)
      4

~/miniconda3/envs/semicon/lib/python3.8/site-packages/ESRNN/utils_evaluation.py in evaluate_prediction_owa(y_hat_df, y_train_df, X_test_df, y_test_df, naive2_seasonality)
    390     y_insample = y_train_df.filter(['unique_id', 'ds', 'y'])
    391
--> 392     model_owa, model_mase, model_smape = owa(y_panel, y_hat_panel,
    393                                              y_naive2_panel, y_insample,
    394                                              seasonality=naive2_seasonality)

~/miniconda3/envs/semicon/lib/python3.8/site-packages/ESRNN/utils_evaluation.py in owa(y_panel, y_hat_panel, y_naive2_panel, y_insample, seasonality)
    350     total_mase = evaluate_panel(y_panel, y_hat_panel, mase,
    351                                 y_insample, seasonality)
--> 352     total_mase_naive2 = evaluate_panel(y_panel, y_naive2_panel, mase,
    353                                        y_insample, seasonality)
    354     total_smape = evaluate_panel(y_panel, y_hat_panel, smape)

~/miniconda3/envs/semicon/lib/python3.8/site-packages/ESRNN/utils_evaluation.py in evaluate_panel(y_panel, y_hat_panel, metric, y_insample, seasonality)
    316     top_row = np.asscalar(y_hat_panel['unique_id'].searchsorted(u_id, 'left'))
    317     bottom_row = np.asscalar(y_hat_panel['unique_id'].searchsorted(u_id, 'right'))
--> 318     y_hat_id = y_hat_panel[top_row:bottom_row].y_hat.to_numpy()
    319     assert len(y_id)==len(y_hat_id)
    320

~/miniconda3/envs/semicon/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5463     if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464         return self[name]
-> 5465     return object.__getattribute__(self, name)
   5466
   5467 def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'y_hat'
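
Note that the traceback stops at the second evaluate_panel call (line 352), the one that receives y_naive2_panel, so the missing y_hat column belongs to the baseline panel and not to y_hat_df. Below is a minimal reproduction of that failure mode in plain pandas. It assumes the library builds the baseline panel with DataFrame.filter followed by a rename to y_hat; the filter call mirrors line 390 of the traceback, while the rename step is an assumption and not confirmed by the traceback.

```python
import pandas as pd

# Toy y_test_df without a y_hat_naive2 column (first two rows from the issue).
y_test_df = pd.DataFrame({
    "unique_id": ["00010_000030_BS824_2018-01-01"] * 2,
    "ds": ["2018-01-01", "2018-02-01"],
    "y": [2237400.0, 1967330.0],
})

# filter() keeps only the labels that exist and silently skips the rest,
# so the requested y_hat_naive2 column is dropped without any warning.
y_naive2_panel = y_test_df.filter(["unique_id", "ds", "y_hat_naive2"])

# Assumed rename step; it is a no-op here because the column was never selected.
y_naive2_panel = y_naive2_panel.rename(columns={"y_hat_naive2": "y_hat"})

print(list(y_naive2_panel.columns))

try:
    y_naive2_panel.y_hat  # attribute access on a column that does not exist
except AttributeError as err:
    print(err)
```

Because filter never raises on unknown labels, the problem only surfaces later, when the y_hat attribute is read.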

AzulGarza commented 2 years ago

Hi, the argument y_test_df is a pandas DataFrame panel with the columns unique_id, ds, y, and y_hat_naive2. So your y_test_df must include the Naive2 predictions in order to calculate the OWA.
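
A minimal sketch of how the missing column could be added, together with a hand-rolled OWA as a cross-check. Everything here is illustrative: add_naive_baseline, smape, mase and owa are hypothetical helpers written for this example, not functions from the ESRNN library. The baseline simply repeats the last training value, which coincides with Naive2 only when no seasonal adjustment is applied; for seasonal data, Naive2 as used in the M4 competition deseasonalizes the series first. The metrics follow the usual M4 definitions and may differ in detail from the library's own code.

```python
import numpy as np
import pandas as pd

def add_naive_baseline(y_train_df, y_test_df):
    """Add y_hat_naive2 = last observed training value of each series."""
    last_obs = (
        y_train_df.sort_values(["unique_id", "ds"])  # training rows may be unsorted
        .groupby("unique_id")["y"]
        .last()
        .rename("y_hat_naive2")
        .reset_index()
    )
    return y_test_df.merge(last_obs, on="unique_id", how="left")

def smape(y, y_hat):
    # Symmetric MAPE in percent.
    return 200.0 * np.mean(np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

def mase(y, y_hat, y_insample, seasonality):
    # Out-of-sample MAE scaled by the in-sample seasonal naive MAE.
    scale = np.mean(np.abs(y_insample[seasonality:] - y_insample[:-seasonality]))
    return np.mean(np.abs(y - y_hat)) / scale

def owa(y, y_hat, y_naive2, y_insample, seasonality):
    # Average of sMAPE and MASE, each relative to the Naive2 baseline.
    rel_smape = smape(y, y_hat) / smape(y, y_naive2)
    rel_mase = (mase(y, y_hat, y_insample, seasonality)
                / mase(y, y_naive2, y_insample, seasonality))
    return 0.5 * (rel_smape + rel_mase)

# Toy single-series example (made-up numbers, newest training row first).
y_train_df = pd.DataFrame({
    "unique_id": ["A"] * 4,
    "ds": ["2017-12-01", "2017-11-01", "2017-10-01", "2017-09-01"],
    "y": [4.0, 3.0, 2.0, 1.0],
})
y_test_df = pd.DataFrame({
    "unique_id": ["A"] * 2,
    "ds": ["2018-01-01", "2018-02-01"],
    "y": [5.0, 6.0],
})

y_test_df = add_naive_baseline(y_train_df, y_test_df)
y_insample = y_train_df.sort_values("ds")["y"].to_numpy()
y_true = y_test_df["y"].to_numpy()
y_naive2 = y_test_df["y_hat_naive2"].to_numpy()

# Scoring the baseline against itself gives an OWA of 1 by construction.
baseline_owa = owa(y_true, y_naive2, y_naive2, y_insample, seasonality=1)
print(y_test_df)
print(baseline_owa)
```

An OWA below 1 means the model beats the baseline on average across the two metrics; once y_test_df carries the y_hat_naive2 column, it has the shape described above for evaluate_prediction_owa.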

leschaf commented 2 years ago

Thank you - that helped!

Any comment on the use of X_test_df?
