Open PyBrown opened 11 months ago
Thanks for reporting. Could you kindly add dummy data to your example, to provoke the bug?
The get_expected_pred_idx seems to work, so possibly it is something with the reducer - fyi @benHeid
@fkiraly Thanks for your reply. I have attached dummy training data to provoke the bug: dummy_data.xlsx
Code:
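import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import make_reduction

# y_hat: hierarchical training data, presumably built from the attached dummy_data.xlsx (loading code not shown)
regressor = HistGradientBoostingRegressor(random_state=1234)
fh = ForecastingHorizon(np.arange(1, 200), is_relative=True)
forecaster = make_reduction(
    estimator=regressor,
    window_length=10,
    transformers=None,
    strategy="recursive",
    pooling="global",
)
forecaster_fit = forecaster.fit(y_hat)
y_pred = forecaster_fit.predict(fh=fh)
# Inspect the forecast separately for each (field, reservoir, string) series
list(y_pred.groupby(["field_abbr_num", "reservoir_num", "string_num"]))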
Output:
[((0, 19, 8),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 19 8 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 24, 3),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 24 3 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 24, 7),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 24 7 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 47, 1),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 47 1 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 78, 0),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 78 0 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 78, 4),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 78 4 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 78, 8),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 78 8 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 82, 0),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 82 0 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 82, 2),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 82 2 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 82, 5),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 82 5 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 82, 6),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 82 6 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 82, 7),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 82 7 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 82, 14),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 82 14 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 82, 16),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 82 16 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 92, 10),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 92 10 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 93, 9),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 93 9 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 93, 12),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 93 12 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 93, 13),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 93 13 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 93, 15),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 93 15 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 93, 17),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 93 17 2020-02 801.262014
2020-03 742.962152
2020-04 810.083172
2020-05 776.543897
2020-06 786.970838
... ...
2036-04 670.084600
2036-05 670.084600
2036-06 670.084600
2036-07 670.084600
2036-08 670.084600
[199 rows x 1 columns]),
((0, 93, 18),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 93 18 2020-02 884.892978
2020-03 809.800789
2020-04 757.269487
2020-05 746.294392
2020-06 763.567299
... ...
2036-04 641.225662
2036-05 641.225662
2036-06 641.225662
2036-07 641.225662
2036-08 641.225662
[199 rows x 1 columns]),
((0, 94, 9),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 94 9 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((0, 94, 11),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
0 94 11 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 68, 36),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 68 36 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 72, 26),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 72 26 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 72, 30),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 72 30 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 94, 27),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 94 27 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 95, 29),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 95 29 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 95, 41),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 95 41 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 98, 26),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 98 26 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 98, 32),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 98 32 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 98, 34),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 98 34 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 98, 35),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 98 35 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 98, 39),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 98 39 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 98, 41),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 98 41 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 98, 43),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 98 43 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 99, 31),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 99 31 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 99, 37),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 99 37 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 99, 40),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 99 40 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 100, 44),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 100 44 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 101, 38),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 101 38 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 103, 28),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 103 28 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 103, 33),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 103 33 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 103, 35),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 103 35 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns]),
((4, 103, 42),
oil_pd[bbl/d]
field_abbr_num reservoir_num string_num m_date
4 103 42 2020-02 51.424132
2020-03 54.017321
2020-04 54.017321
2020-05 54.017321
2020-06 54.017321
... ...
2036-04 180.965867
2036-05 180.965867
2036-06 180.965867
2036-07 180.965867
2036-08 180.965867
[199 rows x 1 columns])]
I am forecasting hierarchical oil production data with three hierarchy levels: "field," "reservoir," and "string." Each series in the dataset has a different start and end period. I am attempting to develop a global model to forecast all the time series at once. I have partitioned each series into training and testing sets using a temporal train-test split, so the training set contains the first 75% of each series and the rest is reserved for testing. I would like to specify a ForecastingHorizon that matches the time indices of each series in the testing set, so that I can evaluate the global model's predictions against the test set. However, specifying a relative ForecastingHorizon using a NumPy range yields a forecast that begins at the same time period for all series, rather than at the period following the end of each individual series.
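One reading consistent with the output above, where every series starts at 2020-02, is that the relative steps are counted from a single shared cutoff (the latest period in the training data) rather than from each series' own last period. The sketch below only illustrates that arithmetic in plain Python. The series names and end dates are hypothetical, and this is not sktime code.

```python
# Toy illustration only: months are encoded as integers (year * 12 + month - 1).
def to_ordinal(year, month):
    return year * 12 + (month - 1)


def to_label(ordinal):
    year, month0 = divmod(ordinal, 12)
    return f"{year}-{month0 + 1:02d}"


# Hypothetical last training month of three series (not taken from dummy_data.xlsx).
train_end = {
    "series_a": to_ordinal(2019, 6),
    "series_b": to_ordinal(2019, 11),
    "series_c": to_ordinal(2020, 1),
}

relative_steps = [1, 2, 3]

# Shared cutoff: relative steps are counted from the latest training month overall,
# so every series gets the same forecast months.
shared_cutoff = max(train_end.values())
shared_forecast_months = [to_label(shared_cutoff + h) for h in relative_steps]

# What the question asks for: steps counted from each series' own last month.
per_series_forecast_months = {
    name: [to_label(end + h) for h in relative_steps]
    for name, end in train_end.items()
}

print(shared_forecast_months)
print(per_series_forecast_months)
```

With a shared cutoff, a series that ends earlier than the others has a gap between its last training period and the first forecast period, which is the mismatch described above.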
My intention was to forecast each series over a long relative horizon (e.g., 1 to 200 steps ahead) that extends beyond the time range of every individual series in the test set, and then select from the forecast only the time indices present in the test set. However, my current approach does not produce the desired outcome. Kindly advise on how I can achieve this or, better yet, suggest another strategy for obtaining the forecast at the corresponding time steps of the test series.
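The selection step described above (keeping only the forecast rows whose time indices appear in the test set) can be sketched with plain pandas. This is a minimal sketch on hypothetical toy data. The names series_id, y_test and y_pred_aligned are assumptions for illustration, and it only works when the long forecast actually covers the test periods.

```python
import pandas as pd

# Hypothetical long forecast: the same four months for each of two series.
pred_index = pd.MultiIndex.from_product(
    [["a", "b"], pd.period_range("2020-02", periods=4, freq="M")],
    names=["series_id", "m_date"],
)
y_pred = pd.DataFrame({"y_pred": list(range(8))}, index=pred_index)

# Hypothetical test set: each series covers different months.
test_index = pd.MultiIndex.from_arrays(
    [
        ["a", "a", "b", "b"],
        pd.PeriodIndex(["2020-02", "2020-03", "2020-04", "2020-05"], freq="M"),
    ],
    names=["series_id", "m_date"],
)
y_test = pd.DataFrame({"y_true": [10.0, 11.0, 12.0, 13.0]}, index=test_index)

# Keep only forecast rows whose (series, month) pair also occurs in the test set.
mask = y_pred.index.isin(y_test.index)
y_pred_aligned = y_pred[mask]

print(y_pred_aligned)
```

Test rows that fall before the first forecast period would have no match in the forecast, so this selection alone would not cover series whose test set starts earlier than the shared forecast start.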
System:
python: 3.11.5
Python dependencies:
sktime: 0.24.1
sklearn: 1.3.2
skbase: 0.6.1