akbaramed closed this issue 2 years ago
Hi @akbaramed and thanks for writing. You are correct about the scaling. The example notebooks on forecasting models are primarily meant to show how you can use our models. We sometimes take shortcuts that ignore best practices in data processing.
@hrzn we should maybe consider being more concise there as it's not the first time someone mentions this.
Very grateful for your comment, closing issue.
Hello,
I would like some clarification on the NBeats example page (https://unit8co.github.io/darts/examples/08-NBEATS-examples.html).
I am using stock data and following the steps from the example for better understanding.
Observation: in Cell 5, missing values are imputed and the series is scaled first; then in Cell 6 the series is split into train and val after a certain date. Question: won't this introduce data leakage? Shouldn't we split first and then scale?
Can the community please let me know their views?
My workaround:
###########
series = TimeSeries.from_dataframe(df, 'Date', 'Adj Close')
scaler = Scaler()
train = scaler.fit_transform(series[:-15])
val = scaler.transform(series[-15:])
############
Then in Cell 9, where the "series" variable is passed in, I apply the logic below to rebuild it from the scaled parts:
#############
series_ = pd.concat([train.pd_series().reset_index(), val.pd_series().reset_index()])
series_.columns = ['Date', 'Adj Close']
series_ = TimeSeries.from_dataframe(series_, 'Date', 'Adj Close')
#############
FYI there is a difference in the MAPE values of the 2 approaches.
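To illustrate why the two approaches can give different numbers, here is a minimal sketch in plain Python. It assumes min-max scaling and uses made-up prices; the helper names `fit_min_max` and `transform` are hypothetical and not part of darts. It only shows that the scaled values differ depending on whether the scaler sees the validation points, not what the MAPE will be for any real dataset.

```python
def fit_min_max(values):
    """Learn the (min, max) pair from the values the scaler is fitted on."""
    return min(values), max(values)


def transform(values, params):
    """Apply min-max scaling using previously learned parameters."""
    lo, hi = params
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical prices; the last 3 points play the role of the validation set.
prices = [10.0, 12.0, 11.0, 14.0, 13.0, 18.0, 20.0, 19.0]
train_raw, val_raw = prices[:-3], prices[-3:]

# Approach A (leaky): fit on the full series, then split.
params_full = fit_min_max(prices)
train_a = transform(train_raw, params_full)
val_a = transform(val_raw, params_full)

# Approach B: split first, fit on the training part only.
params_train = fit_min_max(train_raw)
train_b = transform(train_raw, params_train)
val_b = transform(val_raw, params_train)

print("A params:", params_full, "B params:", params_train)
print("A train:", train_a, "A val:", val_a)
print("B train:", train_b, "B val:", val_b)
```

In approach A the training values are squeezed because the maximum comes from the validation period, so the model is trained on data whose scale already reflects future prices. In approach B the validation values can fall outside the [0, 1] range, which is expected when the scaler has not seen them.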
Any comments, suggestions, or explanations are welcome.
Thanks