Possible Data Leakage in both CPD?

kieranjwood / trading-momentum-transformer

This code accompanies the paper Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture (https://arxiv.org/pdf/2112.08534.pdf).

https://kieranjwood.github.io/publication/momentum-transformer/

MIT License


Possible Data Leakage in both CPD? #9

Closed: nova-land closed this issue 1 year ago.

nova-land commented 1 year ago

In mon_trans/changepoint_detection.py, within the function changepoint_loc_and_score:

The script uses StandardScaler to fit and transform the entire time series, and I could not find a train/test split anywhere before the CPD data is generated. Could this be data leakage that inflates the prediction results obtained with the CPD feature?

 time_series_data[["Y"]] = StandardScaler().fit(Y_data).transform(Y_data)
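To make the concern concrete, here is a minimal sketch on synthetic returns (the numbers and the standardize helper are hypothetical, not taken from the repository). It reproduces StandardScaler's arithmetic, subtracting the mean and dividing by the population standard deviation, using only the standard library, and shows that if a scaler were fit on a full series, the standardized values for early dates would depend on later observations.

```python
from statistics import fmean, pstdev


def standardize(values, fit_on):
    """Standardize `values` with the mean/std estimated from `fit_on`.

    Uses the population standard deviation (ddof=0), as StandardScaler does.
    """
    mu, sigma = fmean(fit_on), pstdev(fit_on)
    return [(v - mu) / sigma for v in values]


# Synthetic daily returns (hypothetical numbers for illustration only).
returns = [0.01, -0.02, 0.015, 0.005, -0.01, 0.03, -0.025, 0.02]
t = 5  # "today": only returns[:t] have been observed

# Global fit: the statistics use the whole series, including the future.
global_scaled = standardize(returns[:t], fit_on=returns)

# Change only the observations from index t onwards (the "future").
shocked = returns[:t] + [r * 10 for r in returns[t:]]
global_scaled_shocked = standardize(shocked[:t], fit_on=shocked)

# The past scaled values differ, so they leak information about the future.
print(global_scaled == global_scaled_shocked)  # False
```

Whether this applies to the repository depends on what Y_data actually spans when the scaler is fit, which is the point addressed in the reply.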

Best Regards, Chris

kieranjwood commented 1 year ago

In this context, Y_data contains past daily returns, which have already been observed, so there is no data leakage.
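A minimal sketch of this argument, assuming that changepoint_loc_and_score receives a single lookback window of already-observed returns and fits the scaler on that window only. The helper scale_lookback_window and the synthetic data are hypothetical; the check shows that altering returns after the window leaves the scaled window unchanged.

```python
from statistics import fmean, pstdev


def scale_lookback_window(returns, t, lookback):
    """Standardize the `lookback` returns observed strictly before index `t`.

    Mean and population std are estimated from the window itself, so nothing
    at or after index `t` enters the computation.
    """
    if lookback < 2 or t - lookback < 0:
        raise ValueError("window must contain at least 2 past observations")
    window = returns[t - lookback:t]
    mu, sigma = fmean(window), pstdev(window)
    return [(r - mu) / sigma for r in window]


# Synthetic daily returns (hypothetical numbers for illustration only).
returns = [0.01, -0.02, 0.015, 0.005, -0.01, 0.03, -0.025, 0.02]
t, lookback = 5, 4

scaled = scale_lookback_window(returns, t, lookback)

# Perturb only the unobserved future; the window result must not change.
shocked = returns[:t] + [r * 10 for r in returns[t:]]
print(scaled == scale_lookback_window(shocked, t, lookback))  # True
```

Fitting on the window trades some estimation noise (few observations per fit) for the guarantee that each feature value uses only information available at time t.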

nova-land commented 1 year ago

Confirmed: the change-point location normalisation only looks at the data within each window.
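One way to picture a window-local normalisation, assuming the change-point location is expressed relative to a lookback window ending at time t (the helper normalised_cp_location and this exact formula are illustrative assumptions, not copied from the repository): only the window boundaries enter the formula, so the resulting feature cannot carry information from after t.

```python
def normalised_cp_location(cp_index, t, lookback):
    """Map a change-point index in the window [t - lookback, t] to [0, 1].

    0.0 means the change point sits at the start of the lookback window;
    1.0 means it sits at the most recent observation (index t).
    """
    start = t - lookback
    if not start <= cp_index <= t:
        raise ValueError("change point must lie inside the lookback window")
    return (cp_index - start) / lookback


print(normalised_cp_location(95, t=100, lookback=20))  # 0.75
```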