huseinzol05 / Stock-Prediction-Models

Gathers machine learning and deep learning models for Stock forecasting including trading bots and simulations
Apache License 2.0
8.11k stars 2.84k forks

⚠️ Data Leakage: Must not use test data when fitting MinMaxScaler() #126

Open shure-dev opened 1 year ago

shure-dev commented 1 year ago

I think I have found a serious error.

If I'm correct, we must not use any information from the test data when preprocessing.

However, your code applies fit_transform() to both the train and test data.

This means the scaled train data can contain information from the test data (the scaler's minimum and maximum), which affects the measured accuracy.
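To illustrate the difference, here is a minimal sketch. It assumes a chronological split and uses synthetic prices. The array `prices`, the split index, and all variable names are made up for illustration and are not taken from the repository.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical closing prices with an upward trend (not data from the repository).
prices = np.linspace(100.0, 200.0, 50).reshape(-1, 1)

# Chronological split: the last 10 rows are held out as the test set.
split = 40
train, test = prices[:split], prices[split:]

# Leaky: the scaler is fitted on all rows, so the scaled train data
# depends on the minimum and maximum of future (test) prices.
leaky_scaler = MinMaxScaler().fit(prices)
leaky_train = leaky_scaler.transform(train)

# Leak-free: fit on the train rows only, then reuse the fitted scaler on test.
scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

print(leaky_train.max())   # below 1, because the global maximum lies in the test rows
print(train_scaled.max())  # the train maximum maps to the top of the [0, 1] range
print(test_scaled.max())   # above 1, because test prices exceed the train maximum
```

With the leak-free version, test values can fall outside [0, 1]. That is expected, because the model never saw that price range during training.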

Please correct me if my idea is wrong, thank you.

shure-dev commented 1 year ago

This answer seems to work well for this issue.

https://stackoverflow.com/questions/70923839/sklearn-preprocessing-with-a-rolling-window
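The code from the linked answer is not reproduced here. As a rough sketch of the general idea, assuming each point is scaled with the minimum and maximum of a fixed number of strictly earlier observations, it could look like this. The function `rolling_minmax` and its `window` parameter are hypothetical names.

```python
import numpy as np

def rolling_minmax(values, window):
    """Scale each point with the min and max of the `window` points before it."""
    values = np.asarray(values, dtype=float)
    # The first `window` points have no full history, so they stay NaN.
    scaled = np.full(values.shape, np.nan)
    for t in range(window, len(values)):
        past = values[t - window:t]  # strictly earlier observations only
        lo, hi = past.min(), past.max()
        if hi > lo:  # leave flat windows as NaN to avoid dividing by zero
            scaled[t] = (values[t] - lo) / (hi - lo)
    return scaled

# Hypothetical prices for illustration.
prices = np.array([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
scaled = rolling_minmax(prices, window=3)
print(scaled)
```

Because each value is scaled only from observations that precede it, changing a later price never alters an earlier scaled value.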

shure-dev commented 1 year ago

We probably also have to take stationarity into account when working with time series data.
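One common way to address this is to model log returns instead of raw prices. This is an assumption on my side, not something the repository is confirmed to do. Here is a minimal sketch with a synthetic random-walk price series, where all names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical price path: a random walk with drift in log space.
log_prices = np.log(100.0) + np.cumsum(rng.normal(0.001, 0.01, size=1000))
prices = np.exp(log_prices)

# First difference of the log prices gives the log returns,
# which removes the trending level of the raw price series.
log_returns = np.diff(np.log(prices))

# Predictions made on returns can be mapped back to prices
# by cumulatively summing from the first observed log price.
reconstructed = np.exp(np.log(prices[0]) + np.cumsum(log_returns))

# Compare the two halves of each series: the price level drifts,
# while the returns stay on a similar scale.
half = len(log_returns) // 2
print(prices[:half].mean(), prices[half:].mean())
print(log_returns[:half].mean(), log_returns[half:].mean())
```

Differencing does not guarantee stationarity, so a formal check such as a unit-root test is still worth running on the transformed series.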