robertmartin8 / MachineLearningStocks

Using python and scikit-learn to make stock predictions
MIT License
1.74k stars 506 forks

Train-test split (already seen samples) #38

Closed (illUkc closed this 3 years ago)

illUkc commented 3 years ago

Hello,

First of all, great work, Robert.

I found one big mistake (one that everyone makes) in backtesting.py, line 40: you are using shuffle=True (the default in train_test_split). Since the targets are built from rows i+1 or i+x, the shuffled test data has effectively already been seen during training. The random shuffling is also why you get a different result every time you run backtesting.py. If you change to shuffle=False, you will get 45-50% fewer trades and the accuracy score will drop to 0.6-0.65 at most.
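A minimal sketch of the difference, assuming the rows are already sorted by date (the repo's actual data layout is not shown in this thread). The helper name `chronological_split` and the toy `periods` list are hypothetical; the helper keeps rows in their existing order and uses the last fraction as the test set, which is what passing `shuffle=False` to scikit-learn's `train_test_split` does.

```python
import math


def chronological_split(rows, test_size=0.2):
    """Split rows in their existing order.

    The first part becomes the training set and the last `test_size`
    fraction becomes the test set, so no test row precedes a training row.
    """
    n_test = math.ceil(len(rows) * test_size)
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


# Hypothetical toy data: ten consecutive periods, oldest first.
periods = list(range(10))
train, test = chronological_split(periods, test_size=0.2)

print("train:", train)
print("test:", test)
```

With a shuffled split, late periods can land in the training set while earlier ones land in the test set, which is the leak described above. An ordered split only helps if the rows really are in chronological order.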

Best

robertmartin8 commented 3 years ago

@illUkc this is the mistake that's being referred to in the readme

Tom-Ryder commented 3 years ago

There's another important caveat, by the way: the data is biased towards companies that outperform the market. That is, if you simply buy all of the stocks in a 20% test split, you will outperform the market by ~4%.
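One way to account for this is to compare the strategy against a "buy everything in the test split" baseline rather than against the market alone. The sketch below is only an illustration: the function name `excess_over_baseline`, its arguments, and the toy numbers are all hypothetical and are not taken from backtesting.py.

```python
from statistics import mean


def excess_over_baseline(stock_returns, market_returns, predictions):
    """Return (baseline_excess, strategy_excess) for a test split.

    baseline_excess: mean outperformance versus the market if every
        stock in the test split is bought.
    strategy_excess: mean outperformance of only the stocks the model
        predicted as buys (prediction == 1); 0.0 if nothing was picked.
    """
    excess = [s - m for s, m in zip(stock_returns, market_returns)]
    baseline = mean(excess)
    picked = [e for e, p in zip(excess, predictions) if p == 1]
    strategy = mean(picked) if picked else 0.0
    return baseline, strategy


# Made-up illustrative returns, not real results.
stock_returns = [0.10, 0.02, 0.15, -0.05]
market_returns = [0.05, 0.05, 0.05, 0.05]
predictions = [1, 0, 1, 0]

baseline, strategy = excess_over_baseline(stock_returns, market_returns, predictions)
print(f"baseline excess: {baseline:.3f}, strategy excess: {strategy:.3f}")
```

If the baseline excess is already positive, only the gap between the strategy and the baseline can be credited to the model.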