timestocome / Test-stock-prediction-algorithms

Use deep learning, genetic programming and other methods to predict stock and market movements
MIT License

Maybe there is another way out #1

Closed: doncat99 closed this issue 7 years ago.

doncat99 commented 7 years ago

LSTMs may not be suitable for predicting trends over short stock-market periods.

I am attempting to classify the shape of a stock's recent movement (the last N days of OHLCV data, e.g. 5 days) into M classes (here, 7 output classes).
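A minimal sketch of cutting daily OHLCV rows into fixed-length input windows. It assumes each row is an `(open, high, low, close, volume)` tuple. `make_windows` is a hypothetical helper for illustration, not code from this repository.

```python
from typing import List, Sequence, Tuple

# Assumed row layout: (open, high, low, close, volume).
Bar = Tuple[float, float, float, float, float]


def make_windows(bars: Sequence[Bar], n: int = 5) -> List[Tuple[List[Bar], int]]:
    """Hypothetical helper: return (last-n-days window, index of day to predict).

    The window covers days i-n .. i-1 and the target is day i, so a window
    never contains the day it is asked to classify.
    """
    if n < 1:
        raise ValueError("window length must be at least 1")
    return [(list(bars[i - n:i]), i) for i in range(n, len(bars))]
```

Keeping the target index outside the window avoids leaking the answer into the input, which is one easy way to get inflated training accuracy.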

For example, a price drop below -8% is one class, a drop in [-8%, -4%) is another class, and so on.
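A sketch of the labelling step, assuming the target is the next day's percent change in close (this matches the `predict_profit` column in the table further down, e.g. 13.65 / 13.98 - 1 ≈ -2.3605%). Only the -8% and -4% edges come from the text. The ±1% edges and the symmetric +4%/+8% edges are guesses; they happen to reproduce the `a_+1_d` column of that table, but any inner threshold between roughly 0.65% and 1.88% would too.

```python
from bisect import bisect_right
from typing import List, Sequence

# Assumed bin edges in percent. -8 and -4 are from the issue; the rest are guesses.
EDGES = [-8.0, -4.0, -1.0, 1.0, 4.0, 8.0]


def next_day_change(closes: Sequence[float]) -> List[float]:
    """Percent change from each close to the following close."""
    return [(closes[i + 1] / closes[i] - 1.0) * 100.0 for i in range(len(closes) - 1)]


def to_class(pct: float, edges: Sequence[float] = EDGES) -> int:
    """Map a percent change to a class in -3..3 (6 edges give 7 classes).

    bisect_right counts the edges <= pct, so pct < -8 -> -3,
    -8 <= pct < -4 -> -2, ..., pct >= 8 -> 3.
    """
    return bisect_right(edges, pct) - len(edges) // 2


closes = [13.98, 13.65, 13.49, 14.40, 13.82]  # first closes from the table
print([to_class(c) for c in next_day_change(closes)])  # [-1, -1, 2, -2]
```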

As you can see, the training accuracy is high, and the model is clearly overfitting. Per-class accuracy on the training set (7 classes, %): [ 82.609 84.146 82.596 87.204 84.164 77.381 80. ]

By contrast, the prediction accuracy is low. Per-class accuracy when predicting the following 20 days (%): [ nan 0. 20. 57.143 50. 0. nan]

For now, I am wondering whether switching the classification model to a DBN (deep belief network) would produce more reasonable results. Hope this helps.

Below is part of the output:

Epoch 1996/2000
81581/81581 [==============================] - 58s - loss: 0.0157 - acc: 0.9954
Epoch 1997/2000
81581/81581 [==============================] - 58s - loss: 0.0139 - acc: 0.9957
Epoch 1998/2000
1650/81581 [..............................] - ETA: 57s - loss: 0.0173 - acc: 0.9958
81581/81581 [==============================] - 58s - loss: 0.0143 - acc: 0.9954
Epoch 1999/2000
81581/81581 [==============================] - 58s - loss: 0.0153 - acc: 0.9954
Epoch 2000/2000
5050/81581 [>.............................] - ETA: 55s - loss: 0.0176 - acc: 0.9943
81581/81581 [==============================] - 58s - loss: 0.0158 - acc: 0.9951
save LSTM model...
############## validation on test data ##############
scaled data mse: 0.130540770636
load LSTM model...
############## validation on train data ##############
scaled data mse: 0.0391746699673
############## validation on valid data ##############
scaled data mse: 0.176731004083
############## validation on lately data ##############
scaled data mse: nan
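One possible cause of the `nan` MSE on the "lately" data, offered as an assumption rather than a diagnosis: the most recent rows have no future close yet, so their target is NaN (as in the last row of the table below, where `predict_profit` is NaN), and a single NaN turns the whole mean into NaN. A minimal NaN-aware MSE sketch; `mse` and `skip_nan` are hypothetical names, not from this repository.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float], skip_nan: bool = False) -> float:
    """Mean squared error; with skip_nan=True, pairs whose target is NaN are dropped."""
    pairs = list(zip(y_true, y_pred))
    if skip_nan:
        pairs = [(t, p) for t, p in pairs if not math.isnan(t)]
    if not pairs:
        return float("nan")
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


y, p = [1.0, 2.0, float("nan")], [1.5, 2.0, 0.0]
print(mse(y, p), mse(y, p, skip_nan=True))  # nan 0.125
```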

---------- AMD ----------

classification counter: [23, 82, 339, 422, 341, 84, 30]
classification possibility: [ 1.741 6.207 25.662 31.945 25.814 6.359 2.271]
classification train predict: [ 82.609 84.146 82.596 87.204 84.164 77.381 80. ]
classification valid predict: [ nan 0. 20. 57.143 50. 0. nan]
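The "classification possibility" line is each class's share of the 1321 labelled samples in the counter, in percent (e.g. 422 / 1321 ≈ 31.945%). That gives a useful baseline: always predicting the middle class would be right about 32% of the time on data with this distribution. A small sketch of the computation; `class_shares` is a hypothetical name.

```python
from typing import List, Sequence


def class_shares(counts: Sequence[int]) -> List[float]:
    """Percentage of samples in each class, rounded to 3 decimals."""
    total = sum(counts)
    return [round(100.0 * c / total, 3) for c in counts]


print(class_shares([23, 82, 339, 422, 341, 84, 30]))
# [1.741, 6.207, 25.662, 31.945, 25.814, 6.359, 2.271]
```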


            close     volume  predict_profit  a_+1_d  p_+1_d
Date
2017-03-15  13.98   54885200       -2.360515    -1.0    -2.0
2017-03-16  13.65   44129100       -1.172161    -1.0    -2.0
2017-03-17  13.49  218636000        6.745738     2.0     1.0
2017-03-20  14.40   90863900       -4.027778    -2.0     0.0
2017-03-21  13.82   72191500        2.026049     1.0     1.0
2017-03-22  14.10   61089400       -2.198582    -1.0    -1.0
2017-03-23  13.79   44144100       -0.652647     0.0     0.0
2017-03-24  13.70   49903700        0.000000     0.0     0.0
2017-03-27  13.70   42537800       -0.072993     0.0     2.0
2017-03-28  13.69   37005800        0.146092     0.0     0.0
2017-03-29  13.71   37777200        2.479942     1.0    -1.0
2017-03-30  14.05   43814100        3.558719     1.0    -1.0
2017-03-31  14.55   84362600        0.618557     0.0     0.0
2017-04-03  14.64   48299200       -3.278689    -1.0     1.0
2017-04-04  14.16   58217200        0.070621     0.0    -2.0
2017-04-05  14.17   58384000       -6.351447    -2.0     2.0
2017-04-06  13.27  139038000        1.883949     1.0     1.0
2017-04-07  13.52   70297900       -3.106509    -1.0     1.0
2017-04-10  13.10   46924500        0.000000     0.0     1.0
2017-04-11  13.10   59786900       -2.595420    -1.0     0.0
2017-04-12  12.76   37087100             NaN     NaN     0.0
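Treating `a_+1_d` as the actual class and `p_+1_d` as the predicted class (an interpretation, not stated in the output), per-class accuracy is the share of each actual class that was predicted correctly. Applied to the last 20 rows of the table (2017-03-16 through 2017-04-12), this sketch gives the same numbers as the "classification valid predict" line. The training line is consistent with the same formula (e.g. 19 of 23 ≈ 82.609% for the first class). `per_class_accuracy` is a hypothetical name, not the script's own function.

```python
import math
from typing import List, Sequence

NAN = float("nan")
CLASSES = list(range(-3, 4))  # 7 classes: -3 .. 3

# a_+1_d and p_+1_d from the table, 2017-03-16 through 2017-04-12.
ACTUAL = [-1, 2, -2, 1, -1, 0, 0, 0, 0, 1, 1, 0, -1, 0, -2, 1, -1, 0, -1, NAN]
PREDICTED = [-2, 1, 0, 1, -1, 0, 0, 2, 0, -1, -1, 0, 1, -2, 2, 1, 1, 1, 0, 0]


def per_class_accuracy(actual: Sequence[float], predicted: Sequence[float],
                       classes: Sequence[int] = CLASSES) -> List[float]:
    """Percent of samples of each actual class that were predicted correctly.

    Rows whose actual label is NaN (no next day yet) are skipped.
    A class with no samples gets NaN.
    """
    result = []
    for c in classes:
        preds = [p for a, p in zip(actual, predicted) if not math.isnan(a) and a == c]
        if preds:
            result.append(round(100.0 * sum(p == c for p in preds) / len(preds), 3))
        else:
            result.append(NAN)
    return result


if __name__ == "__main__":
    print(per_class_accuracy(ACTUAL, PREDICTED))  # [nan, 0.0, 20.0, 57.143, 50.0, 0.0, nan]
```

The NaN entries therefore just mean that no day in the window fell into the extreme classes, not that the model failed on them.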

timestocome commented 7 years ago

I like that. Classification isn't as much of a black box as the other networks. It'd be nice to see how they arrive at the predictions.

That's an interesting idea. I don't know. A couple of weeks ago I met someone using a SOM (self-organizing map) for fluid mechanics, which isn't too far off from using a DBN for stocks.

I'm going to try a different RNN architecture, an evolving system, and also plain old calculus and FFTs, just to see what, if anything, useful they predict. This is going to be a summer-long project for me. Keep in touch: I'd love to hear what works and what doesn't, and I'll keep posting both failures and successes here.

Edit: I have not had much success with LSTMs for series prediction. I also tried them on 'Alice in Wonderland', and Markov chains were more accurate and faster than the LSTMs and GRUs.
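For comparison, a character-level Markov chain of the general kind mentioned can be very small. This is a sketch under assumptions (a fixed-length character context, next characters sampled in proportion to their counts), not the code used for the 'Alice in Wonderland' test; `train` and `generate` are hypothetical names.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Optional


def train(text: str, order: int = 3) -> Dict[str, Counter]:
    """Count which character follows each `order`-length context."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def generate(model: Dict[str, Counter], seed: str, length: int,
             order: int = 3, rng: Optional[random.Random] = None) -> str:
    """Extend `seed` by sampling each next character from the observed counts."""
    rng = rng or random.Random(0)
    out = seed
    for _ in range(length):
        counts = model.get(out[-order:])  # .get does not create empty entries
        if not counts:
            break  # unseen context: stop instead of guessing
        chars, weights = zip(*counts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


m = train("abcabcabc", order=2)
print(generate(m, "ab", 4, order=2))  # abcabc
```

Training is a single counting pass over the text, which is one reason a model like this is much faster to fit than an LSTM or GRU.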

Edit: Closed. This isn't an issue; this is a repository for trying lots of things. Some will work, some will not.