llSourcell / How-to-Predict-Stock-Prices-Easily-Demo

How to Predict Stock Prices Easily - Intro to Deep Learning #7 by Siraj Raval on Youtube

Windows 7 Install & Python 3.5 Upgrade - An Exciting Story [Solved] #3

Closed Steviey closed 7 years ago

Steviey commented 7 years ago

Thank you for the example. I get the following error:

Using Theano backend.
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
  File "C:\BrainPorn\How-to-Predict-Stock-Prices-Easily-Demo-master\lstm.py", line 17
    print 'yo'
             ^
SyntaxError: Missing parentheses in call to 'print'

I'm more than confused.

I have Anaconda with a TensorFlow env and the Keras lib. But I can't start a notebook from the TensorFlow shell, so I start the notebook from the Anaconda shell, where the above output appears in the browser. Who can provide a decent "how to start the example" (cross-platform or Win7)?

Update Day Two:

The environment of the example is not installable on Windows, so I am trying other ways.

Install native on Windows: Keras install failed, probably because of SciPy
Install the Anaconda way: Keras install failed, notebook showing up in the browser (incl. failed dependency Keras)
Install the Docker way: Jupyter install failed...

Trying the docker way. Would be nice to have a container, so more people could follow.

I got a functional container with TensorFlow + Keras from here:

https://blog.thoughtram.io/machine-learning/2016/09/23/beginning-ml-with-keras-and-tensorflow.html

Now I have a running TensorFlow with Keras but no jupyter notebook option.

Unfortunately 'pip install jupyter' in the environment leads to errors too.

So still no success.

Does someone have a Docker container with TensorFlow + Keras + Jupyter Notebook?

Update: Found an all in one docker image here:

https://github.com/floydhub/dl-docker (Python 2 and iTorch kernel)

docker run -it -p 192.168.99.100:8888:8888 -p 192.168.99.100:6006:6006 -v /sharedfolder:/root/sharedfolder floydhub/dl-docker:cpu jupyter notebook

...does not show the example (empty notebook)...sadly giving up.

Update:

Turns out Docker on Win needs a special syntax....

docker run -it -p 192.168.99.100:8888:8888 -p 192.168.99.100:6006:6006 -v //c/Users//sharedfolder:/root/sharedfolder floydhub/dl-docker:cpu jupyter notebook

...finally does the trick....but after a while in the browser I get this:

The kernel has died, and the automatic restart has failed. It is possible the kernel cannot be restarted. If you are not able to restart the kernel, you will still be able to save the notebook, but running code will no longer work until the notebook is reopened.

What a disaster after such a long journey!!!

Which Kernel do we need?

GuilhermeCaeiro commented 7 years ago

This reply might not be useful but, given the problems you reported:

  1. The error you received about the "print" statement is caused by syntax differences between Python 2 and 3. The code here uses Python 2 (where the print syntax is 'print "string"') and you are probably running the code with Python 3 (where print must be called as a function, like 'print("string")'). Regarding the warning, it just looks like you don't have g++ (part of GCC) installed. If you install it, the warning will probably disappear.

  2. The problems you are having with the libraries are probably caused by confusion (on your part) about the Python environments. To solve that, what I would do is:

    • create a conda environment with Python 2.7 or 3.6 (note that, on Windows, TensorFlow won't work on all Python versions; please take a look at its installation page and choose the proper Python version). FROM HERE, DO EVERYTHING WITHIN THAT ENVIRONMENT:
    • install TensorFlow, Jupyter and the other dependencies using conda install, like "conda install -c conda-forge [package_name]"
    • installing Keras through conda would require Python 3.4 or 3.5 (I don't remember exactly which). For this reason, install it manually by downloading its files, extracting them somewhere and then running "python setup.py install" (the step-by-step is probably available on the Keras website/repository).

In my case, those were the steps I took to have everything up and running on Ubuntu. You are using Windows but, since those steps aren't tied to a specific system (the exception is the note about TensorFlow and Python compatibility on Windows), it should work fine.
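To illustrate point 1, here is a minimal sketch of the Python 3 form of the failing line. The string 'yo' is taken from the error message above; the helper name `say` is hypothetical and exists only to make the example self-contained.

```python
# Python 2 wrote:  print 'yo'     (a statement; a SyntaxError in Python 3)
# Python 3 writes: print('yo')    (an ordinary function call)

def say(message):
    """Hypothetical helper: print a message the Python 3 way and return it."""
    print(message)
    return message

say('yo')
```

On Python 2.6 or later, adding `from __future__ import print_function` at the top of a file makes the function-call form work there too, so the same source can run on both versions.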

Steviey commented 7 years ago

@GuilhermeCaeiro Thank you for your comment. First of all, I now have a running Docker-based environment which is used by many others, see my last comment. Except that the notebook keeps crashing and does not work correctly.

The install processes and configurations are obviously different from Ubuntu and Mac. Believe me, I've been doing this for several days now. On Windows all of this is a pain in the ass.

The main reason for this seems to be SciPy, which is a dependency of Keras and many other libraries. It has no decent Windows support. This is partly described on their own website here: https://scipy.org/install.html

This does not seem to be fixed by using 3rd-party bundles.

Since I have a bunch of 3rd-party bundles on board now (several gigs), I will try it a last time with Anaconda/conda and report the errors back.

GuilhermeCaeiro commented 7 years ago

I must admit that I never tried using Keras on Windows. But I have already used the Jupyter notebook and, probably, SciPy (I know I have it installed on Windows, but don't remember when I last used it "directly" on Windows) without problems. If I have some time tomorrow, I will give it a try. Regarding Docker, unfortunately, I have never used it. I like "having control" over what I'm doing (yes, I'm hard-headed).

Steviey commented 7 years ago

Update:

Hell, I'm too old for this shit. My mistake was using pip install from within Anaconda. That cost me a complete day and a Docker expedition. This does the trick in parts...

Protocol:
$ conda create -n MyLastTest python=3.5 (create a virtual environment)
$ activate MyLastTest
$ conda install -c conda-forge tensorflow
$ conda install -c conda-forge keras
$ conda install -c conda-forge jupyter
$ jupyter notebook
...so far so good...
Fixed print statements for Python 3
Missing includes:


ImportError                               Traceback (most recent call last)
<ipython-input-3-4267c99e4458> in <module>()
      2 from keras.layers.recurrent import LSTM
      3 from keras.models import Sequential
----> 4 import lstm, time #helper libraries

C:\Users\BrainPorn\0\lstm.py in <module>()
      6 from keras.layers.recurrent import LSTM
      7 from keras.models import Sequential
----> 8 import matplotlib.pyplot as plt
      9 
     10 
ImportError: No module named 'matplotlib'

This does not work...
$ conda install -c conda-forge lstm
$ conda install -c conda-forge time
$ conda install -c conda-forge matplotlib

...continuing with a Python 2.7 branch...

Steviey commented 7 years ago

ARRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRGH.........................................

error1 hang

GuilhermeCaeiro commented 7 years ago

According to TensorFlow's documentation, on Windows it only works with Python 3.5, so use the MyLastTest (Python 3.5) environment. The "lstm" library comes with the code in this repository; you don't have to install it. "time", if I'm not wrong, already comes with Python. To install matplotlib, you can try: https://anaconda.org/conda-forge/matplotlib Unfortunately, the code here probably won't work on Windows if you use Python 2.7 (like in "MyLastTestB"), at least not if you use TensorFlow as the Keras backend.

Note: Don't give up. For me, it is always a pain in the ass to set up a proper environment too. :O

Steviey commented 7 years ago

@GuilhermeCaeiro Thank you so much for your guidance and useful advice. I successfully installed matplotlib by pinning a specific version number. The side effects/import errors then disappeared. So now I get this:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-a6ba6dd61f8f> in <module>()
      1 #Step 2 Build Model
----> 2 model = Sequential()
      3 
      4 model.add(LSTM(
      5     input_dim=1,

NameError: name 'Sequential' is not defined

Any ideas?

GuilhermeCaeiro commented 7 years ago

Did you include "Sequential" at the beginning of your code?
from keras.models import Sequential

Steviey commented 7 years ago

I would say yes, it's in the original lstm.py... from keras.models import Sequential. Do you think the import of lstm goes wrong (maybe Windows path problems)?

GuilhermeCaeiro commented 7 years ago

Try: import keras.models and import lstm

I don't know if it is a Windows-related problem and, unfortunately, I can't test it right now because I'm on my phone. Anyway, be sure to start the Jupyter notebook from within the conda environment where all the dependencies are installed ("MyLastTest").
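One way to confirm which environment a notebook kernel is really using is to print the interpreter path and version from a notebook cell. This is a minimal sketch; the function name is hypothetical, and the expectation that the path contains the environment name (for example "MyLastTest") is an assumption about how conda lays out its environments.

```python
import sys

def describe_interpreter():
    """Return the running interpreter's path and its (major, minor) version."""
    return sys.executable, tuple(sys.version_info[:2])

path, version = describe_interpreter()
print("Interpreter:", path)
print("Python version: %d.%d" % version)
```

If the printed version is not the one the environment was created with, the notebook was most likely started from a different environment.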

Steviey commented 7 years ago

Hm the error comes and goes. Very strange, currently it's gone.

I noticed that in the video Siraj has the remark 'Using TensorFlow backend' on the screen/in the browser, while I now have 'Using Theano backend' after I tried to install gcc this way: http://stackoverflow.com/questions/33687103/how-to-install-theano-on-anaconda-python-2-7-x64-on-windows ...and a massive amount of red lines.

Do you know an easy way to change the backend to TensorFlow? Is there a flag available for Jupyter notebook or so?

I guess I saw a corresponding one-liner in one of Siraj's other examples in another context, but I'm not sure if this could help.

Should/could I uninstall Theano? But then Keras will go away too.

I'm so tired now...my life makes no sense anymore...https://www.youtube.com/watch?v=y7EpSirtf_E

GuilhermeCaeiro commented 7 years ago

Sorry for taking too long to answer...

Today I had some time to test the code on Windows and, following your recipe (the list of things you installed), I got everything installed without errors. On the other hand, when I tried running the code, everything became a mess. Sometimes it would get stuck and keep running forever during the training; other times it would execute the training "cell" (notebook cell) but wouldn't train anything and then get stuck at the predictions; and lastly, after I reset the notebook, it started raising errors in the "model" class, indicating that something was wrong with the Keras and TensorFlow compatibility. After that, I tried the same thing on Linux (using Python 3.5) and, to my surprise, everything worked fine. Trying to understand what was happening, I took a look at the Keras issues and read that there was a recent change in TensorFlow that "broke" the support provided by older Keras versions. Looking into why everything worked on one system but not on the other, I realized that on Linux conda installed Keras 2.0.2 (released this week) and on Windows it installed Keras 1.0.7 (more than 6 months old). Given this situation, what I had to do to solve it was:

  1. Remove Keras ("conda remove keras")
  2. Download the most recent version (2.0.2) from its repository
  3. Install it manually running "python setup.py install" (from within the conda environment and the Keras folder)

You still have to make a few modifications (change xrange to range and cast a value to int) in the code to make it work with Python 3, but those things are straightforward.

Regarding the backend, the way(s) it can be changed can be found here: https://keras.io/backend/
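As a rough illustration of the config-file route described on that page: Keras reads a JSON file (by default `.keras/keras.json` in the user's home directory) whose "backend" entry selects the backend. The sketch below edits such a file; the function name is hypothetical, and it writes to a throwaway temporary file rather than the real config so it is safe to run.

```python
import json
import os
import tempfile

def set_keras_backend(config_path, backend="tensorflow"):
    """Write (or update) the "backend" entry of a Keras JSON config file."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as handle:
            config = json.load(handle)
    config["backend"] = backend
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config

# Demonstration on a temporary file. The real file is assumed to be at
# os.path.join(os.path.expanduser("~"), ".keras", "keras.json").
demo_path = os.path.join(tempfile.mkdtemp(), "keras.json")
print(set_keras_backend(demo_path))
```

Alternatively, the same page describes a KERAS_BACKEND environment variable that overrides the config file when it is set before Keras is imported.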

Now, I think you can be happy and try getting rich. :)

Now, talking about a few curious things:

  1. The code ran about 3 times faster on Windows if compared to Ubuntu running on a virtual machine. I wonder how it would perform on Ubuntu if it was the host...
  2. For the same amount of epochs (10, just a test), the results on Windows were much more precise than on Ubuntu (I really don't know why).

Steviey commented 7 years ago

@GuilhermeCaeiro

Very very interesting. I see hope and will give it a try.

"(from within the conda environment and the Keras folder)": Can you describe where the Keras folder sits on Windows? I have so many of them.

"(change xrange to range and cast a value to int)": Do you have script names/line numbers or, better, an example code snippet? I'm totally new to Python and TensorFlow.

Your last point would suggest that it is better to use a dedicated OS for such calculations. Maybe it's because C++ has its roots on Windows. It would be interesting to see what quantitative analysts have to say about this. In the end, calculations on Docker images may have yet another, different bias. Strange.

If I get it running and start to understand that shit completely, I plan to make result comparisons between TensorFlow and Rapidminer.

GuilhermeCaeiro commented 7 years ago

When I wrote "keras directory", I was talking about the folder you'll extract from the compressed archive you'll download from the Keras repository.

Regarding the errors, when you run the code (in Jupyter), a few errors will show up in the lstm file, complaining about the things I mentioned before. When the error complains about "xrange", change it to "range"; when the complaint is about passing a float to a function that expects an integer, cast the value to an integer. To change the xrange calls, just search for "xrange" in the file.
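The two changes can be sketched as follows. This is not the exact code from lstm.py: the loop is only modelled on the `range(len(data) // prediction_len)` line quoted later in this thread, and the function name and the numbers are made up for illustration.

```python
def window_starts(data_len, prediction_len):
    """Yield the start index of each prediction window.

    The Python 2 form of the loop would have been:
        for i in xrange(data_len / prediction_len): ...
    """
    # xrange no longer exists in Python 3; range is its replacement.
    # "/" returns a float in Python 3, so cast before passing it to range().
    for i in range(int(data_len / prediction_len)):
        yield i * prediction_len

print(list(window_starts(10, 3)))  # [0, 3, 6]
```

Without the `int(...)` cast, Python 3 raises a TypeError because `range` does not accept a float.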

Yes, if you want to use ML/AI professionally, don't use VMs (unless that "VM" is an instance in the cloud, or somewhere else, properly set up to deal with that task), because things might run extremely slowly. Regarding the code we have been talking about, if I were using Linux as the host, I think it would run even faster (especially if I could set it to run on the GPU).

Well, I must say that I'm a beginner with neural networks. I started playing with them only a few weeks ago and, in the case of RNNs (LSTM), I'm having a hard time trying to do something useful with them. Regarding finance, my knowledge is near zero. Given this situation, I think I won't be able to do much more than help you get things running.

Steviey commented 7 years ago

Thank you GuilhermeCaeiro, your advice is informative as always.

Well, I think there will be a ton of new things we can do with approaches like RNNs, LSTMs etc. Especially in times of freely available open data and the opportunity to build our own apps with unlimited DBs and cloud backends.

I've been a freelance web and app developer for a living for 20 years now. And I always want to know how the web works. It's important to me to see the big picture, how Google does it, how YouTube makes its suggestions etc. I don't want to lose the 'knowledge connection' to state-of-the-art tech in these fields.

Finance is only one sector of this game. But if you understand everything of your ML stuff, the entry to this world is probably a little easier. Personally I don't do this. There would be much more to learn about chart techniques and fundamental data. And it needs a lot of money :-)

However, there are some big sharks in these waters, ready to eat you. Some say, 'never try high-frequency algo-trading as a private person'. I saw a video where an expert shares interesting insider news about already launched quant bots hunting amateur algorithms to take their money. Yes, algorithms are already fighting each other in the trading universe.

And there is always a speed and knowledge disadvantage compared to the big players: physical (special wires), systematic (special laws), digital (special software) and of course money power (special budgets to buy early data, trade big stakes, hide identities, catch identities etc.). That's why many private algo-traders trade in smaller sectors with fewer big fish in the water. There are statistics out there saying that 90% of private online traders lose money and never break even.

Since AI has made a paradigm shift with these relatively new approaches and everything is becoming some kind of AI, this will be the next big innovation after the 'cloud hype' and the unlimited expansion of the JavaScript universe (my guess).

I played with AI 10 years ago for a real customer online-support desk and I must say the new approaches are a game changer.

Here are some very funny examples!:

Danger, danger! 10 alarming examples of AI gone wild

For me the beauty of this stuff is, if you understand ML completely, you can do everything. It only depends on your imagination. It's a kind of freedom and new inspiration for developers.

Thank you so far...will try to get this beast running- once again.

PS:

I already have a running SVM customized for my 'own' data (70% performance on up/down predictions), based on Rapidminer Studio (free edition). I like its UI/drag-and-drop approach, though its GUI is a little outdated and not really scalable. This guy helped me a lot too...

https://www.youtube.com/watch?v=w0vSSEq2bn0 (video is a little old)

So the next logical step is to try RNN's and LSTM on TensorFlow.

Steviey commented 7 years ago

Update/Solution:

Finally got it running (with your friendly help), although its output looks slightly different from the original. Anyway, I still have to learn and recap what it all means in general :-).

Original: output1

My Output: output

Protocol for Win7 users

Short intro/recap... Since there is no TensorFlow for Python 2.x on Windows, we have to make some modifications and workarounds to get the example code running on Windows 7 and Python 3.5.

Standalone Requirements:

A) Start the Anaconda prompt and create a virtual environment...
$ conda create -n MyLastTest python=3.5
Activate the virtual environment...
$ activate MyLastTest
Install & remove/reinstall dependencies...
(MyLastTest)$ conda install -c conda-forge tensorflow
(MyLastTest)$ conda install -c conda-forge keras
(MyLastTest)$ conda install -c conda-forge jupyter
(MyLastTest)$ conda install -c conda-forge matplotlib=2.0.0
Currently we have the wrong Keras version...
(MyLastTest)$ conda remove keras
...then reinstall Keras 2.0.2 manually, as GuilhermeCaeiro described above (download it from its repository and run "python setup.py install" from within the extracted Keras folder).

B) Switch the Keras backend to TensorFlow by editing the Keras config file in your favorite IDE...

C) Modify the Python code for Python 3 compatibility, according to the error messages in the browser...

(MyLastTest)$ cd c:\MyUserName\MyProjectWithTheSampleCode

To see the errors, we start the notebook...
(MyLastTest)$ jupyter notebook

---Wow---

Credits go to @GuilhermeCaeiro. He provided the crucial information :-).

PS: Meanwhile there is an official open pull request, for code modifications providing Python 3 compatibility. Since I'm not a Python-Guru, I highly recommend the pull request-version of the code modifications in Chapter "C)".

fabiofumarola commented 7 years ago

Oops, I made a pull request to support Python 3 too late :)

Steviey commented 7 years ago

Yours seems to be a bit different. Can you explain your approach? I'm totally new to Python :-)

for i in range(len(data) // prediction_len):

You didn't cast to int. Am I right? What does '//' do?

Update: The // operator performs floor division. My guess is that, for non-negative values, this gives the same result as casting the division result to int.

+.ipynb_checkpoints
 +__pycache__

What is this for?

Update: Checkpoints are temporary snapshots of your notebooks, pycache is a bytecode cache storing data into the corresponding project folder.

Thank you.

fabiofumarola commented 7 years ago

Hi, the operator // does the division and keeps the integer part (it rounds down). For example, 10.5 // 2 returns 5.0. And you got it.
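A few concrete values may help compare `//` with the "divide, then cast to int" approach. This is a small sketch using only built-in operators and the standard `math` module; the numbers are arbitrary.

```python
import math

print(10.5 // 2)    # 5.0  (floor division with a float operand stays a float)
print(7 // 2)       # 3    (floor division on ints gives an int)
print(int(7 / 2))   # 3    (true division, then cast: same result here)

# The two approaches only differ when the result is negative:
# // rounds down (like math.floor), int() truncates toward zero.
print(-7 // 2, int(-7 / 2), math.floor(-7 / 2))   # -4 -3 -4
```

For loop bounds such as `len(data) // prediction_len`, both operands are non-negative, so either form gives the same count.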

Also, for the second point, I've just removed the temporary files from git.

You got all the points by yourself :)

Best, Fabio

Steviey commented 7 years ago

Update: Overnight, the Python version of my environment seemed to have updated itself to 3.6, for which TensorFlow is not available on Windows.

Update: Don't mix up the environments.

I turned it back with... conda install python=3.5

And now I'm getting multiple, different outputs for the same data set. None of them looks as good as my first result. Maybe my first result was a cached version of the original, who knows. It seems my graph is getting 'erections' in an unpredictable manner.
Version 1: version-1
Version 2: version-2
Version 3: version-3
Version 4: version-4

I'm going crazy now... The same effect as described here... https://github.com/llSourcell/How-to-Predict-Stock-Prices-Easily-Demo/issues/4

I can't believe it. Is this the end?

Seems a little bit like... another hobby...

GuilhermeCaeiro commented 7 years ago

Hello again!

Regarding your problem with Python, as far as I know, it doesn't upgrade itself automatically (and has no reason to do so). I think you probably did something that triggered that (if it really got "upgraded"). Anyway, don't keep more than one Python version in an environment (unless needed), because it might mess up your environment.

Now, talking about your results, they are "right". Unfortunately, the data used for the example in this repository isn't enough to give you more accurate forecasts (given the nature of stock prices, I could even say that those values themselves might not be a good indicator compared to sentiment and other things). The reason why the results differ from each other is that the data is randomly shuffled before the training, which causes the "convergence" to happen in a different way each time. Another interesting thing to mention: using the code present here "as-is", the higher the number of epochs, the more useless the results become. To be honest, the "best" (still not useful) results I got were with only one epoch, and it was pretty much "hit or miss". Sometimes the results would appear to have some meaning; most times they would leave me with a "what the f*ck!?" expression.

Given that situation, unless you improve your model (add more "knowledge" to it), it won't be useful at all (to be honest, I don't even know if a simple mortal like me could create something "profitable"), but what you have now is probably a good starting point.

Steviey commented 7 years ago

Thank you GuilhermeCaeiro, I mistakenly used the wrong environment. It seems my root env has Python 3.6. Is there a resource describing how to extract the best model from this code and do a real-life calculation? And why the heck do we deal with random data? It seems this sample is not a real-life example, as I suspected. This does not make it easier to understand. Maybe it's better to start with a sample that makes sense. I also have to learn how to modify everything running in the notebook binary, phew.

GuilhermeCaeiro commented 7 years ago

Before giving you an answer, I must say that I MIGHT BE WRONG, given the fact that I'm new to neural networks and don't have much experience with machine learning (until now, the only things I have been using were linear and logistic regressions, and mostly for "classroom projects").

Now, trying to answer your questions:

  1. As far as I know, there is no recipe to solve a specific problem (unless you talk about the MNIST character identification problem, that already became a classical one). If you want to use machine learning to solve a problem, it is always up to you (or your team) to analyze it and develop the best approach to deal with it. This code can be seen as a "starting point" because it already has a full "pipeline" built and capable of running. BUT it might not be (and isn't) the best approach. Given that situation, it is up to you to see what you can do to improve the code (give it more data, modify the network structure, tune the parameters etc) and get ("try getting") the answers you are looking for. You might be successful or not (stock price prediction is still a field of research).

  2. Regarding the "random data", it is not "random data". What is done is that the input for the network is generated as windows over the series (something like [ [1, 2, 3, ..., 49, 50], [2, 3, 4, ..., 50, 51], ...]) and then that list of windows is shuffled (randomly). Only after that is the data provided to the network. Why do we do that? Is it really needed? I can't say for sure in the case of neural networks (as I said before, I'm a beginner), but the following links might give you an answer: http://stats.stackexchange.com/questions/40638/predicting-time-series-with-nns-should-the-data-set-be-shuffled http://stackoverflow.com/questions/40816721/should-i-shuffle-the-data-to-train-a-neural-network-using-backpropagation https://www.quora.com/Does-the-order-of-training-data-matter-when-training-neural-networks

  3. This is a real-world example, BUT not the best solution. It aims to be a Keras+LSTM example and does that very well. The problem is that, if you came here thinking that it is some kind of trading robot or something in that sense, unfortunately, it isn't.
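The windowing and shuffling described in point 2 can be sketched in a few lines. This is an illustration with plain lists and a tiny made-up series, not the repository's load_data() code; the function name and the window length of 3 are assumptions chosen to keep the output readable.

```python
import random

def make_windows(series, window_len):
    """Return every run of window_len consecutive values from series."""
    return [series[i:i + window_len]
            for i in range(len(series) - window_len + 1)]

series = list(range(1, 8))          # stand-in for a price series: 1..7
windows = make_windows(series, 3)   # [[1, 2, 3], [2, 3, 4], ..., [5, 6, 7]]

shuffled = list(windows)
random.Random(0).shuffle(shuffled)  # reorders the windows, not their contents

print(windows)
print(shuffled)
```

Each window keeps its internal time order; only the order in which the windows are presented for training changes, which is why the data is shuffled but not "random".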

Now, what I would suggest you is:

  1. If you still want to give the code a chance, it would be better to learn Python first. Codecademy might be a good starting point.

  2. If you want to improve your model, as I already mentioned before, there are lots of things you can do. For example, I would start (didn't do it yet) playing with the network parameters and structure to see if I can get a better convergence. Perhaps you could get better results for smaller (or bigger) window sizes, etc... After that, if I couldn't get any better, I would try providing more data. For example, if there are some other stocks that might have some relationship to the one I want to predict, perhaps combining their values to my input could improve my results if those stocks can really influence the stock I'm analyzing. Market sentiment is another thing that might be helpful, although it is more complicated to obtain.

  3. If you are not used to machine learning, I would suggest that you keep in mind how to proceed and how things work. The link below describes general steps to build a machine learning application: http://docs.aws.amazon.com/machine-learning/latest/dg/building-machine-learning.html There is even a course on Udacity about it: https://br.udacity.com/course/model-building-and-validation--ud919/

To finish, the code in this repository is based on (basically the same as) the code developed for the following article: http://www.jakob-aungiers.com/articles/a/LSTM-Neural-Network-for-Time-Series-Prediction If you read it, you will get a better answer on why the results are that bad.

ckl8964 commented 7 years ago

Very resourceful ! So interesting !

Even after removing the "shuffle" (i.e. commenting out "np.random.shuffle(train)" in load_data()), the output prediction graph is still different each time I completely restart the whole Linux machine.

Is this normal ? And why ?

ckl8964 commented 7 years ago

It seems the initial state (i.e. weights, biases) of the RNN is different each time "model.fit()" runs, even when the Linux machine is started from scratch each time. And since we only ran 1 epoch, getting a different prediction result every time is normal ....

Am I right ?

GuilhermeCaeiro commented 7 years ago

Hello ckl8964,

The most visible place where some kind of randomization happens is in the "dropout layer". If we consider the code as provided in this repository, when the data flows through that layer during training, 20% of its inputs are randomly "discarded" (set to zero) in order to prevent overfitting: https://keras.io/layers/core/#dropout Regarding the way "model.fit()" works, I don't know exactly how it is implemented, but in the case of the weight matrices in NNs it seems to be common to initialize them randomly (once again, I must say that I don't know which approach Keras takes).

One way to try reproducing a result is by setting a seed for the random number generator: https://docs.python.org/3/library/random.html#random.seed https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html I don't know which one Keras and its dependencies use, or even if they are the same thing.

I never used that though... so I don't know if it will work or not.
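The general idea of seeding can be shown with Python's standard `random` module alone. This is only a sketch of the principle: the function name is hypothetical, and whether seeding is enough to make a Keras run reproducible is not verified here. Since the repository shuffles with np.random.shuffle, NumPy's generator (seeded separately via numpy.random.seed, linked above) would be the relevant one for that step.

```python
import random

def shuffled_copy(items, seed):
    """Shuffle a copy of items after seeding Python's global generator."""
    random.seed(seed)
    result = list(items)
    random.shuffle(result)
    return result

first = shuffled_copy(range(10), seed=42)
second = shuffled_copy(range(10), seed=42)
print(first == second)   # True: same seed, same shuffle
```

The seed has to be set before any random operation runs, otherwise the earlier draws are not covered by it.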

Steviey commented 7 years ago

@GuilhermeCaeiro Thank you for the infos.

The Python thing isn't such a big deal, since I already know 13 other languages. Anyway, for everyone who already knows other languages, I recommend this very straightforward and compact one-hour tutorial....

Python in one Hour with Derek Banas

There is also a 'Python preparation' course on Siraj's Udacity page (Chapter: PROGRAM SYLLABUS): Students lacking the requisite Python knowledge can take the first four lessons in Intro to Machine Learning to address this requirement.

The RNN/LSTM stuff seems to be much more complicated than my successfully adapted SVM with Rapidminer. I needed one week or so to understand and use it (with a little help)!

I've seen the video from Jakob, the author you mentioned before, but didn't understand much.

So I booked a free online course. Fingers crossed that this will help.

PS: Now that you mention it, I remember randomization as a tool from my time as a student of Psychology/Statistics (long time ago) ;-).

ckl8964 commented 7 years ago

Thanks GuilhermeCaeiro ! Your response is always meaningful !!

Yes, I start to recall some memory from the NN course long time ago in college. Have to make some revision and get more update from internet (and all of you here !).

Thanks ! Interesting !!

Steviey commented 7 years ago

Does anybody know a good link for notebook-convert instructions in one chapter? Update: http://stackoverflow.com/questions/17077494/how-do-i-convert-a-ipython-notebook-into-a-python-file-via-commandline
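The usual route is the nbconvert command line tool (`jupyter nbconvert --to script`). For anyone curious what such a conversion boils down to, here is a minimal standard-library sketch: an .ipynb file is JSON, and in the current notebook format each entry in "cells" has a "cell_type" and a "source". The function name and the tiny in-memory notebook are made up for illustration, and the sketch ignores everything nbconvert additionally handles (magics, metadata, older notebook formats).

```python
import json

def notebook_to_source(notebook):
    """Join the code cells of a parsed .ipynb (nbformat 4) into one script."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"

# Tiny in-memory notebook standing in for a file loaded with json.load().
demo = json.loads('''{"cells": [
  {"cell_type": "markdown", "source": ["# Title"]},
  {"cell_type": "code", "source": ["import time\\n", "print(time.time())"]},
  {"cell_type": "code", "source": "x = 1"}
]}''')
print(notebook_to_source(demo))
```

For a real notebook, the parsed dictionary would come from opening the .ipynb file and passing the file object to `json.load`.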