owocki / pytrader

cryptocurrency trading robot
MIT License
1.94k stars 429 forks

Profile jobs #46

Open owocki opened 8 years ago

owocki commented 8 years ago

Profile the jobs based on the stats at https://github.com/owocki/pytrader/pull/42 and see if we can get the runtime down.

Snipa22 commented 8 years ago

Based on our slack discussion from last night: predict_many_v2 is the main target for this; predict_many_sk is semi-reasonable with high thread counts and a decent CPU.

predict_many_v2 running on my listed box looks to be around a 125-130 hour job at 24 threads, with a total of 17.2k jobs. That means a standard run would be approximately a 3000-3200 hour process on a single-threaded machine, decreasing roughly linearly as the thread and core count goes up. This process is entirely CPU-bound.
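For reference, a quick sanity check on that extrapolation, assuming near-linear scaling across threads (reasonable here, since the job is CPU-bound):

```python
# Back-of-the-envelope: wall-clock hours observed at 24 threads,
# scaled up to a single-threaded machine under the linear-scaling assumption.
hours_at_24_threads = (125, 130)
threads = 24

single_thread_hours = tuple(h * threads for h in hours_at_24_threads)
print(single_thread_hours)  # (3000, 3120), matching the quoted 3000-3200h range
```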

Currently, predict_many_v2 has 6.1k jobs per currency pair.

Step one for this is obviously optimization of the system; it's unknown whether it can be improved, as I suspect this is entirely the NN system chewing up the CPU time. Step two is to look at how best to spread the load. Single-box solutions obviously aren't fantastic. Looking at Amazon Lambda, you're looking at around $150/run per currency pair, so costs quickly grow out of control; that's not guaranteed, though, since it's also a function of the number of jobs running. Of course, we're currently in brute-force mode, so...

owocki commented 8 years ago

pasting my notes from the #contributors slack channel from last night:

options:
1) reduce the # of permutations by a factor of 10 or 100 by making some guesses based on initial data results
2) profile the shit out of the do_* functions (a cProfile sketch follows this list)
3) start running classifier tests (predict_many_sk.py) instead of prediction tests (predict_many_v2); there might be lower-hanging profits available via classifiers.
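On option 2, a minimal sketch using the stdlib cProfile module; the import path and function name below are placeholders, not the repo's actual layout:

```python
import cProfile
import pstats

# Placeholder import; point this at whichever do_* function is under test.
from history.tools import do_prediction_test

profiler = cProfile.Profile()
profiler.enable()
do_prediction_test()  # run the workload under the profiler
profiler.disable()

# Print the 20 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```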

Snipa22 commented 8 years ago

Profiling run complete; I trimmed everything that didn't account for much runtime: https://gist.github.com/Snipa22/c981ece43b870645a3ddbaa4d57d89bd

Snipa22 commented 8 years ago

As expected, the entire time is spent in the NN. Due to this, I reviewed some of the suggested optimizations for pybrain and moved on to working with ARAC:

ARAC Run:

root@015fee08dac3:~/pytrader# time ./manage.py predict_one_test
(p)starting ticker:BTC_ETH hidden:1 min:2880 epoch:1000 gran:15 dsinputs:5 learningrate:0.05 bias:True momentum:0.1 weightdecay:0.0 recurrent:True, timedelta_back_in_granularity_increments:1000
Getting Neural Network at: 20:13:09.869997
Got Neural Network at: 20:35:32.848241
(p)directionally correct 494 of 995 times. 49.0%. avg diff=0.0051, profit=0.0

real 22m33.334s
user 22m23.210s
sys 0m0.902s

Non-ARAC Run:

root@015fee08dac3:~/pytrader# time ./manage.py predict_one_test
(p)starting ticker:BTC_ETH hidden:1 min:2880 epoch:1000 gran:15 dsinputs:5 learningrate:0.05 bias:True momentum:0.1 weightdecay:0.0 recurrent:True, timedelta_back_in_granularity_increments:1000
Getting Neural Network at: 20:40:35.460845
Got Neural Network at: 21:13:58.660506
(p)directionally correct 523 of 995 times. 52.0%. avg diff=0.004, profit=0.0

real 33m28.140s
user 33m14.533s
sys 0m1.106s
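Sanity-checking those timings, ARAC cuts wall-clock time by roughly a third:

```python
# Wall-clock ("real") times from the two runs above, in seconds.
arac = 22 * 60 + 33.334   # 1353.3 s
plain = 33 * 60 + 28.140  # 2008.1 s
print(f"{(1 - arac / plain) * 100:.0f}% less wall-clock time")  # ~33%
```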

I'll have a branch up in the next day or so with ARAC support, as it does appear to provide up to a 30% speed increase over non-ARAC runs.

http://pybrain.org/docs/advanced/fast-pybrain.html
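Per that page, switching a network over to arac is a one-keyword change. The layer sizes here are illustrative, mirroring the dsinputs:5 hidden:1 run above, not the exact call pytrader makes:

```python
from pybrain.tools.shortcuts import buildNetwork

# fast=True asks buildNetwork for an arac-backed (C++) network instead of
# a pure-Python pybrain one; it requires the arac package to be installed.
net = buildNetwork(5, 1, 1, fast=True)
print(net.activate([0.1, 0.2, 0.3, 0.4, 0.5]))
```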

t0mk commented 8 years ago

Is anybody else worried that the arac library was abandoned in 2010?

Snipa22 commented 8 years ago

Pybrain itself has been nearly idle in development; 0.3 was released back in '09. ARAC is the low-hanging fruit for the moment, a quick speedup/replacement until we can see about replacing the entire library. To this end, I'm looking at abstracting the NN out of the core software so it can be distributed more easily; this will also give us a better interface for trying out NNs other than pybrain going forward.

Ref: http://stats.stackexchange.com/questions/62370/pybrain-so-slow

The developer of pybrain and ARAC notes that development stopped in 2010 and suggests moving to a new NN rather than trying to speed up pybrain.

owocki commented 8 years ago

> The developer of pybrain and ARAC notes that development stopped in 2010 and suggests moving to a new NN rather than trying to speed up pybrain.

Wish I had done more diligence when starting the repo. I probably would have gone in another direction knowing this.

Snipa22 commented 8 years ago

I don't think it's going to be an issue. My current plan is to migrate the neural network to a "worker" system that is fed a job with all of its data and returns the results we need. This will let us more easily replace and rebuild the NN with other systems, allow for "drop-in" workers, and let people choose the NN they like best.
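To make that concrete, here is the kind of interface I have in mind; class and method names are placeholders, not a final API:

```python
from abc import ABC, abstractmethod

class NNWorker(ABC):
    """Drop-in worker contract: each backend (pybrain today, something
    else later) implements the same surface, so the core software never
    imports the NN library directly."""

    @abstractmethod
    def train(self, dataset, **hyperparams):
        """Fit the network on a self-contained job payload."""

    @abstractmethod
    def predict(self, inputs):
        """Return the prediction for one input row."""

class PybrainWorker(NNWorker):
    """Wraps the existing pybrain code behind the worker contract."""

    def train(self, dataset, **hyperparams):
        raise NotImplementedError  # move the current pybrain training here

    def predict(self, inputs):
        raise NotImplementedError  # wrap net.activate(inputs) here
```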