lefnire / tforce_btc_trader

TensorForce Bitcoin Trading Bot
http://ocdevel.com/podcasts/machine-learning/26
GNU Affero General Public License v3.0

Volatile GPU-Util is low #36

Open chinshou opened 6 years ago

chinshou commented 6 years ago

I am testing with a GTX 1070. When running python hypersearch.py, nvidia-smi shows that Volatile GPU-Util is very low (only 6%), with Perf = P2 and 7773 MB of GPU memory in use.

The hyperparameter search is also not much faster than on CPU; I expected roughly a 10x speedup. Is that expected?
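For reference, this is roughly how I watch utilization while the search runs (a small Python sketch that just polls nvidia-smi; the one-second interval is arbitrary):

```python
import subprocess
import time

# Poll nvidia-smi once per second; --query-gpu/--format are standard flags.
while True:
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=utilization.gpu,memory.used",
        "--format=csv,noheader",
    ]).decode().strip()
    print(out)  # e.g. "6 %, 7773 MiB"
    time.sleep(1)
```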

best regards

chinshou commented 6 years ago

From https://github.com/reinforceio/tensorforce/issues/290, it seems to be a TensorForce PPO agent issue: the agent only updates infrequently. So does the GPU have little advantage over TensorFlow on CPU here?
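If that is the cause, the shape of the problem would be something like the loop below: the per-step work runs on CPU, and the GPU only wakes up for the infrequent batch update. This is a toy sketch with stand-in Env/Agent classes, not TensorForce's actual code:

```python
import random
import time

# Toy stand-ins: Env.step is fast CPU work; Agent.update simulates the big
# GPU batch update. These names are placeholders, not TensorForce's API.
class Env:
    def reset(self):
        return 0.0

    def step(self, action):
        return random.random(), random.random(), random.random() < 0.01

class Agent:
    def act(self, state):
        return 0            # cheap per-step forward pass

    def update(self, batch):
        time.sleep(0.5)     # stand-in for the heavy gradient update

env, agent, batch_size = Env(), Agent(), 4096
batch, state = [], env.reset()
for _ in range(20000):
    action = agent.act(state)
    state, reward, done = env.step(action)    # CPU-bound environment stepping
    batch.append((state, action, reward))
    if len(batch) >= batch_size:               # infrequent: this is the GPU spike
        agent.update(batch)
        batch.clear()
    if done:
        state = env.reset()
```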

methenol commented 6 years ago

Compiling TensorFlow from source with -march=native, to ensure your CPU's instruction sets are supported, gives a significant performance boost compared to the TensorFlow build you grab from pip. From what I've read, some instructions were disabled in the pre-compiled wheels due to compatibility issues between Intel and AMD. Using PPO, I see significantly longer episode times on GPU compared to running on CPU; I'm testing with a Ryzen 7 and a 1080 Ti, along with some cloud instances. The GPU utilization I noticed was similar to what you're describing: it uses up all the GPU RAM, then usually stays under 10% utilization of the GPU processing. Are you working with LSTM or CNN?
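Incidentally, a quick way to make the CPU-vs-GPU timing comparison fair is to hide the GPU from TensorFlow before it loads and time the exact same script both ways. A sketch (nothing here is specific to this repo):

```python
import os
import time

# Must be set BEFORE tensorflow is imported: "-1" hides every GPU so the
# exact same script runs CPU-only; leave it unset (or "0") for the GPU run.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())   # should now list only CPU devices

start = time.time()
# ... run the same fixed number of episodes in both configurations ...
print("elapsed: %.1fs" % (time.time() - start))
```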

lefnire commented 6 years ago

Hi @methenol, I see all your comments & exploration on these tickets & hope to get to them soon. I'm pretty swamped with work so it may take me a while. In the meantime I've worked on a refactor of the project on https://github.com/lefnire/tforce_btc_trader/tree/refactor. It removes significant amounts of code:

  1. move from the custom hypersearch setup to HyperOpt
  2. get rid of LSTM; I think there are both theoretical issues and TensorForce implementation issues, so I'm putting it on the shelf for later
  3. significant: fixes the way the train/test data split is handled - that is, sequentially (google "time series train/test split"; see the sketch below)
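On point 3, the idea is that shuffling before splitting lets the model train on data from the future; instead you hold out the tail of the series. A minimal sketch of what I mean (plain NumPy, illustrative only, not the repo's actual code):

```python
import numpy as np

def time_series_split(data, test_fraction=0.2):
    """Hold out the most recent rows for testing; never shuffle time series."""
    split = int(len(data) * (1 - test_fraction))
    return data[:split], data[split:]    # train = past, test = future

prices = np.arange(100.0)                # stand-in for ordered OHLCV rows
train, test = time_series_split(prices)
assert train.max() < test.min()          # test strictly follows train in time
```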

I also took one more step of sticking to the most recommended hypers in https://github.com/lefnire/tforce_btc_trader/tree/wip, and leaving to HyperOpt only the things I'm really unsure about. That branch is pretty destructive, but informative; so take a look, but don't work from it. Neither of these branches is showing signs of convergence, so I'm stepping away for a while. If you feel you have a solid grasp of things, I'd be happy to add you as a maintainer?
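For context, narrowing the search space looks roughly like this in HyperOpt (a minimal illustrative sketch; the space and objective here are made up, not the repo's actual hypers):

```python
from hyperopt import Trials, fmin, hp, tpe

# Only the hypers I'm genuinely unsure about go in the search space;
# everything else stays pinned to recommended defaults.
space = {
    "learning_rate": hp.loguniform("learning_rate", -12, -5),
    "discount": hp.uniform("discount", 0.9, 0.999),
}

def objective(hypers):
    # Stand-in for "train an agent with these hypers, return -reward".
    return (hypers["discount"] - 0.99) ** 2 + hypers["learning_rate"]

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print(best)
```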

methenol commented 6 years ago

I'd be happy to help sir

mysl commented 6 years ago

> Neither of these branches is showing signs of convergence

Do you mean the train phase or the test phase?

methenol commented 6 years ago

With v0.2, which uses a CNN, I'm seeing a performance increase using GPU over CPU. Utilization is still low, similar to what @chinshou reported. Running watch -n 1 nvidia-smi, I see average Volatile GPU-Util around 7%, and my 1080 Ti hybrid pulls around 80 W. Periodically I see spikes to around 95% utilization with power jumping to around 150 W, and it seems to happen on a fairly consistent interval. I'm hoping this coincides with the agent updating. I'll check whether switching to timestep updates changes the frequency of the spikes, so we can pin it down as expected behavior. I haven't had much success using timestep updates with the PPO agent, so this would just be a test of the GPU usage.
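For reference, the knob I mean is TensorForce's update spec. If I'm remembering the 0.4-era API right, switching from episode-based to timestep-based updates looks roughly like this (field names are from memory, so treat this as an assumption and check against the version this repo pins):

```python
from tensorforce.agents import PPOAgent

# Assumed 0.4-era TensorForce spec; the state/action/network values below
# are placeholders, only the update_mode dict is the point of this sketch.
agent = PPOAgent(
    states=dict(type="float", shape=(10,)),
    actions=dict(type="int", num_actions=3),
    network=[dict(type="dense", size=64)],
    update_mode=dict(
        unit="timesteps",   # "episodes" would update once per episode instead
        batch_size=64,
        frequency=64,
    ),
)
```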

I don't think this is an issue with the code in this repository and suspect that further optimization, if possible, would need to be done on the tensorforce side. I've seen similar utilization using tensorforce with different environments, and similar utilization with tensorforce's DDPG agent.

Training Performance: A user’s guide to converge faster (TensorFlow Dev Summit 2018) https://www.youtube.com/watch?v=SxOsJPaxHME

Optimizing like this from within TensorForce is a bit above my head, but I'd be interested to see whether there's some optimization that hasn't been implemented in the TensorForce library yet. It may just be the nature of the agents and how the updates occur, though.