I see about 50% CPU utilization with a Core i7 CPU and <= 10% GPU utilization.
My guess is that your environment is too simple. This can cause the GPU and CPU to wait on each other: the CPU tries to run the environment with high multiprocessing overhead (relative to the actual load), and then has to wait on the GPU latency for the given batch size.
You are also using a very powerful GPU for a very simple task, hence the 10% load on the GPU.
Just as a side note, what CPU are you using exactly? I'm surprised to see a high-power GPU combined with a 4-threaded i7; are you sure it's not 8 threads?
EDIT: checking Intel ARK for 4-threaded desktop CPUs, none of them are i7s, and when switching to laptop CPUs they are all low-power chips for ultrabooks. Note that n_cpu is the number of CPU threads, not the number of CPU cores.
This is my pc configuration: https://www.userbenchmark.com/UserRun/16739440
Also, I tested my portfolio env several times with different instruments and parameters, and the reward never exceeded -0.5, which is weird.
i7-8700, 6 cores 12 threads
Try again with n_cpu = 12
As for the reward, it's possible the methods do not work with your problem. This is still machine learning, and there are no magic bullets, unfortunately.
Yes, I tried that with n_cpu = 12. I still see 12 processes spawned in the task manager, with only one using the GPU at very low utilization (~2%). All the other processes don't use the GPU at all - 0%. As for the reward, the original implementation on GitHub works and is profitable. It doesn't make sense that out of millions of simulation runs not even one is profitable.
I still see 12 processes spawned in the task manager, with only one using the GPU at very low utilization (~2%). All the other processes don't use the GPU at all.
That's normal: when the environments finish their steps, the worker processes send the data back to the main process, which then passes it through the neural network. So only one process uses the GPU, and the rest simulate your environment on the CPUs. The goal of multi-CPU environments is to reduce the time spent simulating the environment and run more steps per second to feed the GPU. If the CPUs cannot simulate any faster (either due to a lack of computing power or Amdahl's law), then the GPU will inevitably be slowed down.
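As a quick way to check whether the environment simulation itself is the bottleneck, you can time how many environment steps per second the SubprocVecEnv delivers without the network involved. A minimal sketch (env_fn is an assumed helper that builds a fresh PortfolioEnv with your settings):

import time
import numpy as np
from stable_baselines.common.vec_env import SubprocVecEnv

vec_env = SubprocVecEnv([env_fn for _ in range(12)])  # same n_cpu as in training
obs = vec_env.reset()

n_steps = 1000
start = time.time()
for _ in range(n_steps):
    # random actions, so only the CPU-side simulation is measured, not the policy network
    actions = np.stack([vec_env.action_space.sample() for _ in range(vec_env.num_envs)])
    obs, rewards, dones, infos = vec_env.step(actions)
print("env steps per second:", n_steps * vec_env.num_envs / (time.time() - start))

If this number is low, a faster environment (or more/faster CPU cores) will improve GPU utilization more than anything you change on the GPU side.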
As for the reward, the original implementation on GitHub works and is profitable. It doesn't make sense that out of millions of simulation runs not even one is profitable.
Can you show a benchmark with a specific method I could compare to, so I can check whether this is an implementation issue for a given method?
Does this mean that I wasted money on the GPU? Can I not use it to accelerate training?
You can throw a bigger network at your problem (by default it is 2 layers of 64); that will use more GPU power and might help your convergence.
from the documentation:
from stable_baselines import PPO2
from stable_baselines.common.policies import FeedForwardPolicy

# Custom MLP policy of three layers of size 128 each
class CustomPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[128, 128, 128],
                                                          vf=[128, 128, 128])],
                                           feature_extraction="mlp")

model = PPO2(CustomPolicy, env, verbose=0, tensorboard_log=settings['tensorboard_log'])
What is pi and what is vf? What if I want a custom MlpLnLstmPolicy?
on the documentation page, it says this:
The LstmPolicy can be used to construct recurrent policies in a similar way:
class CustomLSTMPolicy(LstmPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, n_lstm=64, reuse=False, **_kwargs):
        super().__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, n_lstm, reuse,
                         net_arch=[8, 'lstm', dict(vf=[5, 10], pi=[10])],
                         layer_norm=True, feature_extraction="mlp", **_kwargs)
so:
from stable_baselines.common.policies import LstmPolicy

# Custom LSTM policy: an 8-unit layer, an LSTM, then three layers of size 128 each for pi and vf
class CustomPolicy(LstmPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[8,
                                                     'lstm',
                                                     dict(pi=[128, 128, 128],
                                                          vf=[128, 128, 128])],
                                           layer_norm=True, feature_extraction="mlp")

model = PPO2(CustomPolicy, env, verbose=0, tensorboard_log=settings['tensorboard_log'])
What is pi and what is vf?
pi is the policy function, vf is the value function (there's a good write-up on actor-critic models if you want to know more).
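For reference, in net_arch any integers listed before the dict are layers shared by both heads, while pi=[...] and vf=[...] describe the separate policy and value branches. A small illustrative sketch (not your exact setup):

from stable_baselines.common.policies import FeedForwardPolicy

class SharedThenSplitPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(SharedThenSplitPolicy, self).__init__(*args, **kwargs,
                                                    # one 64-unit layer shared by both heads,
                                                    # then separate 64-64 branches for pi and vf
                                                    net_arch=[64, dict(pi=[64, 64], vf=[64, 64])],
                                                    feature_extraction="mlp")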
I am now using a custom policy but GPU utilization is still very low (< 5%):
# Custom LSTM policy with five large layers for pi and vf
class CustomPolicy(LstmPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[8,
                                                     'lstm',
                                                     dict(pi=[2048, 1024, 512, 256, 128],
                                                          vf=[2048, 1024, 512, 256, 128])],
                                           layer_norm=True, feature_extraction="mlp")

model = PPO2(CustomPolicy, env, verbose=0, tensorboard_log=settings['tensorboard_log'])
Another question: once I have the model trained, how do I use it? Do I create an observation and use predict? Do I have to step the env?
n_cpu = 12
env = PortfolioEnv(history=history, abbreviation=instruments, steps=settings['steps'], window_length=settings['window_length'])
env = SubprocVecEnv([lambda: env for i in range(n_cpu)])
model_name = str(settings['model_name']) + '_' + str(settings['policy']) + '_' + str(settings['total_timesteps']) + '_' + str(settings['total_steps']) + '_' + str(settings['window_length']) + '_' + str(settings['allow_short'])
model = PPO2.load(model_name)
obs = env.reset()
action, _states = model.predict(obs)
obs, rewards, dones, info = env.step(action)
I am now using a custom policy but GPU utilization is still very low (< 5%)
How is your CPU doing? At least one of the two will be the bottleneck, and it's not surprising that it would be the CPU. Just for reference, OpenAI used massive CNNs on 128,000 CPU cores with 256 GPUs for OpenAI Five, so an MLP on a dozen CPU threads will have trouble saturating a GPU. You can benchmark with timing code; most likely you still get a non-negligible speed-up from the GPU.
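For the timing comparison, something as simple as this gives a usable number; run it once with tensorflow-gpu and once with the CPU-only build (or with CUDA_VISIBLE_DEVICES=-1). This is a sketch that assumes the CustomPolicy and env from your script:

import time
from stable_baselines import PPO2

model = PPO2(CustomPolicy, env, verbose=0)

start = time.time()
model.learn(total_timesteps=50000)  # short run, just for timing
print("seconds for 50k timesteps:", time.time() - start)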
Another question: once I have the model trained, how do I use it? Do I create an observation and use predict? Do I have to step the env?
When the model is trained, you can simply give it the observations you wish to use. However, if you are using recurrent networks, you need to pass the state to the predict function:
states = model.initial_state  # get the initial state vector for the recurrent network
dones = np.zeros(states.shape[0])  # set all environments to not done
...
# in your loop
action, _values, states, _neglog = model.predict(obs, states, dones)
# where obs is the observation you want to use the model on in production
...
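Put together as a loop it looks roughly like this (a sketch assuming the env and model from your script; note that in the released stable-baselines API, predict returns just (action, states), so if the four-value unpacking fails, unpack two values):

import numpy as np

obs = env.reset()
states = model.initial_state               # initial recurrent state
dones = np.zeros(states.shape[0])          # mark all environments as not done

while True:
    # predict returns the action and the updated recurrent state
    action, states = model.predict(obs, state=states, mask=dones)
    obs, rewards, dones, info = env.step(action)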
I am not sure that I understand what state is. This is my code, how do I construct the obs and state?
### Quantiacs RL
# import necessary Packages below:
import numpy as np
from quantiacsToolbox.quantiacsToolbox import runts
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines import PPO2
from portfolio import PortfolioEnv

def myTradingSystem(DATE, OPEN, HIGH, LOW, CLOSE, VOL, exposure, equity, settings):
    ''' This system uses trend following techniques to allocate capital into the desired equities'''
    nMarkets = CLOSE.shape[1]
    pos = np.zeros(nMarkets)
    instruments = []
    history = np.empty(shape=(len(settings["markets"]), len(OPEN), 5), dtype=np.float)
    instruments = settings["markets"]
    for m in range(len(instruments)):
        for d in range(len(OPEN)):
            history[m][d] = np.array([OPEN[d, m], HIGH[d, m], LOW[d, m], CLOSE[d, m], VOL[d, m]])
    # write_to_h5py(history, instruments, 'datasets/' + settings['model_name'] + '.h5')

    # multiprocess environment
    n_cpu = 12
    env = PortfolioEnv(history=history, abbreviation=instruments, steps=settings['steps'], window_length=settings['window_length'])
    env = SubprocVecEnv([lambda: env for i in range(n_cpu)])

    print(settings['model_filename'])
    model = PPO2.load(settings['model_filename'])
    obs = env.reset()
    action, _states = model.predict(obs)
    '''
    while True:
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        env.render()
    '''
    # weights = pos/np.nansum(abs(pos))
    weights = action
    return weights, settings

def mySettings():
    ''' Define your trading system settings here '''
    settings = {}
    settings['markets'] = ['CASH', 'F_AD', 'F_BP', 'F_CD', 'F_EC', 'F_JY', 'F_SF', 'F_ND']
    settings['lookback'] = 2300
    settings['budget'] = 10**6
    settings['slippage'] = 0.05
    settings['endInSample'] = '20150101'
    settings['beginInSample'] = '20050101'
    model = 'currencies'
    settings['steps'] = 2000
    settings['window_length'] = 3
    settings['allow_short'] = False
    settings['total_timesteps'] = 10000000  # 100000000
    settings['model_name'] = model + '_' + settings['beginInSample'] + '_' + settings['endInSample']
    settings['model_filename'] = model + '_' + settings['beginInSample'] + '_' + settings['endInSample'] + '_' + str(settings['total_timesteps']) + '_' + str(settings['steps']) + '_' + str(settings['window_length'])
    # tensorboard --logdir=tensorboard   (or: tensorboard --logdir=src)
    return settings

# Evaluate trading system defined in current file.
if __name__ == '__main__':
    results = runts(__file__)
    # optimize(__file__)
I am not sure that I understand what state is.
In your case, the state is the LSTM internal state (denoted h_t and c_t)
(LSTM cell diagram)
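For a stable-baselines LstmPolicy those two vectors are packed into one array, so you can sanity-check the state directly. A small sketch, assuming your loaded model:

# initial_state has shape (n_envs, 2 * n_lstm): the cell state c_t and the hidden
# state h_t are concatenated along the last axis (with the default n_lstm=256 that
# gives the (12, 512) shape you print further down).
states = model.initial_state
print(states.shape)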
This is my code, how do I construct the obs and state?
I already showed you how to construct the initial state:
# initialized here
states = model.initial_state  # get the initial state vector for the recurrent network
dones = np.zeros(states.shape[0])  # set all environments to not done
# updated here
action, _values, states, _neglog = model.predict(obs, states, dones)
As for the observation, I don't know; this is not my code and I don't understand its usage or purpose. It should be a numpy array with the same shape as the environment's observation space.
OK, got it. Hopefully almost there. One more problem in model = PPO2.load(settings['model_filename']):
The file is there. I also tried with admin privileges but it doesn't work.
<class 'PermissionError'>
Traceback (most recent call last):
File "C:\Users\hanna\Anaconda3\lib\site-packages\quantiacsToolbox\quantiacsToolbox.py", line 871, in runts
position, settings = TSobject.myTradingSystem(*argList)
File "ppo2_quantiacs_test2.py", line 33, in myTradingSystem
model = PPO2.load(settings['model_filename'])
File "c:\users\hanna\stable-baselines\stable_baselines\common\base_class.py", line 550, in load
data, params = cls._load_from_file(load_path)
File "c:\users\hanna\stable-baselines\stable_baselines\common\base_class.py", line 361, in _load_from_file
with open(load_path, "rb") as file:
PermissionError: [Errno 13] Permission denied: 'currencies_20050101_20150101_10000000_2000_3'
PermissionError: [Errno 13] Permission denied: 'currencies_20050101_20150101_10000000_2000_3'
That's a directory, no?
Hmm... I have both a tensorboard log directory with that name and a pkl file with the same name.
OK, directory renamed. Now I get:
Traceback (most recent call last):
File "C:\Users\hanna\Anaconda3\lib\site-packages\quantiacsToolbox\quantiacsToolbox.py", line 871, in runts
position, settings = TSobject.myTradingSystem(*argList)
File "ppo2_quantiacs_test2.py", line 41, in myTradingSystem
action, _values, states, _neglog = model.predict(obs, states, dones)
ValueError: not enough values to unpack (expected 4, got 2)
Also, why is it refusing to accept feature_extraction='cnn'?
print(np.shape(obs))
print(np.shape(states))
print(np.shape(dones))
print(obs)
print(states)
print(dones)
(12, 120)
(12, 512)
(12,)
[[1.000000e+00 1.000000e+00 1.000000e+00 ... 1.057500e+05 1.061250e+05
1.555900e+04]
[1.000000e+00 1.000000e+00 1.000000e+00 ... 9.853750e+04 9.980000e+04
4.457200e+04]
[1.000000e+00 1.000000e+00 1.000000e+00 ... 1.042000e+05 1.044875e+05
1.994300e+04]
...
[1.000000e+00 1.000000e+00 1.000000e+00 ... 9.571250e+04 9.615000e+04
2.808500e+04]
[1.000000e+00 1.000000e+00 1.000000e+00 ... 9.853750e+04 9.980000e+04
4.457200e+04]
[1.000000e+00 1.000000e+00 1.000000e+00 ... 1.054250e+05 1.057750e+05
1.149000e+04]]
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
obs = env.reset()
states = model.initial_state # get the initial state vector for the reccurent network
dones = np.zeros(states.shape[0]) # set all environment to not done
print(np.shape(obs))
print(np.shape(states))
print(np.shape(dones))
print(obs)
print(states)
print(dones)
# updated here
# action, _values, states, _neglog = model.predict(obs, states, dones)
action, _states = model.predict(obs, states, dones)
print(action)
[[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]
[nan nan nan nan nan nan nan nan]]
ValueError: could not broadcast input array from shape (12,8) into shape (8)
Why is it all NaN, and where do the 12 rows come from?
Can anyone help figure out why predict doesn't work?
Anyone?
As for the GPU utilization problem, I think the Windows performance monitor doesn't show the correct utilization. I tried GPU-Z and it shows 30-60% GPU load.
I figured out that the 12 rows in the action come from the number of CPUs. When I change to n_cpu = 1 I get: ValueError: Cannot feed value of shape (1, 120) for Tensor 'input/Ob:0', which has shape '(12, 120)'. How do I predict then? How do I combine results from the multiprocess env into one action?
I am also struggling with this - anyone have any ideas?
I figured out that the 12 rows in the action come from the number of CPUs. When I change to n_cpu = 1 I get: ValueError: Cannot feed value of shape (1, 120) for Tensor 'input/Ob:0', which has shape '(12, 120)'. How do I predict then? How do I combine results from the multiprocess env into one action?
I am struggling with this too. In my case I just created 12 parallel test environments, and the result I get has dimension 12. I just flattened them and treated them as 12 individual test points? I am not sure. I would appreciate it a lot if someone could shed light on this one.
@op1490 @troychen728 for predicting for only one env, you can find a solution here: https://github.com/hill-a/stable-baselines/issues/166#issuecomment-502350843
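For anyone landing here: one workaround for predicting with a single env when the recurrent model was trained with several is to zero-pad the observation batch and only use the first action. A sketch of that approach, assuming a model trained with 12 envs and a single (non-vectorized) evaluation env called eval_env:

import numpy as np

n_train_envs = 12                           # number of envs used during training
state = model.initial_state
done = [False for _ in range(n_train_envs)]

obs = eval_env.reset()
padded_obs = np.zeros((n_train_envs,) + eval_env.observation_space.shape)
padded_obs[0] = obs                         # real observation goes in slot 0, the rest stay zero
actions, state = model.predict(padded_obs, state=state, mask=done)
action = actions[0]                         # only the first action corresponds to the real env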
Hi, I am new here, and I still don't understand how to use the GPU with TensorFlow within stable-baselines. Is the GPU automatically used when tensorflow-gpu is installed correctly?
@dbsxdbsx Yes, if you have tensorflow-gpu installed, then most of the stable-baselines algorithms will use GPU.
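A quick way to confirm that TensorFlow 1.x actually sees the GPU, independent of stable-baselines:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())                           # True if a usable CUDA device is found
print([d.name for d in device_lib.list_local_devices()])    # should include '/device:GPU:0'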
Thanks.
I am running a PPO2 model. I see high CPU utilization and low GPU utilization.
When running:
I get:
I understand that TensorFlow is "seeing" my GPU. Why is the utilization so low when training a stable-baselines model?