Closed lidongke closed 4 years ago
Hi @lidongke , that's the nature of the control loop: sampling data and training are intertwined that way. There are a few ways to increase the sampling speed:

1. Use vector environments, e.g. set `num_envs = 8` in the spec, so that each `env.step` call will pull from more environments.
2. Make the sampling asynchronous. However, this requires significant engineering effort in changing the code, and is likely not worth the effort for the broader use cases.
3. Save the `state, action, reward, next_state, done` tuples from previous runs and reload them in your new run. This should just require you to change your code to save the memory's content into a Python pickle file or numpy serialized data. You can also do some data selection on the side by filtering out data samples that are similar, to prevent too many duplicates.

That said, we do not plan on supporting these features, but feel free to reach out if you have further questions about them!
Asynchronous sampling is a good idea! I'm considering spawning a subprocess with the `train()` function and using the `global_net` as the master agent: only sample batches to optimize the `global_net`, and periodically send the net parameters to the local agent via shared memory. What do you think about my plan? @kengz
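A rough sketch of that plan with Python's stdlib `multiprocessing`, reducing the "network" to a flat parameter array in shared memory (the names `trainer` and `run`, and the gradient step itself, are stand-ins rather than SLM Lab code; assumes a fork-capable platform such as Linux):

```python
import multiprocessing as mp

import numpy as np

def trainer(shared_params, batch_queue, steps):
    # trainer subprocess: consumes sampled batches and updates the
    # global parameters held in shared memory
    for _ in range(steps):
        batch = batch_queue.get()
        with shared_params.get_lock():
            arr = np.frombuffer(shared_params.get_obj())
            arr += 0.01 * batch.mean()  # stand-in for a gradient step

def run():
    # global_net parameters live in shared memory, visible to all processes
    shared_params = mp.Array('d', 4)
    batch_queue = mp.Queue()
    p = mp.Process(target=trainer, args=(shared_params, batch_queue, 3))
    p.start()
    for _ in range(3):
        batch_queue.put(np.ones(8))  # the sampler pushes batches
    p.join()
    return list(shared_params)
```

The local agent would periodically read `shared_params` back to refresh its own copy; the sampler never blocks on the gradient step because training happens in the subprocess.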
Oh that's right! You can just use asynchronous parallelization, a.k.a. Hogwild. Just set the spec variable `meta.distributed` to `shared` and `max_session` to the number of workers, and run. Then it will do what you described above.
If I use a venv for batched actions and observations, my testing shows that the "shared" and "synced" modes in your source code still sample and train serially. Am I right?
1. In your comment:

> 'synced': global network parameter is periodically synced to local network after each gradient push. In this mode, algorithm will keep a separate reference to `global_{net}` for each of its networks.

I think this means that each worker uses its local network for action selection, while the master network updates asynchronously (never blocking worker sampling) and is periodically synced to the local network.
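That cycle can be sketched abstractly, treating parameters as plain numpy arrays keyed by name (an illustration of my understanding, not SLM Lab's actual update code):

```python
import numpy as np

def push_gradient(global_params, grads, lr=0.01):
    # gradient push: the worker applies its gradients to the global network
    for k in global_params:
        global_params[k] -= lr * grads[k]

def sync_local(local_params, global_params):
    # periodic sync: copy global parameters back into the local network,
    # so action selection keeps using the local copy between syncs
    for k in local_params:
        local_params[k] = global_params[k].copy()
```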
But in control.py:
```python
def run_rl(self):
    '''Run the main RL loop until clock.max_frame'''
    logger.info(f'Running RL loop for trial {self.spec["meta"]["trial"]} session {self.index}')
    clock = self.env.clock
    state = self.env.reset()
    done = False
    while True:
        if util.epi_done(done):  # before starting another episode
            self.try_ckpt(self.agent, self.env)
            if clock.get() < clock.max_frame:  # reset and continue
                clock.tick('epi')
                state = self.env.reset()
                done = False
        self.try_ckpt(self.agent, self.env)
        if clock.get() >= clock.max_frame:  # finish
            break
        clock.tick('t')
        with torch.no_grad():
            action = self.agent.act(state)
        next_state, reward, done, info = self.env.step(action)
        self.agent.update(state, action, reward, next_state, done)
        state = next_state
```
If I use a venv, the `act()` function returns batched actions for the venv. That means action selection and optimization both run in one process, so they are serial; I can't reconcile the "synced" mode with your comment. Should I change your code to match my understanding?

2. I want to increase sampling efficiency. The spec variable `max_session` can run parallel sessions, but my problem is that each session is slow: when training starts, the CPU resources aren't fully used, and I'd like them to be. @kengz
Hi~
I'm focused on sampling efficiency. If I train a DQN agent, the network updates every `training_frequency` timesteps. But when the network updates, it blocks the env sampling. How can I solve this problem? @kengz