kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
https://slm-lab.gitbook.io/slm-lab/
MIT License

Training block env sampling #421

Closed · lidongke closed this issue 4 years ago

lidongke commented 4 years ago

Hi~

I'm focused on sampling efficiency. If I train a DQN agent, the network updates every `training_frequency` timesteps. But while the network is updating, it blocks env sampling. How can I solve this problem? @kengz

kengz commented 4 years ago

Hi @lidongke, that's the nature of the control loop: data sampling and training are intertwined that way. There are a few ways to increase the sampling speed:

That said, we do not plan on supporting these features, but feel free to reach out if you have further questions about them!

lidongke commented 4 years ago

Asynchronous sampling is a good idea! I'm considering spawning a subprocess that runs the "train()" function and uses the "global_net" as the master agent: it would only consume sampled batches to optimize the "global_net", and periodically send the network parameters back to the local agent through shared memory. What do you think of this plan? @kengz
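Roughly, I mean something like this (just a minimal sketch, not SLM-Lab code; the names `make_net`, `worker`, `learner`, `sync_every` and the placeholder loss are made up for illustration):

```python
# Minimal Hogwild-style sketch of the plan above (not SLM-Lab code): a learner
# process optimizes a shared global_net from sampled batches, while workers act
# with local copies that periodically pull the shared parameters, so training
# never blocks env sampling.
import torch
import torch.nn as nn
import torch.multiprocessing as mp


def make_net():
    return nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))


def worker(global_net, sync_every=100):
    local_net = make_net()
    local_net.load_state_dict(global_net.state_dict())  # initial sync
    for step in range(1, 1001):
        # ... act with local_net, push transitions to a shared queue/replay ...
        if step % sync_every == 0:
            # periodically pull the latest shared parameters
            local_net.load_state_dict(global_net.state_dict())


def learner(global_net):
    optimizer = torch.optim.Adam(global_net.parameters(), lr=1e-3)
    for _ in range(1000):
        # ... sample a batch from the shared queue/replay and compute the loss;
        # a placeholder loss stands in here so the sketch runs ...
        loss = sum(p.pow(2).sum() for p in global_net.parameters())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # in-place update of the shared parameters


if __name__ == '__main__':
    global_net = make_net()
    global_net.share_memory()  # parameters live in shared memory across processes
    procs = [mp.Process(target=learner, args=(global_net,))]
    procs += [mp.Process(target=worker, args=(global_net,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```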

kengz commented 4 years ago

Oh that's right! You can just use asynchronous parallelization, a.k.a. Hogwild. Just set the spec variable meta.distributed to shared, set max_session to the number of workers, and run. It will then do what you described above.
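For example, the relevant part of the spec would look roughly like this (shown as the Python dict the spec file loads into; only `distributed` and `max_session` matter here, the surrounding keys are illustrative):

```python
# Sketch of the relevant `meta` block of a spec (illustrative values)
spec = {
    # ... agent, env, body specs ...
    "meta": {
        "distributed": "shared",  # Hogwild-style asynchronous parallelization
        "max_session": 4,         # number of parallel worker sessions
        "max_trial": 1,
    },
}
```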

lidongke commented 4 years ago

If I use a venv for batched actions and obs, I tested the "shared" and "synced" modes in your source code and they still do serial sampling and training. Am I right?

1. In your comment:

   - 'synced': global network parameter is periodically synced to local network after each gradient push. In this mode, algorithm will keep a separate reference to `global_{net}` for each of its network

   I take this to mean that each worker uses its local network for action selection, while the master network updates asynchronously (never blocking worker sampling) and is periodically synced back to the local network. But in control.py:

```python
def run_rl(self):
        '''Run the main RL loop until clock.max_frame'''
        logger.info(f'Running RL loop for trial {self.spec["meta"]["trial"]} session {self.index}')
        clock = self.env.clock
        state = self.env.reset()
        done = False
        while True:
            if util.epi_done(done):  # before starting another episode
                self.try_ckpt(self.agent, self.env)
                if clock.get() < clock.max_frame:  # reset and continue
                    clock.tick('epi')
                    state = self.env.reset()
                    done = False
            self.try_ckpt(self.agent, self.env)
            if clock.get() >= clock.max_frame:  # finish
                break
            clock.tick('t')
            with torch.no_grad():
                action = self.agent.act(state)
            next_state, reward, done, info = self.env.step(action)
            self.agent.update(state, action, reward, next_state, done)
            state = next_state
```

   With a venv, the "act()" function returns batched actions for the venv, which means action selection and optimization happen in one process, so they are serial. I can't reconcile the "synced" mode with your comment. Should I change your code to match my understanding?

2. I want to raise sampling efficiency. The spec variable max_session can run parallel sessions, but my problem is that each session is slow: when training starts, the CPU resources aren't fully used, and I'd like them to be fully utilized. @kengz
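For reference, my mental model of the throughput knobs, written as the Python dict a spec would load into (per-session batching from the vector env, plus parallel sessions from meta; exact key names like `num_envs` are from memory and may not match the current version):

```python
# Illustrative only: key names and values may differ from the real spec format.
spec_sketch = {
    "env": [{
        "name": "CartPole-v0",   # placeholder env
        "num_envs": 8,           # venv: 8 envs stepped as a batch within one session
        "max_frame": 100000,
    }],
    "meta": {
        "distributed": "synced",  # or "shared" for Hogwild
        "max_session": 4,         # 4 sessions run as parallel processes
    },
}
```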