Questions about Experience Replay Buffer

Hi @miyosuda

Thanks again for the open-source implementation. It is a great help. I have a question about how the experience replay buffer is filled.

In main.py, when the top-level process is called for each environment,

diff_global_t = trainer.process(self.sess,
                                self.global_t,
                                self.summary_writer,
                                self.summary_op,
                                self.score_input)

the replay buffer is filled by the lines below, since the experience buffer is not full at the start. Is that correct?

    # Fill experience replay buffer
    if not self.experience.is_full():
      self._fill_experience(sess)
      return 0

Then, inside the base A3C process, new frames keep being added in the lines below:

      frame = ExperienceFrame(prev_state, reward, action, terminal, pixel_change,
                              last_action, last_reward)

      # Store to experience
      self.experience.add_frame(frame)

So, just to confirm: is it correct that the _process_base function always controls what goes into the experience replay buffer? And at the very beginning, do the auxiliary tasks (VR, RP, PC) use the experience frames from the initial fill, which happened outside the base process? Is this correct, or am I missing something?
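To make my understanding concrete, here is a minimal sketch of the flow as I read it. This is not the repository's code: ToyExperience and ToyTrainer are hypothetical stand-ins, frames are plain integers, and I am assuming that each fill call adds one frame and that the buffer evicts its oldest frame once it is full.

```python
from collections import deque
import random


class ToyExperience:
    """Hypothetical stand-in for the experience buffer (not the repo's class)."""

    def __init__(self, capacity):
        # Assumption: a bounded buffer that drops the oldest frame when full.
        self.frames = deque(maxlen=capacity)

    def is_full(self):
        return len(self.frames) == self.frames.maxlen

    def add_frame(self, frame):
        self.frames.append(frame)

    def sample(self, n):
        # Uniform sampling, standing in for whatever the auxiliary tasks use.
        return random.sample(list(self.frames), n)


class ToyTrainer:
    """Hypothetical trainer mirroring the control flow described above."""

    def __init__(self, capacity):
        self.experience = ToyExperience(capacity)
        self.step = 0
        self.last_aux_batch = None

    def _next_frame(self):
        # Frames are just increasing integers in this sketch.
        self.step += 1
        return self.step

    def _fill_experience(self):
        # Assumption: one frame per call while the buffer is not yet full.
        self.experience.add_frame(self._next_frame())

    def _process_base(self, rollout_len):
        # After the initial fill, the base process is the only writer.
        for _ in range(rollout_len):
            self.experience.add_frame(self._next_frame())
        return rollout_len

    def process(self, rollout_len=3):
        # Phase 1: fill only, no training step is counted.
        if not self.experience.is_full():
            self._fill_experience()
            return 0
        # Phase 2: base step adds frames, then auxiliary tasks sample.
        diff = self._process_base(rollout_len)
        self.last_aux_batch = self.experience.sample(2)
        return diff


trainer = ToyTrainer(capacity=4)
results = [trainer.process() for _ in range(5)]
print(results)
print(list(trainer.experience.frames))
```

In this sketch, with a capacity of 4, the first four calls only fill the buffer and return 0. The fifth call runs the base step, and its three new frames evict the three oldest ones, so the auxiliary sample is drawn from a mix of the initial fill and the base-process frames.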

Thank you for taking the time to clarify these points.
