pfnet / pfrl

PFRL: a PyTorch-based deep reinforcement learning library
MIT License

Update self._cumulative_steps in non-actor-learner training #18

Closed muupan closed 4 years ago

muupan commented 4 years ago

self._cumulative_steps was introduced to count steps correctly in actor-learner training, but it was not valid in non-actor-learner training. (See the internal repository for how it was introduced.) This PR makes it valid in both actor-learner and non-actor-learner training.

In non-actor-learner training, it is now equivalent to self.t. However, self.t and self._cumulative_steps differ in actor-learner training, which is why they cannot simply be merged into one.
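For illustration, here is a minimal sketch, not PFRL's actual implementation, of how the two counters stay in sync in non-actor-learner training. The agent class, its observe method, and _update_model are hypothetical; only the counter names and update_interval come from this PR.

```python
# Hypothetical sketch: in non-actor-learner training both counters advance
# together on every environment step, so self._cumulative_steps == self.t.
class Agent:
    def __init__(self, update_interval=4):
        self.t = 0                  # environment steps taken by this agent
        self._cumulative_steps = 0  # steps counted for step-based schedules
        self.update_interval = update_interval

    def observe(self, obs, reward, done, reset):
        self.t += 1
        self._cumulative_steps += 1
        if self.t % self.update_interval == 0:
            self._update_model()

    def _update_model(self):
        pass  # a gradient step would go here
```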

muupan commented 4 years ago

Currently, in actor-learner training, self.t is equal to self.optim_t * self.update_interval, so it effectively counts the number of updates multiplied by update_interval. It is compared against target_update_interval, which makes more sense than using cumulative_steps, since the target network should be updated only after the model has been updated enough times.
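The following is a hypothetical sketch of that relationship, not PFRL's actual code: because self.t advances by update_interval per optimizer step, comparing it against target_update_interval effectively counts optimizer updates. The Learner class and _sync_target_network are illustrative names.

```python
# Hypothetical sketch: self.t tracks optimizer updates scaled by update_interval,
# and the target network is synced based on that counter.
class Learner:
    def __init__(self, update_interval=4, target_update_interval=10_000):
        self.optim_t = 0
        self.t = 0
        self.update_interval = update_interval
        self.target_update_interval = target_update_interval

    def update(self):
        self.optim_t += 1
        self.t = self.optim_t * self.update_interval
        # Sync the target network once the model has been updated enough times.
        if self.t % self.target_update_interval == 0:
            self._sync_target_network()

    def _sync_target_network(self):
        pass  # copy online network parameters into the target network
```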

This is indeed confusing. I think there is room for improvement, but I will leave it for future work.