watakandai/hiro_pytorch

Implementation of HIRO (Data-Efficient Hierarchical Reinforcement Learning)

Shouldn't the state value be subtracted when calculating the new subgoal? #10


skkuai commented 2 years ago


As I understand it, the subgoal represents a relative position: it is obtained by subtracting the corresponding part of the state vector from the absolute subgoal.

However, the state value is not subtracted when a new subgoal is sampled from the high-level policy, as shown below (lines 646–663 of hiro/model.py):

def _choose_subgoal_with_noise(self, step, s, sg, n_s):
    if step % self.buffer_freq == 0:  # Should be zero
        # sample a fresh subgoal from the high-level policy (with exploration noise)
        sg = self.high_con.policy_with_noise(s, self.fg)
    else:
        # between high-level decisions, apply the goal transition
        sg = self.subgoal_transition(s, sg, n_s)

    return sg

...

def _choose_subgoal(self, step, s, sg, n_s):
    if step % self.buffer_freq == 0:
        # same logic without exploration noise
        sg = self.high_con.policy(s, self.fg)
    else:
        sg = self.subgoal_transition(s, sg, n_s)

    return sg
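
For reference, the goal transition in the HIRO paper is h(s_t, g_t, s_{t+1}) = s_t + g_t - s_{t+1}: as the agent moves, a relative subgoal is shifted so that it keeps pointing at the same absolute position. Here is a minimal sketch of subgoal_transition under that convention (the actual body in hiro/model.py may differ):

import numpy as np

def subgoal_transition(s, sg, n_s):
    # h(s, g, s') = s + g - s': re-express the subgoal relative to the
    # new state so that the absolute target s + g stays fixed. Only the
    # first sg.shape[0] state dimensions (e.g. the x/y position) are used.
    return s[:sg.shape[0]] + sg - n_s[:sg.shape[0]]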

Shouldn't it be calculated as follows?

def _choose_subgoal_with_noise(self, step, s, sg, n_s):
    if step % self.buffer_freq == 0:  # Should be zero
        sg = self.high_con.policy_with_noise(s, self.fg)
        # proposed: convert the absolute subgoal into a relative one
        sg -= n_s[:sg.shape[0]]
    else:
        sg = self.subgoal_transition(s, sg, n_s)

    return sg

...

def _choose_subgoal(self, step, s, sg, n_s):
    if step % self.buffer_freq == 0:
        sg = self.high_con.policy(s, self.fg)
        # proposed: convert the absolute subgoal into a relative one
        sg -= n_s[:sg.shape[0]]
    else:
        sg = self.subgoal_transition(s, sg, n_s)

    return sg
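
To make the concern concrete, here is a small self-contained example; the numbers and the assumption that the policy outputs an absolute position are mine, for illustration only:

import numpy as np

s     = np.array([2.0, 3.0, 0.1])  # state at time t (first two dims: position)
n_s   = np.array([2.5, 3.5, 0.2])  # state at time t+1
g_abs = np.array([5.0, 7.0])       # fresh subgoal from the high-level policy,
                                   # assumed here to be an absolute position

# Current code: g_abs is stored unchanged. The next call applies
# h(s, g, s') = s + g - s', which preserves the point s + g = [7, 10]
# rather than the intended target [5, 7]:
g_next = s[:2] + g_abs - n_s[:2]   # -> [4.5, 6.5]
print(n_s[:2] + g_next)            # [7., 10.] != g_abs

# Proposed fix: make the subgoal relative before storing it, so the
# transition keeps tracking the intended absolute target:
g_rel = g_abs - n_s[:2]            # i.e. sg -= n_s[:sg.shape[0]]
print(n_s[:2] + g_rel)             # [5., 7.] == g_abs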