watakandai / hiro_pytorch

Implementation of HIRO (Data-Efficient Hierarchical Reinforcement Learning)

Why does the class 'TD3Critic' have two Q-function estimators but only use one? #9

Open G0K0URURI opened 2 years ago

G0K0URURI commented 2 years ago

The class 'TD3Critic' seems to have two Q-function estimators, but only the first one is used:

    class TD3Critic(nn.Module):
        def __init__(self, state_dim, goal_dim, action_dim):
            super(TD3Critic, self).__init__()
            # Q1
            self.l1 = nn.Linear(state_dim + goal_dim + action_dim, 300)
            self.l2 = nn.Linear(300, 300)
            self.l3 = nn.Linear(300, 1)
            # Q2
            self.l4 = nn.Linear(state_dim + goal_dim + action_dim, 300)
            self.l5 = nn.Linear(300, 300)
            self.l6 = nn.Linear(300, 1)

I guess you may have forgotten to delete the second one when implementing the ensemble over Q-value models in the class 'TD3Controller' :)
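For context, in the TD3 algorithm the two Q-functions are normally combined through clipped double-Q learning: the critic target bootstraps from the smaller of the two target-critic estimates, while the actor update uses only Q1. Below is a minimal plain-Python sketch of that target computation. It is not code from this repository; the function name `clipped_double_q_target` and the numbers are hypothetical and chosen only for illustration.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3-style critic target for a single transition.

    reward, done: scalar reward and termination flag (0.0 or 1.0).
    q1_next, q2_next: the two target-critic estimates for the next
    state and the (smoothed) target action.
    """
    # Taking the minimum of the two estimates counteracts overestimation bias.
    min_q_next = min(q1_next, q2_next)
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - done) * min_q_next


# Hypothetical values: Q2 is the smaller estimate, so it drives the target.
target = clipped_double_q_target(
    reward=1.0, done=0.0, q1_next=10.0, q2_next=8.0, gamma=0.5
)
print(target)  # 5.0
```

If the ensemble in 'TD3Controller' already plays this role by holding two separate critic instances, then the Q2 layers inside 'TD3Critic' would indeed be redundant.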