nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

advantages = rewards - state_values.detach() problem #35

Closed fatalfeel closed 4 years ago

fatalfeel commented 4 years ago

With `advantages = rewards - state_values.detach()`, the critic network's weights never change after `self.optimizer.step()`, because `detach()` cuts the gradient flow.
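
A minimal sketch (not the repository's code; the actor/critic here are stand-in linear layers) of why `detach()` blocks gradients through the advantage term: after backpropagating a surrogate loss built from `advantages`, the critic receives no gradient from that term, so any critic update would have to come from a separate value loss.

    import torch
    import torch.nn as nn

    actor = nn.Linear(4, 2)    # stand-in policy network
    critic = nn.Linear(4, 1)   # stand-in value network

    state = torch.randn(3, 4)
    rewards = torch.randn(3, 1)

    logprobs = torch.log_softmax(actor(state), dim=-1)[:, :1]
    state_values = critic(state)

    advantages = rewards - state_values.detach()   # detach() cuts the grad path to the critic
    loss = -(logprobs * advantages).mean()         # simplified policy surrogate
    loss.backward()

    print(actor.weight.grad is None)    # False: the actor receives gradients
    print(critic.weight.grad is None)   # True: the critic gets none from this term

If the full loss also contains a value term such as `MseLoss(state_values, rewards)` computed on the non-detached `state_values`, the critic is still updated through that term.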

fatalfeel commented 4 years ago

Try this way to check the `self.L0` / `self.L1` weights:

    self.L0 = nn.Linear(64, 32)
    self.L1 = nn.Linear(64, 32)
    self.network_act = nn.Sequential(nn.Linear(dim_states, 64),
                                     nn.Tanh(),
                                     self.L0,
                                     nn.Tanh(),
                                     nn.Linear(32, dim_acts),
                                     nn.Tanh())

network_value

    self.network_critic = nn.Sequential(nn.Linear(dim_states, 64),
                                        nn.Tanh(),
                                        self.L1,
                                        nn.Tanh(),
                                        nn.Linear(32, 1))

Also, we can print `self.L0._parameters['weight'].data` (the parameter key is `'weight'`, or simply use `self.L0.weight.data`).
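
A minimal sketch of the suggested check, assuming a policy object that exposes these layers and an update routine that calls `optimizer.step()` internally (the names `ppo`, `memory`, and `update()` are hypothetical): snapshot the weights before the update, then compare afterwards.

    before_actor  = ppo.policy.L0.weight.data.clone()
    before_critic = ppo.policy.L1.weight.data.clone()

    ppo.update(memory)   # assumed to run self.optimizer.step() internally

    print(torch.equal(before_actor,  ppo.policy.L0.weight.data))   # False if the actor's weights changed
    print(torch.equal(before_critic, ppo.policy.L1.weight.data))   # False if the critic's weights changed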

nikhilbarhate99 commented 4 years ago

Refer to #29.