Closed fatalfeel closed 4 years ago
Try this way — keep references to `self.L0` / `self.L1` so you can check their weights:

```python
self.L0 = nn.Linear(64, 32)
self.L1 = nn.Linear(64, 32)
self.network_act = nn.Sequential(
    nn.Linear(dim_states, 64),
    nn.Tanh(),
    self.L0,
    nn.Tanh(),
    nn.Linear(32, dim_acts),
    nn.Tanh()
)
self.network_critic = nn.Sequential(
    nn.Linear(dim_states, 64),
    nn.Tanh(),
    self.L1,
    nn.Tanh(),
    nn.Linear(32, 1)
)
```
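A minimal runnable version of the idea above, wrapped in a module (the sizes `dim_states=8`, `dim_acts=2` here are placeholder values, not from the original):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, dim_states, dim_acts):
        super().__init__()
        # Keep handles to the hidden layers so their weights can be inspected later.
        self.L0 = nn.Linear(64, 32)
        self.L1 = nn.Linear(64, 32)
        self.network_act = nn.Sequential(
            nn.Linear(dim_states, 64), nn.Tanh(),
            self.L0, nn.Tanh(),
            nn.Linear(32, dim_acts), nn.Tanh(),
        )
        self.network_critic = nn.Sequential(
            nn.Linear(dim_states, 64), nn.Tanh(),
            self.L1, nn.Tanh(),
            nn.Linear(32, 1),
        )

model = ActorCritic(dim_states=8, dim_acts=2)
print(model.L0.weight.shape)                           # torch.Size([32, 64])
print(model.network_critic(torch.randn(5, 8)).shape)   # torch.Size([5, 1])
```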
Also, we can print the weights with `print(self.L0.weight.data)` (note the parameter key is `'weight'`, not `'weights'`, so `self.L0._parameters['weight'].data` also works).
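To verify whether a layer's weights actually change, one can clone the weight before the optimizer step and compare afterwards — a sketch, using a single `nn.Linear` standing in for `self.L0` and a hypothetical SGD optimizer:

```python
import torch

# Stand-in for self.L0 and its optimizer (hypothetical setup for illustration).
layer = torch.nn.Linear(64, 32)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

before = layer.weight.data.clone()      # snapshot before the update
loss = layer(torch.randn(4, 64)).sum()  # dummy loss so gradients exist
loss.backward()
optimizer.step()

changed = not torch.equal(before, layer.weight.data)
print(changed)  # True: this layer's weight moved after the step
```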
refer #29
With `advantages = rewards - state_values.detach()`, after `self.optimizer.step()` the critic network's weights never change through this term, because `detach()` cuts the gradient flow back into the critic.
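A small sketch of that behavior (the critic here is just a single linear layer, and the losses are simplified stand-ins, not the real PPO objectives): the detached advantage carries no graph, while an undetached value loss does reach the critic.

```python
import torch
import torch.nn as nn

critic = nn.Linear(4, 1)          # toy critic for illustration
states = torch.randn(8, 4)
rewards = torch.randn(8)

state_values = critic(states).squeeze(-1)

# detach() cuts the graph: no gradient can flow back into the critic from here,
# so an actor loss built on `advantages` leaves the critic's weights untouched.
advantages = rewards - state_values.detach()
print(advantages.requires_grad)        # False

# The critic only learns through a value loss computed WITHOUT detach():
value_loss = (rewards - state_values).pow(2).mean()
value_loss.backward()
print(critic.weight.grad is not None)  # True: this loss does update the critic
```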