ikostrikov / pytorch-trpo

PyTorch implementation of Trust Region Policy Optimization
MIT License

I don't know what "neggdotstepdir" is for. Thanks! #20

Open baywc568 opened 3 years ago

baywc568 commented 3 years ago

Thank you very much for the code you provided! I learned a lot from it. I would like to ask what the following lines of code do. Is there a mathematical proof or derivation behind them? Are they different from the original paper? Thanks!

neggdotstepdir = (-loss_grad * stepdir).sum(0, keepdim=True)
expected_improve = expected_improve_rate * stepfrac
ratio = actual_improve / expected_improve
if ratio.item() > accept_ratio and actual_improve.item() > 0:
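
For context, one plausible reading of these lines: `neggdotstepdir` is the dot product of the negative loss gradient with the step direction, i.e. `-g · s`. By a first-order Taylor expansion, moving the parameters by `stepfrac * fullstep` should lower the loss by about `stepfrac * (-g · fullstep)`, so `expected_improve` is a linear prediction and `ratio` compares the measured decrease against it. The sketch below reproduces that acceptance test on a toy quadratic in plain Python. The toy loss, the stand-in values for `stepdir` and `lm`, and the helper names `dot` and `backtracking_line_search` are assumptions made for illustration, not the repository's code.

```python
def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def backtracking_line_search(f, x, fullstep, expected_improve_rate,
                             max_backtracks=10, accept_ratio=0.1):
    """Halve the step until the measured decrease of f is positive and at
    least accept_ratio times the first-order (linear) prediction."""
    fval = f(x)
    for k in range(max_backtracks):
        stepfrac = 0.5 ** k
        xnew = [xi + stepfrac * si for xi, si in zip(x, fullstep)]
        actual_improve = fval - f(xnew)
        # Linear prediction: stepfrac * (-g . fullstep)
        expected_improve = expected_improve_rate * stepfrac
        ratio = actual_improve / expected_improve
        if ratio > accept_ratio and actual_improve > 0:
            return True, xnew
    return False, x


# Hypothetical toy loss: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
def f(x):
    return 0.5 * dot(x, x)


x = [1.0, -2.0]
loss_grad = list(x)                   # gradient of the toy loss at x
stepdir = [-g for g in loss_grad]     # stand-in for the conjugate-gradient result
lm = 2.0                              # stand-in for the step-scaling factor
fullstep = [s / lm for s in stepdir]

# -g . s: predicted loss decrease per unit step along stepdir
neggdotstepdir = dot([-g for g in loss_grad], stepdir)
expected_improve_rate = neggdotstepdir / lm   # same prediction for fullstep

success, xnew = backtracking_line_search(f, x, fullstep, expected_improve_rate)
print(success, xnew)
```

In this toy case the full step is accepted on the first try, because the measured decrease (1.875) is 75% of the linear prediction (2.5), well above `accept_ratio`. When the loss curves more sharply, the ratio drops and the loop retries with half the step.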