ikostrikov / pytorch-trpo

PyTorch implementation of Trust Region Policy Optimization
MIT License

doc? #9

Closed by hughperkins 6 years ago

hughperkins commented 6 years ago

For example, imagine I have my own policy that takes in a state and outputs an action, or perhaps a distribution over actions, and a world that takes an action and returns a reward and a new state. How would I plug these into this TRPO implementation?
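To make the question concrete, here is a minimal sketch of the two pieces I mean and the loop that would connect them. Everything in it is hypothetical: `ToyWorld`, `toy_policy`, and `collect_rollout` are illustration names, not anything from this repo, and the `done` flag is an extra assumption beyond the reward and new state mentioned above. It only shows the shape of the data a policy-gradient method such as TRPO would need to consume, not how this implementation expects it.

```python
import random


class ToyWorld:
    """Hypothetical world: a 1-D walk that ends when `goal` is reached."""

    def __init__(self, goal=3):
        self.goal = goal
        self.pos = 0

    def reset(self):
        # Start every episode at position 0 and return the initial state.
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        self.pos += 1 if action == 1 else -1
        done = self.pos == self.goal  # assumption: the world signals episode end
        reward = 1.0 if done else -0.1
        return reward, self.pos, done


def toy_policy(state):
    """Hypothetical policy: maps a state to a distribution over two actions."""
    return [0.3, 0.7]  # P(left), P(right); fixed here, learned in practice


def collect_rollout(world, policy, max_steps=50, rng=None):
    """Run one episode and return a list of (state, action, reward) tuples."""
    rng = rng or random.Random(0)
    state = world.reset()
    batch = []
    for _ in range(max_steps):
        probs = policy(state)
        # Sample an action index according to the policy's distribution.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward, next_state, done = world.step(action)
        batch.append((state, action, reward))
        state = next_state
        if done:
            break
    return batch


batch = collect_rollout(ToyWorld(), toy_policy)
```

What I am asking is where a batch like this, or the policy and world themselves, would go in the training code here.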

ikostrikov commented 6 years ago

This repo is a little bit messy.

Do you want to use TRPO specifically?

I highly recommend using PPO instead: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

That repo is much cleaner (and I remember it better).

hughperkins commented 6 years ago

Awesome. Good info. Thanks! :)

hughperkins commented 6 years ago

(Note: you might consider adding a link to the newer repo to the README of this repo. I guess that for each person who opens an issue, there might be 20 who just walk on by and never find out about the newer repo.)

ikostrikov commented 6 years ago

Yes, that's a good idea! I will add a link.