nottombrown / rl-teacher

Code for Deep RL from Human Preferences [Christiano et al]. Plus a webapp for collecting human feedback
MIT License

No more Q states #30

Closed Raelifin closed 6 years ago

Raelifin commented 6 years ago

I've never seen a "Q-State" outside of Teacher. Q functions take (state,action) pairs. Storing the concatenation of states and actions isn't particularly elegant and only saves us a tiny amount of computation.

Furthermore, concatenating states and actions only really works when they're the same rank. For Atari environments or other non-MuJoCo environments this will not always be the case. We can extend support for more environments by pulling them apart and adding some checks for environment dimensionality.
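To illustrate the rank mismatch (shapes here are hypothetical, not taken from the repo): a MuJoCo observation and action are both rank-1 vectors, so concatenation works, but an Atari image observation is rank-3 and can't be concatenated with a rank-1 action.

```python
import numpy as np

# MuJoCo-style: both obs and act are rank-1 vectors, so concatenation works.
mujoco_obs = np.zeros(17)  # e.g. joint positions/velocities
mujoco_act = np.zeros(6)   # e.g. joint torques
q_state = np.concatenate([mujoco_obs, mujoco_act])  # shape (23,)

# Atari-style: the observation is a rank-3 image, the action is rank-1.
atari_obs = np.zeros((84, 84, 4))  # stacked grayscale frames
atari_act = np.zeros(1)
try:
    np.concatenate([atari_obs, atari_act])
except ValueError:
    # Arrays of different rank can't be concatenated along any axis.
    print("concatenation fails for mismatched ranks")
```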

Raelifin commented 6 years ago

Oh, I also rearranged the main setup logic for Teacher so that it's more likely to crash immediately if something doesn't work, rather than, say, do rollouts and then crash.

nottombrown commented 6 years ago

Hmmm, q_state is the standard name for the (obs, action) pairs that are used in q-learning:

https://www.google.com/search?q="q+state"+mdp

However, I think that you're right that keeping the actions and observations explicitly separated until they are fed into the neural net is an improvement. It should make the system more accessible to newcomers.
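A minimal sketch of the separated approach (a toy linear model, not the repo's actual network): the observation and action stay separate until the model itself combines them, so the model can flatten an observation of any rank before merging.

```python
import numpy as np

def predict_reward(obs, act, w_obs, w_act):
    """Toy linear reward model taking obs and act as separate inputs
    (hypothetical illustration; the real predictor is a neural net)."""
    obs_features = obs.reshape(-1)       # flatten inside the model, any rank
    return obs_features @ w_obs + act @ w_act  # combine only at this point

rng = np.random.default_rng(0)
obs = rng.normal(size=(84, 84, 4))       # rank-3 image observation
act = rng.normal(size=6)                 # rank-1 action vector
w_obs = rng.normal(size=84 * 84 * 4)
w_act = rng.normal(size=6)
reward = predict_reward(obs, act, w_obs, w_act)  # scalar reward estimate
```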

Have you tested that this PR doesn't cause any performance regressions on MuJoCo?

nottombrown commented 6 years ago

Made some small changes and am currently running regression tests.

nottombrown commented 6 years ago

[image: regression-test performance plot]

Doesn't break performance (actually improved things on this seed). Merging in!