Closed Raelifin closed 7 years ago
Oh, I also rearranged the main setup logic for Teacher so that it's more likely to crash immediately if something doesn't work, rather than, say, do rollouts and then crash.
Hmmm, `q_state` is the standard name for the (obs, action) pairs used in Q-learning: https://www.google.com/search?q="q+state"+mdp
However, I think that you're right that keeping the actions and observations explicitly separated until they are fed into the neural net is an improvement. It should make the system more accessible to newcomers.
Have you tested that this PR doesn't cause any performance regressions on MuJoCo?
Made some small changes and am currently running regression tests.
Doesn't break performance (actually improved things on this seed). Merging in!
I've never seen "Q-State" outside of Teacher. Q-functions take (state, action) pairs. Storing the concatenation of states and actions isn't particularly elegant and only saves us a tiny amount of computation.
Furthermore, concatenating states and actions only really works when they're the same rank. For Atari and other non-MuJoCo environments this will not always be the case. We can extend support to more environments by keeping them separate and adding some checks for environment dimensionality.
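To illustrate the rank-mismatch point, here's a quick sketch (the shapes are hypothetical, just to show the failure mode; this isn't Teacher's actual code):

```python
import numpy as np

# MuJoCo-style: both obs and action are flat vectors, so
# concatenating them into a single "q_state" works fine.
mujoco_obs = np.zeros(17)  # hypothetical observation vector
mujoco_act = np.zeros(6)   # hypothetical action vector
q_state = np.concatenate([mujoco_obs, mujoco_act])
assert q_state.shape == (23,)

# Atari-style: the observation is an image (rank 3) while the
# action is a flat vector, so naive concatenation fails.
atari_obs = np.zeros((84, 84, 4))  # hypothetical stacked frames
atari_act = np.zeros(18)           # hypothetical one-hot action
try:
    np.concatenate([atari_obs, atari_act])
except ValueError:
    print("rank mismatch: cannot concatenate image obs with flat action")

# Keeping obs and action separate lets the network decide how to
# combine them; a simple dimensionality check covers both cases.
def combine(obs, act):
    # Flatten the observation first if it has rank > 1.
    if obs.ndim > 1:
        obs = obs.reshape(-1)
    return np.concatenate([obs, act])

assert combine(mujoco_obs, mujoco_act).shape == (23,)
assert combine(atari_obs, atari_act).shape == (84 * 84 * 4 + 18,)
```

In a real setup you'd likely combine them after an encoder (e.g. concatenate the action with the conv features) rather than flattening raw pixels, but the point is the same: defer the merge until the network, where the ranks are known.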