Closed GlennCeusters closed 3 years ago
This sounds more like model-based learning (stable-baselines focuses on model-free algorithms). There is no direct support for this: you need to feed states to the agent with `predict` to get actions, and then execute those actions in the environment to obtain the next states.
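Following that suggestion, one way to preview the action sequence without committing to it is to roll the policy out on a copy of the environment, so the real one is left untouched. A minimal sketch, where the `plan_actions` helper and the assumption that the env survives `copy.deepcopy` are mine, not a stable-baselines API:

```python
import copy

def plan_actions(predict, env, obs, horizon=5):
    """Roll the policy out on a deep copy of `env` and return the
    planned action sequence; the real environment is not mutated.

    `predict` maps an observation to an action, e.g. for a
    stable-baselines model: lambda o: model.predict(o)[0].
    Assumes `env` supports copy.deepcopy (most pure-Python envs do).
    """
    sim = copy.deepcopy(env)          # snapshot of the current env state
    actions = []
    for _ in range(horizon):
        action = predict(obs)
        obs, reward, done, info = sim.step(action)  # classic Gym step API
        actions.append(action)
        if done:
            break
    return actions
```

In practice you would call this every timestep, execute only the first planned action in the real environment, and replan at the next state, which is the model-predictive-control pattern described in the question.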
Hi,
I'm interested in "rolling out" the learned policy from a given state (e.g. using the transition function), i.e. plotting/printing the sequence of actions that maximizes the expected future reward from a given state/timestep (even though we would only execute the first action). In other words, predicting the future actions from a given state before actually stepping through the environment.
Many thanks for your replies!
Cheers, /Glenn