hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] How to rollout the learned policy? #1098

Closed. GlennCeusters closed this issue 3 years ago.

GlennCeusters commented 3 years ago

Hi,

I'm interested in "rolling out" the learned policy from a given state (e.g. using the transition function): plotting/printing the sequence of actions that maximizes the expected future reward from a given state/timestep, even though we would only execute the first action. In other words, I'd like to predict the future actions at a given state before stepping through the environment (again, even though only the first action would actually be executed). Roughly, something like the sketch below.
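Pseudocode-ish illustration of what I have in mind (`transition_fn` is a hypothetical function giving the next observation for a state/action pair, i.e. a known model of the dynamics; it's not part of stable-baselines):

```python
def rollout_policy(model, transition_fn, obs, horizon=10):
    """Return the action sequence the current policy would take from `obs`,
    without touching the real environment."""
    actions = []
    for _ in range(horizon):
        action, _ = model.predict(obs, deterministic=True)
        actions.append(action)
        obs = transition_fn(obs, action)  # hypothetical known dynamics
    return actions
```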

Many thanks for your replies!

Cheers, /Glenn

Miffyli commented 3 years ago

This sounds more like model-based learning (stable-baselines focuses on model-free algorithms). There is no direct support for this: you need to feed states to the agent with predict to get actions, and to obtain those states you have to execute the actions in the environment.
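A rough sketch of what I mean, assuming a Gym env that can be deep-copied (not guaranteed for every env), so the "preview" does not advance the real episode:

```python
import copy

import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10000)

obs = env.reset()

# Copy the env so previewing actions does not affect the real episode.
sim_env = copy.deepcopy(env)
sim_obs = obs

planned_actions = []
for _ in range(10):  # preview horizon
    action, _states = model.predict(sim_obs, deterministic=True)
    planned_actions.append(action)
    sim_obs, reward, done, info = sim_env.step(action)
    if done:
        break

print("Next actions the greedy policy would take:", planned_actions)

# In the real environment, only the first action is executed:
obs, reward, done, info = env.step(planned_actions[0])
```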