hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Accessing observations during training aka .learn() #1159

Closed · user-1701 closed this issue 2 years ago

user-1701 commented 2 years ago

My environment uses more than one network to create the actions. While one model is learning, I need to reroute the observation to the other models so they can predict their actions, and to concatenate all the actions before the step is finished.

Therefore, I think I have to split up the .learn() step.

The on_step callback is only called after .step(), if I am not mistaken, so this doesn't really help. Is there another way to do it?

Miffyli commented 2 years ago

I'll start by recommending moving to stable-baselines3, if possible, as that is more actively maintained and your modification is probably easier there.

However, the same limitation of callbacks exists there as well. Depending on your situation, you could add an environment wrapper that takes actions from the other models and combines them with the learning agent's action. If all networks need to be learning at the same time, though, you will have to modify the learn/rollout functions to achieve the behaviour you want.
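A minimal sketch of that wrapper idea, assuming a Box action space where actions can simply be concatenated, that the other models are frozen (not learning), and that they expose the standard stable-baselines `.predict()` API. The class name `MultiModelActionWrapper` and the way the action slices are combined are illustrative, not from the library:

```python
import numpy as np
import gym


class MultiModelActionWrapper(gym.Wrapper):
    """Combine the learning agent's action with actions predicted by
    other, already-trained models before every environment step."""

    def __init__(self, env, other_models):
        super(MultiModelActionWrapper, self).__init__(env)
        self.other_models = other_models
        self._last_obs = None
        # Note: the learning agent only controls its own slice of the action,
        # so self.action_space should be narrowed accordingly (omitted here).

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, agent_action):
        # Query each frozen model on the latest observation.
        other_actions = [
            model.predict(self._last_obs, deterministic=True)[0]
            for model in self.other_models
        ]
        # Concatenate the learning agent's action with the others.
        full_action = np.concatenate(
            [np.asarray(agent_action).ravel()]
            + [np.asarray(a).ravel() for a in other_actions]
        )
        obs, reward, done, info = self.env.step(full_action)
        self._last_obs = obs
        return obs, reward, done, info
```

The learning model then just trains on the wrapped environment as usual, e.g. `model = PPO2("MlpPolicy", MultiModelActionWrapper(my_env, other_models=[model_b, model_c]))` followed by `model.learn(...)`.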

user-1701 commented 2 years ago

Thanks a lot Miffyli, I will try the wrapping!