Closed user-1701 closed 2 years ago
I'll start by recommending moving to stable-baselines3, if possible, as that is more actively maintained and your modification is probably easier there.
However, the same callback limitation exists there as well. Depending on your situation, you could add an environment wrapper that takes the actions from the other models and combines them. If all networks need to learn at the same time, though, you will have to modify the learn/rollout functions to get the behaviour you want.
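A minimal sketch of the wrapper idea, assuming the other models are frozen while one model learns, that actions are flat lists, and that the environment follows the older 4-tuple `step` convention. `PartnerActionWrapper`, `ToyEnv` and `ConstantModel` are hypothetical names made up for illustration; the only library convention relied on is that `model.predict(obs, deterministic=...)` returns an `(action, state)` pair.

```python
class PartnerActionWrapper:
    """Duck-typed env wrapper (hypothetical): the learning model supplies only
    its own part of the action; the partner models' parts are predicted from
    the last observation and appended before the real env is stepped."""

    def __init__(self, env, partner_models):
        self.env = env
        self.partner_models = partner_models
        self._last_obs = None

    def reset(self):
        # Remember the observation so partners can act on it in step().
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        joint_action = list(action)
        for model in self.partner_models:
            # Same (action, state) return shape as a stable-baselines predict().
            partner_action, _state = model.predict(self._last_obs, deterministic=True)
            joint_action.extend(partner_action)
        obs, reward, done, info = self.env.step(joint_action)
        self._last_obs = obs
        return obs, reward, done, info


class ToyEnv:
    """Stand-in environment that just records the joint action it receives."""

    def reset(self):
        self.received = None
        return [0.0]

    def step(self, joint_action):
        self.received = joint_action
        return [float(len(joint_action))], 0.0, True, {}


class ConstantModel:
    """Stand-in for a trained model: always predicts the same action."""

    def __init__(self, action):
        self.action = action

    def predict(self, observation, deterministic=True):
        return self.action, None
```

In a real setup you would probably subclass `gym.Wrapper` instead, shrink `action_space` to the learning model's part of the action, and concatenate NumPy arrays rather than lists.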
Thanks a lot Miffyli, I will try the wrapping!
My environment uses more than one network to create the actions. While one model is learning, I need to reroute the observation to the other models so they can predict an action, and then concatenate all actions before the step is finished.
Therefore, I think I have to split up the `.learn()` step. The `on_step` callback event is only called after `.step`, if I am not mistaken, so this doesn't really help. Is there another way to do it?