Closed: 0xangelo closed this issue 5 years ago.
If `base_env` in the callback solution is an already-instantiated env object, then the callback method would be the simpler alternative, I guess... but if it is just a reference to the env class, then the second method might be simpler.
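To illustrate the instance-vs-class distinction above, here is a rough, self-contained sketch. `ToyEnv`, `ToyPolicy`, and `set_env_from_callback` are hypothetical stand-ins for illustration only, not actual RLlib or MAPO code:

```python
import inspect

class ToyEnv:
    """Stand-in env; a real one would be a gym.Env subclass."""
    def __init__(self, env_config=None):
        self.env_config = env_config or {}

class ToyPolicy:
    def __init__(self):
        self.env = None

def set_env_from_callback(policy, base_env, env_config=None):
    # Hypothetical helper: attach the object directly if it is already
    # instantiated; otherwise build an instance from the class reference.
    if inspect.isclass(base_env):
        policy.env = base_env(env_config)   # only a class: instantiate ourselves
    else:
        policy.env = base_env               # live instance: just attach it

# Already-instantiated env: the callback route is a simple attach.
live = ToyEnv({"size": 2})
p1 = ToyPolicy()
set_env_from_callback(p1, live)
assert p1.env is live

# Class reference: we must construct our own instance.
p2 = ToyPolicy()
set_env_from_callback(p2, ToyEnv, {"size": 2})
assert isinstance(p2.env, ToyEnv) and p2.env is not live
```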
For now, I can think of two ways:

1. Add a policy method (e.g. `set_current_env`) to set the environment the policy is using, and call it from a callback. Callbacks are passed through the config (`config["callbacks"]["on_episode_start"]`), but we can add them in `MAPOTrainer` so that they're added every time (DQN does this). The callback is called with the following arguments: `config["env"]` and `config["env_config"]`.
2. Have the policy instantiate the environment itself. This can be problematic if the environment has some hidden internal state, since in that case the instance used for calculating transitions and the one used for training might behave differently. Nevertheless, we would probably do something similar to what `Trainer` does:
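(The actual `Trainer` snippet isn't reproduced here.) A minimal, self-contained sketch of how the two options might fit together — `MAPOPolicy`, `FakeEnv`, and `make_env_from_config` are hypothetical stand-ins; only `set_current_env`, the config keys, and the callback hook come from the discussion above:

```python
class FakeEnv:
    """Stand-in for a gym-style environment class."""
    def __init__(self, env_config=None):
        self.env_config = env_config or {}

class MAPOPolicy:
    def __init__(self, config):
        self.config = config
        self.env = None

    def set_current_env(self, env):
        # Way 1: the callback hands us the already-instantiated rollout env.
        self.env = env

    def make_env_from_config(self):
        # Way 2: build our own copy from config["env"] / config["env_config"].
        # Risky if the env has hidden internal state: this copy can drift
        # from the one generating the rollouts.
        env_cls = self.config["env"]
        self.env = env_cls(self.config["env_config"])
        return self.env

def on_episode_start(info):
    # Callback in the style of config["callbacks"]["on_episode_start"]:
    # push the rollout env into the policy before the episode begins.
    info["policy"].set_current_env(info["env"])

# Way 1: wiring the callback through the config (what MAPOTrainer
# would do automatically, the way DQN adds its own callbacks).
config = {
    "env": FakeEnv,
    "env_config": {"size": 4},
    "callbacks": {"on_episode_start": on_episode_start},
}
policy = MAPOPolicy(config)
rollout_env = FakeEnv(config["env_config"])
config["callbacks"]["on_episode_start"]({"policy": policy, "env": rollout_env})
assert policy.env is rollout_env  # same instance, so no state divergence

# Way 2: the policy builds its own instance; note it is a *different* object,
# which is exactly where hidden internal state becomes a problem.
policy2 = MAPOPolicy(config)
own_env = policy2.make_env_from_config()
assert isinstance(own_env, FakeEnv) and own_env is not rollout_env
```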