Investigate Gym Wrappers and Monitors

rhalbersma / doctrina

Exercises in reinforcement learning

Boost Software License 1.0


Open rhalbersma opened 3 years ago

rhalbersma commented 3 years ago

This would allow non-intrusive statistics collection, which would simplify the bandit algorithms.
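A minimal sketch of the wrapper idea, assuming a Gym-style `step()` that returns `(obs, reward, done, info)`. To stay self-contained it does not import Gym; `BernoulliBandit` and `ActionStats` are hypothetical names, and in practice the wrapper could subclass `gym.Wrapper` instead of being a plain class.

```python
import random


class BernoulliBandit:
    """Hypothetical k-armed bandit with a Gym-like interface (assumption)."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def reset(self):
        return 0  # a bandit has a single dummy observation

    def step(self, action):
        reward = 1.0 if self.rng.random() < self.probs[action] else 0.0
        return 0, reward, False, {}  # obs, reward, done, info


class ActionStats:
    """Wrapper that records pull counts and running mean reward per action."""

    def __init__(self, env, n_actions):
        self.env = env
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.counts[action] += 1
        # Incremental mean: Q <- Q + (r - Q) / N
        self.means[action] += (reward - self.means[action]) / self.counts[action]
        return obs, reward, done, info
```

With this in place, an agent only calls `step()` and reads `env.counts` and `env.means`, so the bookkeeping no longer has to live inside each bandit algorithm.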

rhalbersma commented 3 years ago

Alternatively, we can redefine the ensemble of bandits as a single environment and store the statistics directly inside it.
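A rough sketch of this alternative, assuming "ensemble" means a set of independent k-armed Gaussian bandits that are stepped together. `BanditEnsemble`, its constructor parameters and the unit-variance reward noise are all assumptions made for illustration, not anything fixed in this repository.

```python
import random


class BanditEnsemble:
    """Hypothetical single environment holding n_bandits independent bandits."""

    def __init__(self, n_bandits, n_arms, seed=0):
        self.rng = random.Random(seed)
        # True action values, one row per bandit (assumed standard normal).
        self.q_true = [[self.rng.gauss(0.0, 1.0) for _ in range(n_arms)]
                       for _ in range(n_bandits)]
        # Statistics live inside the environment itself.
        self.counts = [[0] * n_arms for _ in range(n_bandits)]
        self.means = [[0.0] * n_arms for _ in range(n_bandits)]

    def step(self, actions):
        """Pull one arm per bandit and update the stored statistics in place."""
        rewards = []
        for b, a in enumerate(actions):
            r = self.rng.gauss(self.q_true[b][a], 1.0)
            self.counts[b][a] += 1
            # Incremental mean: Q <- Q + (r - Q) / N
            self.means[b][a] += (r - self.means[b][a]) / self.counts[b][a]
            rewards.append(r)
        return rewards
```

Here an action is a list with one arm index per bandit, and an agent can read `env.counts` and `env.means` directly, with no wrapper layer involved. The trade-off is that statistics collection is then tied to this one environment class.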