rhalbersma / doctrina

Exercises in reinforcement learning
Boost Software License 1.0

Investigate Gym Wrappers and Monitors #2

Open rhalbersma opened 3 years ago

rhalbersma commented 3 years ago

Gym wrappers and monitors allow non-intrusive statistics collection, which would simplify the bandit algorithms.
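A minimal sketch of the wrapper idea, assuming a Gym-style `step` that returns `(obs, reward, done, info)`. To keep it self-contained it does not import Gym: `BernoulliBandit` and `ArmStats` are hypothetical names, and in real code the wrapper would presumably subclass `gym.Wrapper` instead of delegating by hand.

```python
import random


class BernoulliBandit:
    """Hypothetical stand-in for a Gym-style k-armed bandit environment."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def reset(self):
        return 0  # bandits are stateless, so return a dummy observation

    def step(self, action):
        # Reward is 1.0 with the chosen arm's success probability, else 0.0.
        reward = 1.0 if self.rng.random() < self.probs[action] else 0.0
        return 0, reward, False, {}


class ArmStats:
    """Hypothetical wrapper: records per-arm pull counts and mean rewards."""

    def __init__(self, env, n_arms):
        self.env = env
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Incremental sample-average update for the pulled arm.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]
        return obs, reward, done, info
```

With this, an agent only has to choose actions and can read `env.counts` and `env.means` from the wrapper, so the bookkeeping no longer lives inside each bandit algorithm.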

rhalbersma commented 3 years ago

Alternatively, we can redefine the ensemble of bandits as a single environment and store the statistics directly inside it.
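A rough sketch of what that could look like, assuming a Gaussian testbed (true arm values and rewards drawn from unit-variance normals) and one action per bandit at every step. `BanditEnsemble` and its attributes are hypothetical names, not existing code in this repository.

```python
import random


class BanditEnsemble:
    """Hypothetical single environment holding many independent bandits."""

    def __init__(self, n_bandits, n_arms, seed=0):
        self.rng = random.Random(seed)
        # Assumed testbed: true value of every arm drawn from N(0, 1).
        self.q_true = [
            [self.rng.gauss(0.0, 1.0) for _ in range(n_arms)]
            for _ in range(n_bandits)
        ]
        # Statistics are stored in the environment itself.
        self.counts = [[0] * n_arms for _ in range(n_bandits)]
        self.means = [[0.0] * n_arms for _ in range(n_bandits)]

    def step(self, actions):
        """Pull one arm per bandit and return the list of rewards."""
        rewards = []
        for b, a in enumerate(actions):
            r = self.rng.gauss(self.q_true[b][a], 1.0)
            # Incremental sample-average update for bandit b, arm a.
            self.counts[b][a] += 1
            self.means[b][a] += (r - self.means[b][a]) / self.counts[b][a]
            rewards.append(r)
        return rewards
```

The trade-off compared to a wrapper is that the action becomes a vector (one arm index per bandit), but no extra layer is needed to collect the statistics.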