Hey, so people (including myself) are starting to use garage more for imitation learning (IL) and related topics. I have a short list of "must have" features which I think would make that much better. When I proposed them separately in the past, people seemed a little hesitant to support them, so I figured putting them here in one place with a unified goal might make them more understandable.
Add a Trainable class, which RLAlgorithm inherits from, and make Trainer use that interface (this is already 90% done: Trainer already requires only that interface, but that fact is confusing and non-obvious).
Allow setting the default X-axis with wrap_experiment, instead of locking it to TotalEnvSteps (pretty necessary for BC; I have a commit for it).
Allow setting total_env_steps on Trainer, and make garage.Sampler have a total_env_steps property, which the Sampler is responsible for counting (I already have a commit for this). This gives the algorithm much more flexibility in deciding which sampling counts as training, which counts as evaluation, etc.
Add a Sampler class for sampling from a PathBuffer. This removes the complicated "source" concept that BC currently has (I also already have a commit for this).
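To make the shape of these proposals a bit more concrete, here is a rough, standalone sketch of how the pieces could fit together. It does not import garage, and none of it is garage's actual API: `Trainable`, `BufferSampler`, `SimpleTrainer`, `ToyBC`, `obtain_samples`, `n_epochs`, and `batch_size` are all hypothetical names used only for illustration. Whether replayed steps should count toward total_env_steps is an assumption here; the point is just that the sampler owns that decision.

```python
import abc
import random


class Trainable(abc.ABC):
    """Hypothetical minimal interface: the only thing a trainer needs."""

    @abc.abstractmethod
    def train(self, trainer):
        """Run training using the trainer and return a summary value."""


class BufferSampler:
    """Hypothetical sampler that replays stored paths and counts steps itself."""

    def __init__(self, paths, seed=0):
        self._paths = [list(p) for p in paths]
        if not self._paths or any(len(p) == 0 for p in self._paths):
            raise ValueError('paths must be non-empty lists of steps')
        self._rng = random.Random(seed)
        self._total_env_steps = 0

    @property
    def total_env_steps(self):
        """Number of steps this sampler has handed out so far."""
        return self._total_env_steps

    def obtain_samples(self, num_steps):
        """Return whole paths until at least num_steps steps are collected."""
        batch, collected = [], 0
        while collected < num_steps:
            path = self._rng.choice(self._paths)
            batch.append(path)
            collected += len(path)
        # The sampler, not the trainer, is responsible for the count.
        self._total_env_steps += collected
        return batch


class SimpleTrainer:
    """Hypothetical trainer that depends only on the Trainable interface."""

    def __init__(self, n_epochs):
        self.n_epochs = n_epochs

    def run(self, algo):
        if not isinstance(algo, Trainable):
            raise TypeError('algo must implement Trainable')
        return algo.train(self)


class ToyBC(Trainable):
    """Toy stand-in for a BC-style algorithm; it does no actual learning."""

    def __init__(self, sampler, batch_size):
        self.sampler = sampler
        self.batch_size = batch_size

    def train(self, trainer):
        for _ in range(trainer.n_epochs):
            self.sampler.obtain_samples(self.batch_size)
        return self.sampler.total_env_steps


# Two stored "expert" paths of five steps each.
sampler = BufferSampler([[0] * 5, [1] * 5])
steps = SimpleTrainer(n_epochs=3).run(ToyBC(sampler, batch_size=10))
```

With this split, the algorithm never needs a "source" argument: it just gets a sampler, and whether that sampler talks to an environment or to a buffer of stored paths is invisible to it.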