rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.

Make the sampler field protected in algorithms #2198

Closed yeukfu closed 3 years ago

yeukfu commented 3 years ago

This is a patch PR for https://github.com/rlworkgroup/garage/pull/2194.
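For context, a minimal sketch of the naming change the patch applies inside the algorithm classes (the class name and surrounding code are illustrative, not the actual diff):

```python
class SomeAlgo:
    def __init__(self, sampler):
        # Before this patch: self.sampler = sampler  (public attribute)
        # After: a leading underscore marks the field as protected,
        # i.e. internal to the algorithm rather than part of its
        # public API.
        self._sampler = sampler
```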

codecov[bot] commented 3 years ago

Codecov Report

Merging #2198 (80c71cc) into master (5aecf5a) will decrease coverage by 0.02%. The diff coverage is 90.47%.


@@            Coverage Diff             @@
##           master    #2198      +/-   ##
==========================================
- Coverage   91.21%   91.19%   -0.02%     
==========================================
  Files         201      201              
  Lines       10970    10971       +1     
  Branches     1370     1371       +1     
==========================================
- Hits        10006    10005       -1     
- Misses        701      704       +3     
+ Partials      263      262       -1     
| Impacted Files | Coverage Δ |
|---|---|
| src/garage/torch/algos/bc.py | 89.61% <50.00%> (ø) |
| src/garage/trainer.py | 92.69% <66.66%> (+0.03%) ↑ |
| src/garage/np/algos/cem.py | 100.00% <100.00%> (ø) |
| src/garage/np/algos/cma_es.py | 100.00% <100.00%> (ø) |
| src/garage/tf/algos/ddpg.py | 96.98% <100.00%> (ø) |
| src/garage/tf/algos/dqn.py | 91.37% <100.00%> (ø) |
| src/garage/tf/algos/npo.py | 96.44% <100.00%> (ø) |
| src/garage/tf/algos/reps.py | 98.46% <100.00%> (ø) |
| src/garage/tf/algos/rl2.py | 94.73% <100.00%> (ø) |
| src/garage/tf/algos/td3.py | 96.62% <100.00%> (ø) |
| ... and 9 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 5aecf5a...80c71cc.

yeukfu commented 3 years ago

@krzentner One problem with constructing the default sampler in the algorithm is that the algorithm doesn't have the env instances, which are required to construct samplers. So the default sampler can only be constructed in the trainer, which is our old pattern.
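To illustrate the constraint, here is a hypothetical sketch (the sampler constructor shown is an assumption about the API, not code from this PR): algorithms are built from an EnvSpec rather than live environments, so there is nothing to pass as envs when building a default sampler in the constructor.

```python
from garage.sampler import LocalSampler

class SomeAlgo:
    def __init__(self, env_spec, policy):
        self._env_spec = env_spec
        self._policy = policy
        # A default sampler cannot be built here: the sampler needs
        # live env instances, and none are in scope.
        # self._sampler = LocalSampler(agents=policy, envs=???)
        self._sampler = None  # must be injected or built elsewhere
```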

krzentner commented 3 years ago

Honestly, we should probably be passing the environment into the algorithm anyway. As a shorter-term solution, we could construct the sampler in train() by calling trainer.get_env_copy().
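A sketch of that shorter-term workaround: the algorithm builds its default sampler lazily inside train(), using the env copy the trainer already holds. The LocalSampler signature and the field names here are assumptions for illustration, not code from this PR.

```python
from garage.sampler import LocalSampler

class SomeAlgo:
    def __init__(self, env_spec, policy, sampler=None):
        self._env_spec = env_spec
        self._policy = policy
        self._sampler = sampler  # may stay None until train() runs

    def train(self, trainer):
        if self._sampler is None:
            # The algorithm never received an env instance, but the
            # trainer can hand out a copy of the environment it was
            # set up with.
            env = trainer.get_env_copy()
            self._sampler = LocalSampler(
                agents=self._policy,
                envs=env,
                max_episode_length=self._env_spec.max_episode_length)
        # ... continue with the usual sampling/optimization loop.
```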