Closed nottombrown closed 6 years ago
Rollouts were using multiple workers, and each one had the same default seed for both the environment and the pseudorandom `random` module. As a result, pretraining was comparing segments that were basically identical.
## Before
```
array([[-1.53344645],
       [-2.0461825 ],
       [-2.33774915],
       [ 0.93797754],
       [-2.17090229],
       [-1.82050583],
       [-0.78764898],
       [ 1.92595938],
       [-2.41739235],
       [ 2.02766944],
       [-2.42340955],
       [ 2.85875679],
       [-0.18809279],
       [ 2.86056653],
       [ 0.62907312]])
ipdb> pretrain_segments[2]['actions']
array([[-1.53344645],
       [-2.0461825 ],
       [-2.33774915],
       [ 0.93797754],
       [-2.17090229],
       [-1.82050583],
       [-0.78764898],
       [ 1.92595938],
       [-2.41739235],
       [ 2.02766944],
       [-2.42340955],
       [ 2.85875679],
       [-0.18809279],
       [ 2.86056653],
       [ 0.62907312]])
```
## After
```
ipdb> pretrain_segments[2]['actions']
array([[-1.53344645],
       [-2.0461825 ],
       [-2.33774915],
       [ 0.93797754],
       [-2.17090229],
       [-1.82050583],
       [-0.78764898],
       [ 1.92595938],
       [-2.41739235],
       [ 2.02766944],
       [-2.42340955],
       [ 2.85875679],
       [-0.18809279],
       [ 2.86056653],
       [ 0.62907312]])
ipdb> pretrain_segments[12]['actions']
array([[ 1.62348449],
       [-2.11832013],
       [-2.5228675 ],
       [-2.46238179],
       [ 1.03228684],
       [-1.52779674],
       [-0.4767632 ],
       [ 0.34421275],
       [ 2.16330704],
       [ 1.36226558],
       [-1.37803257],
       [-2.2111032 ],
       [-2.66775408],
       [-1.19040819],
       [-1.4272911 ]])
```
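For illustration, the kind of fix described above (giving each rollout worker its own seed) might look roughly like the sketch below. This is not the code from this change: `seed_worker`, `fake_rollout`, and the `base_seed + worker_id` scheme are hypothetical, and it assumes the `random` module in question is Python's standard-library one. Environment seeding is omitted because that call depends on the environment library in use.

```python
import random

import numpy as np


def seed_worker(base_seed: int, worker_id: int) -> np.random.Generator:
    """Give each worker its own reproducible seed (hypothetical helper)."""
    seed = base_seed + worker_id  # distinct per worker, stable across runs
    random.seed(seed)             # seeds Python's built-in `random` module
    # A real worker would also seed its environment here; that call depends
    # on the environment library, so it is left out of this sketch.
    return np.random.default_rng(seed)


def fake_rollout(base_seed: int, worker_id: int, n_steps: int = 15) -> np.ndarray:
    """Stand-in for a rollout: draw random actions with shape (n_steps, 1)."""
    rng = seed_worker(base_seed, worker_id)
    return rng.uniform(-3.0, 3.0, size=(n_steps, 1))


if __name__ == "__main__":
    a = fake_rollout(base_seed=1, worker_id=0)
    b = fake_rollout(base_seed=1, worker_id=1)
    # With per-worker seeds the two action arrays should no longer match.
    print(np.array_equal(a, b))
```

Deriving the seed from the worker index keeps runs reproducible while ensuring no two workers share a random stream, which is the failure shown in the "Before" output.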
Improved performance on reacher