The current implementation uses random.random(), which I believe draws from a uniform distribution over [0, 1). This can hurt the exploration ability of the DDPG agent: every random draw is non-negative, so the noise has a positive bias instead of being centered on zero.
mean, std, min, max of OUNoise before fix:
0.6662002074296958 0.10970679264023238 0.05178335005859267 1.111826336326043
mean, std, min, max of OUNoise after fix:
0.002004800976725908 0.3797033350628932 -1.758558674034922 1.758029080992971
OUNoise should use a normal distribution.
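For reviewers who want to reproduce the effect, here is a minimal standalone sketch, not the repository's OUNoise class. The helper name ou_samples and the parameters theta=0.15, sigma=0.2, mu=0 are assumptions chosen for illustration. It runs the same Ornstein-Uhlenbeck update with the two random sources and prints mean, std, min, max for each.

```python
import random
import statistics


def ou_samples(draw, n_steps=20000, mu=0.0, theta=0.15, sigma=0.2, seed=0):
    """Simulate a 1-D Ornstein-Uhlenbeck process (hypothetical helper).

    `draw` takes a random.Random instance and returns one random number;
    it is the only thing that differs between the two runs below.
    """
    rng = random.Random(seed)
    x = mu
    out = []
    for _ in range(n_steps):
        # Mean-reverting pull toward mu plus a scaled random perturbation.
        x += theta * (mu - x) + sigma * draw(rng)
        out.append(x)
    return out


def summarize(samples):
    """Return (mean, std, min, max) of a list of samples."""
    return (statistics.mean(samples), statistics.pstdev(samples),
            min(samples), max(samples))


# Before: uniform draws in [0, 1), so every perturbation is non-negative.
uniform = ou_samples(lambda rng: rng.random())
# After: standard normal draws, symmetric around zero.
normal = ou_samples(lambda rng: rng.gauss(0.0, 1.0))

print("uniform:", *summarize(uniform))
print("normal: ", *summarize(normal))
```

With uniform draws the process settles around sigma * 0.5 / theta rather than mu, which for the assumed parameters is about 0.667 and is consistent with the "before fix" mean reported above. With normal draws it stays centered near mu.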
[issue: #20]