[Open] vvanirudh opened this issue 6 years ago
I remember reading a discussion about Caffe having a similar issue when using CUDA.
How do you ensure that runs are replicable with the same seed?
Hi @vvanirudh !
Can you please explain first why you need to exactly reproduce seeds for DDPG?
Assuming you have your reasons, here is what you can do:
test whether your environment is deterministic when you feed it the same (possibly random) actions,
check whether DDPG's interaction with the environment produces the same actions,
there should not be many sources of randomness in DDPG apart from batch shuffling, so check whether the contents of the first batch change given the same data from interaction with the environment.
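The first check can be sketched as follows. `ToyEnv` is a hypothetical stand-in for your actual Gym environment (it is not part of Baselines); the idea is to seed both the environment's RNG and the action sequence, run two rollouts, and compare the traces exactly.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment.

    Replace with your real env; the point is that all randomness
    is driven by a single seeded RNG.
    """

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.state = 0.0

    def reset(self):
        self.state = self._rng.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        # Deterministic dynamics plus seeded noise.
        self.state += 0.1 * action + 0.01 * self._rng.gauss(0.0, 1.0)
        return self.state, -abs(self.state)


def rollout(seed, n_steps=50):
    """Feed a fixed (seeded) random action sequence and record the trace."""
    env = ToyEnv(seed)
    action_rng = random.Random(seed + 1)  # same actions across runs
    trace = [env.reset()]
    for _ in range(n_steps):
        obs, reward = env.step(action_rng.uniform(-1.0, 1.0))
        trace.append((obs, reward))
    return trace


# Two rollouts with the same seed should match bit-for-bit:
assert rollout(0) == rollout(0)
```

If this assertion fails for your real environment, the nondeterminism is upstream of DDPG and no amount of seeding the algorithm will help.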
I am comparing two algorithms (one of them DDPG) on the same environment. To check that they initially encounter the same environment states, I set the seed to exactly the same value. I noticed that several runs of the same algorithm (say, DDPG) on the same environment with the same seed produced starkly different training curves.
I have tried what you have suggested:
I have noticed that if I run the DDPG code (with default parameters and environment) with the same seed twice, I get different actor and critic losses across epochs. In fact, almost all of the metrics we log differ, even though we use exactly the same seed and parameters.
How do you ensure that runs are replicable with the same seed?
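For what it's worth, a common starting point is to seed every RNG source the process touches. This is a sketch (the helper name is made up, not a Baselines API), and note that even with all of these set, GPU kernels (e.g. cuDNN) can remain nondeterministic, which may be the Caffe/CUDA issue mentioned at the top of the thread.

```python
import os
import random

import numpy as np


def seed_everything(seed):
    """Seed the usual RNG sources (a sketch; extend for your framework).

    Even with all of these fixed, GPU ops may still be nondeterministic.
    """
    random.seed(seed)                        # Python's stdlib RNG
    np.random.seed(seed)                     # NumPy (batch shuffling, init)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Framework-specific seeding (uncomment for your stack):
    # tf.random.set_seed(seed)    # TensorFlow 2.x
    # torch.manual_seed(seed)     # PyTorch


# Sanity check: the same seed reproduces the same draws.
seed_everything(0)
a = (random.random(), float(np.random.rand()))
seed_everything(0)
b = (random.random(), float(np.random.rand()))
assert a == b
```

If the logged metrics still diverge after seeding all of these, the remaining nondeterminism is likely in parallel/GPU execution rather than in the Python-level RNGs.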