Hi,
Thanks for the interest! I'm not sure exactly what the issue is, but one thing you might try is to instantiate a numpy RandomState object (with the given seed) in the environment when env.seed is called, and use that in place of np.random in the env code. There might be something weird going on with setting the global random seed in each environment, since they're running in parallel in subprocesses.
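Something along these lines is what I have in mind (a minimal sketch, not the actual multiagent-particle-envs code; the class and attribute names are just illustrative):

```python
import numpy as np

class MyEnv:
    """Illustrative environment that owns its own RNG instead of using the global np.random."""

    def __init__(self):
        self.np_random = np.random.RandomState()

    def seed(self, seed=None):
        # Called once per worker; all later sampling goes through self.np_random.
        self.np_random = np.random.RandomState(seed)
        return [seed]

    def reset(self):
        # Anywhere the env code previously called np.random.*, use self.np_random.* instead.
        agent_pos = self.np_random.uniform(-1.0, +1.0, size=2)
        return agent_pos
```

That keeps each environment's randomness under its own generator, rather than relying on whatever the global numpy state happens to be in its worker process.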
The reward shaping was used for the paper. From what I can remember, it may have worked a little without reward shaping but not nearly as well.
@shariqiqbal2810 Thanks a lot for your quick response.
Nice idea to fix the randomness of the environment - I just tried a quick hack by simply eliminating any randomness in the 'simple.py' environment, but unfortunately I'm still getting slight differences in the reward curves. I'm a bit time-constrained at the moment, so I may have to investigate this later. Are you able to confirm that you don't run into this issue on your machine, or have you not had a chance to try yet?
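For concreteness, the hack was along these lines (a rough sketch, assuming the standard multiagent-particle-envs simple.py scenario, where reset_world is where the random initial state comes from):

```python
import numpy as np

class Scenario:  # stand-in for the simple.py Scenario class
    def reset_world(self, world):
        # Replace the usual np.random.uniform initial positions with fixed values,
        # so the environment itself contributes no randomness between runs.
        for agent in world.agents:
            agent.state.p_pos = np.zeros(world.dim_p)        # was np.random.uniform(-1, +1, world.dim_p)
            agent.state.p_vel = np.zeros(world.dim_p)
            agent.state.c = np.zeros(world.dim_c)
        for landmark in world.landmarks:
            landmark.state.p_pos = np.full(world.dim_p, 0.5)  # was np.random.uniform(-1, +1, world.dim_p)
            landmark.state.p_vel = np.zeros(world.dim_p)
```

With the environment deterministic, any remaining run-to-run difference should be coming from the learning code (or CUDA) side.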
And thanks for the info on reward shaping. I suppose solving the environment without it would be a nice target to aim for. :)
Hmm, in this case I'm not sure what the problem is, especially since you have enabled deterministic mode for cudnn. My compute resources are fully utilized right now, so I'm unable to verify whether I also have this issue on my machine, unfortunately. Please let me know if you figure anything out!
I understand, will do!
Just in case anyone is curious regarding my other question, MAAC does indeed solve collect treasure when shaping is set to False. I tried it on a version of the problem with 5 agents (stopped early).
Ah, good to know!
I am a bit mystified as to what changed, but fixed random seeds are now giving the desired fixed results, even in random environments, so I'll take the win!
If someone else runs into this issue, perhaps try:

```python
import random

import numpy as np
import torch

# Seed every source of randomness with the run's seed (run_num).
random.seed(run_num)                  # Python's built-in random module
np.random.seed(run_num)               # numpy
torch.manual_seed(run_num)            # PyTorch (CPU)
torch.cuda.manual_seed(run_num)       # PyTorch (current GPU)
torch.cuda.manual_seed_all(run_num)   # PyTorch (all GPUs, if using multi-GPU)

# Make cuDNN deterministic (may cost some speed).
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
```
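Since the environments run in parallel subprocesses, it may also be worth seeding each worker individually. A hedged sketch (the rank-based offset and helper below are illustrative, not necessarily what the repo already does):

```python
import numpy as np

def make_env_fn(env_constructor, run_num, rank):
    """Return a thunk that builds and seeds one environment for a subprocess worker."""
    def _init():
        env = env_constructor()
        # Distinct but reproducible seed per worker, derived from the run seed.
        env.seed(run_num + rank * 1000)
        np.random.seed(run_num + rank * 1000)  # in case the env still touches global np.random
        return env
    return _init
```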
Great! Perhaps it's a package version thing?
It's possible but I have no idea to be honest!
Hi, thanks for this great code. I've been using it for some experiments and have been running into some replicability issues. One thing I notice is that the learning curves differ even when I set the same random seed. I still get different results even if I follow the suggestions in:
https://github.com/pytorch/pytorch/issues/7068
Have you noticed this issue and is there a way to resolve it?
Also, a quick unrelated question: for Treasure Collection I notice that by default you use substantial reward shaping. Was this shaping used for the Actor-Attention-Critic paper? Were you able to successfully train without it?