shariqiqbal2810 / MAAC

Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
MIT License

Seeding fails to produce deterministic results #21

sanjeevanahilan closed this issue 4 years ago

sanjeevanahilan commented 4 years ago

Hi, thanks for this great code. I've been using it for some experiments and have been having some issues with replicability. One thing I notice is that the learning curves differ between runs even when I set the same random seed. I still get different results even if I do:

    torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
    np.random.seed(seed)  # Numpy module.
    random.seed(seed)  # Python random module.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

Have you noticed this issue and is there a way to resolve it?
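
For reference, the seeding calls listed above could be gathered into one helper along these lines. This is only a sketch: `seed_everything` is a hypothetical name, not something from the MAAC repo. It also calls `torch.manual_seed`, which does not appear in the snippet above; if nothing else in the script calls it, PyTorch's CPU generator (used for things like weight initialisation) would be left unseeded. The guarded import is only there so the sketch runs on a machine without PyTorch.

```python
import random

import numpy as np

try:
    import torch
except ImportError:  # assumption: allow the sketch to run without PyTorch
    torch = None


def seed_everything(seed):
    """Seed the Python, NumPy and (if installed) PyTorch RNGs.

    Hypothetical helper for illustration, not part of MAAC.
    """
    random.seed(seed)     # Python random module
    np.random.seed(seed)  # NumPy global generator
    if torch is not None:
        torch.manual_seed(seed)           # CPU generator (not in the snippet above)
        torch.cuda.manual_seed_all(seed)  # all GPUs; ignored if CUDA is unavailable
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
```

Calling it once at the top of the training script, before any networks or environments are created, is probably the safest place, since anything constructed earlier would already have drawn from unseeded generators.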

Also, a quick unrelated question: for Treasure Collection I notice that by default you use substantial reward shaping. Was this shaping used for the Actor-Attention-Critic paper? Were you able to successfully train without it?

shariqiqbal2810 commented 4 years ago


Thanks for the interest! Not sure exactly what the issue is, but one thing you might try is to instantiate a numpy RandomState object (with the given seed) in the environment when calling env.seed and use that in place of np.random in the env code. There might be something weird going on with setting the global random seed in each environment since they're running in parallel in subprocesses.
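
A minimal sketch of the `RandomState` idea, assuming a toy environment class. `SeededEnv` and its `reset` are hypothetical stand-ins and not the actual environment code used by MAAC; the point is only that each environment owns its generator instead of sharing the global `np.random` state.

```python
import numpy as np


class SeededEnv:
    """Toy stand-in for an environment (hypothetical, not the MAAC env API)."""

    def __init__(self):
        # Private generator; replaced when seed() is called.
        self.rng = np.random.RandomState()

    def seed(self, seed=None):
        # Build a fresh generator owned by this environment only.
        self.rng = np.random.RandomState(seed)

    def reset(self):
        # Use self.rng wherever the env code previously called np.random.*
        return self.rng.uniform(-1.0, 1.0, size=2)


env = SeededEnv()
env.seed(7)
first = env.reset()

np.random.rand()  # touching the global generator does not affect env.rng

env.seed(7)
second = env.reset()
print(np.array_equal(first, second))
```

With parallel environments, one option would be to give each subprocess a distinct seed (for example the base seed plus the worker index) so that the workers are reproducible without producing identical rollouts.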

The reward shaping was used for the paper. From what I can remember, it may have worked a little without reward shaping but not nearly as well.

sanjeevanahilan commented 4 years ago

@shariqiqbal2810 Thanks a lot for your quick response.

Nice idea to fix the randomness of the environment - I just tried a quick hack by simply eliminating any randomness in the '' environment but unfortunately am still getting slight differences in the reward curves. I'm a bit time constrained at the moment, so may have to investigate this later. Are you able to confirm that you don't run into this issue on your machine or have you not had a chance to try yet?

And thanks for the info on reward shaping. I suppose solving the environment without it would be a nice target to aim for. :)

shariqiqbal2810 commented 4 years ago

Hmm in this case I'm not sure what the problem is, especially since you have enabled deterministic mode for cudnn. My compute resources are being 100% utilized right now, so I'm unable to verify whether I also have this issue on my machine, unfortunately. Please let me know if you figure anything out!

sanjeevanahilan commented 4 years ago

I understand, will do!

sanjeevanahilan commented 4 years ago

Just in case anyone is curious regarding my other question, MAAC does indeed solve collect treasure when shaping is set to False. I tried it on a version of the problem with 5 agents (stopped early).


shariqiqbal2810 commented 4 years ago

> Just in case anyone is curious regarding my other question, MAAC does indeed solve collect treasure when shaping is set to False. I tried it on a version of the problem with 5 agents (stopped early).

Ah, good to know!

sanjeevanahilan commented 4 years ago

I am a bit mystified as to what changed, but fixed random seeds are now giving the desired fixed results, even in random environments, so will take the win!

If someone else runs into this issue perhaps try:

    torch.cuda.manual_seed_all(run_num)  # if you are using multi-GPU.
    np.random.seed(run_num)  # Numpy module.
    random.seed(run_num)  # Python random module.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

shariqiqbal2810 commented 4 years ago

Great! Perhaps it's a package version thing?
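
If package versions are the suspect, one way to narrow it down would be to record them next to each run so that a reproducible run and a non-reproducible one can be compared. A small sketch follows; `package_versions` is a hypothetical helper, and the default package list is an assumption to adjust as needed.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("torch", "numpy")):
    """Return installed versions of the named packages (hypothetical helper)."""
    found = {"python": platform.python_version()}
    for name in names:
        try:
            found[name] = version(name)  # reads package metadata, no import needed
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


print(package_versions())
```

Saving this dictionary alongside the run's logs would make it easier to tell later whether an upgrade coincided with the change in behaviour.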

sanjeevanahilan commented 4 years ago

It's possible but I have no idea to be honest!