Closed FlorianKlemt closed 5 years ago
Hi Florian!
That's awesome! We have been struggling with non-deterministic behavior from CUDA for so long... I'm very glad that you have fixed the issue.
I think you should also open an issue in the PyTorch repository, since it seems to be a more general problem.
Best regards, Ilya
Hi everyone,
I tried to get reproducible results (meaning the same sequence of actions, observations and rewards) on different Atari environments.
When running on CPU the results are reproducible. When using CUDA, however, neither the actions chosen by the agent nor the observations/rewards are deterministic.
Interestingly, when using only 1 worker, the results are reproducible even on CUDA.
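For anyone who wants to check reproducibility in the sense described above, here is a minimal sketch of comparing two runs step by step. The `rollout` function is a hypothetical toy stand-in (a seeded random policy on a fake environment), not code from this repository; only `same_trajectory` is the part that would carry over to a real setup.

```python
import torch


def rollout(seed, steps=5):
    """Hypothetical toy rollout: seeded random policy on a fake environment."""
    gen = torch.Generator().manual_seed(seed)
    trajectory = []
    obs = torch.zeros(4)
    for _ in range(steps):
        action = torch.randint(0, 3, (1,), generator=gen)
        obs = obs + action.float()  # stand-in for env.step(action)
        reward = obs.sum()
        trajectory.append((action.clone(), obs.clone(), reward.clone()))
    return trajectory


def same_trajectory(a, b):
    """True if both runs produced identical actions, observations and rewards."""
    return len(a) == len(b) and all(
        torch.equal(x, y)
        for step_a, step_b in zip(a, b)
        for x, y in zip(step_a, step_b)
    )
```

In a real test, `rollout` would be replaced by two full runs of the agent with the same seed, recording every (action, observation, reward) triple.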
When the cuDNN backend determinism flags are set, the actions chosen by the agent become reproducible on CUDA. (Be careful when using these flags: they can and will greatly impact runtime.)
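The exact flags from the original post are not shown here. As an assumption, the usual PyTorch settings for this are `torch.backends.cudnn.deterministic` and `torch.backends.cudnn.benchmark`; a minimal sketch, with `set_deterministic` as a hypothetical helper name:

```python
import random

import torch


def set_deterministic(seed):
    """Hypothetical helper: seed the RNGs and request deterministic cuDNN kernels."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Assumed flags (not confirmed by the post): prefer deterministic cuDNN
    # algorithms and disable the autotuner, which may select different
    # kernels from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Calling `set_deterministic(seed)` once at startup, before any model or environment is created, is the typical usage. Expect slower training, since the fastest kernels are not always the deterministic ones.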
What is weird is that the observations returned by the environment are still non-deterministic, even though the actions are deterministic.
After a long search, the issue seems to be in the reset method of the VecPyTorchFrameStack class in envs.py.
The in-place zeroing seems to make results non-deterministic when using multiple workers on CUDA. I am not sure how this is possible, but after replacing that line,
all results (actions, observations, rewards) become completely reproducible on CUDA with multiple workers.
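The replacement line itself is not shown above, so the following is only a sketch of the kind of change described, under the assumption that the fix allocates a fresh zero tensor instead of zeroing the existing buffer in place. `FrameStackSketch` and its attribute names are hypothetical and simplified; it is not the repository's VecPyTorchFrameStack.

```python
import torch


class FrameStackSketch:
    """Hypothetical, simplified frame-stack buffer (not the repository's class)."""

    def __init__(self, num_envs, channels, nstack, device="cpu"):
        self.channels = channels
        self.stacked_obs = torch.zeros(num_envs, channels * nstack, device=device)

    def reset_inplace(self, obs):
        # In-place variant: mutates the buffer that earlier callers may still hold.
        self.stacked_obs.zero_()
        self.stacked_obs[:, -self.channels:] = obs
        return self.stacked_obs

    def reset_fresh(self, obs):
        # Assumed fix: allocate a new zero buffer rather than mutating the old one.
        self.stacked_obs = torch.zeros_like(self.stacked_obs)
        self.stacked_obs[:, -self.channels:] = obs
        return self.stacked_obs
```

Both variants return the same values. The difference is that `reset_inplace` overwrites a tensor that may have been handed out on a previous step, while `reset_fresh` leaves the old tensor untouched. Whether that aliasing is the actual cause of the non-determinism is not established here.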
It would be great if you could check whether this fix makes results reproducible for you too. If so, I would propose changing the in-place zero line. Without the cuDNN backend determinism flags (which you could perhaps add optionally via a determinism argument) it won't lead to determinism by itself, but it might save the next programmer a lot of time.
Best regards, Florian Klemt
Here is my test file, which highlights the problem. To switch between deterministic and non-deterministic, replace the indicated lines in the reset method of VecPyTorchFrameStack.