xffxff opened this issue 5 years ago.
Can you give an example of what you expected to happen and what actually happened?
If TensorFlow is failing to set a seed, maybe you should file a bug with TensorFlow.
Gym environments and NumPy (for example) are seeded separately, so you should not expect seeding TensorFlow to affect either of them.
Ooh, I may not have seeded the Gym envs. My bad. I'll look into getting this working; it's possible that Gym isn't the only other source of nondeterminism (such sources can be hard to track down).
@machinaut @jachiam, I ran the command python ddpg.py -s 2 twice. The first time I got AverageTestEpRet=-594 for epoch 1, but the second time I got AverageTestEpRet=-314 for epoch 1. I also tried setting
env.seed(seed)
test_env.seed(seed)
but AverageTestEpRet for epoch 1 still differed between runs.
@XFFXFF, hmm, that's a bit unfortunate. I appreciate that you tried this out.
I am not fully sure what could be going wrong here. My suspicion is that it might involve the Python hash seed, which is randomized per process to prevent dict collision attacks. See here for an explanation of the issue, and see here for more info. Can you try export PYTHONHASHSEED=0 and then run your experiment again?
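A quick way to see what PYTHONHASHSEED controls is to hash the same string in two fresh interpreters. String hashes affect, for example, the iteration order of sets of strings. This sketch (the helper name is hypothetical) also shows why the variable has to be exported before launching Python: it is read once at interpreter startup, so setting os.environ inside an already-running script does not change that script's own string hashes.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text: str, hashseed: str) -> int:
    """Hypothetical helper: return hash(text) computed in a new Python process.

    PYTHONHASHSEED is read at interpreter startup, so it is passed through
    the child's environment rather than set in this process.
    """
    env = dict(os.environ, PYTHONHASHSEED=hashseed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


# With the hash seed pinned, two separate processes agree.
print(str_hash_in_fresh_interpreter("HalfCheetah-v2", "0")
      == str_hash_in_fresh_interpreter("HalfCheetah-v2", "0"))  # True
```

Running the same comparison with hashseed="random" will almost always print False, which is the per-process variation the export is meant to remove.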
@XFFXFF I tried export PYTHONHASHSEED=0, in addition to setting the env and test_env seeds, and did two runs of DDPG with:
python -m spinup.run ddpg --hid [32] --env HalfCheetah-v2 --steps_per_epoch 1000
Results from the first run:
---------------------------------------
| Epoch | 1 |
| AverageEpRet | -260 |
| StdEpRet | 0 |
| MaxEpRet | -260 |
| MinEpRet | -260 |
| AverageTestEpRet | -533 |
| StdTestEpRet | 3.39 |
| MaxTestEpRet | -525 |
| MinTestEpRet | -536 |
| EpLen | 1e+03 |
| TestEpLen | 1e+03 |
| TotalEnvInteracts | 1e+03 |
| AverageQVals | 0.508 |
| StdQVals | 1.17 |
| MaxQVals | 5.88 |
| MinQVals | -6.88 |
| LossPi | -1.39 |
| LossQ | 0.964 |
| Time | 6.86 |
---------------------------------------
Results from the second run:
---------------------------------------
| Epoch | 1 |
| AverageEpRet | -260 |
| StdEpRet | 0 |
| MaxEpRet | -260 |
| MinEpRet | -260 |
| AverageTestEpRet | -533 |
| StdTestEpRet | 4.89 |
| MaxTestEpRet | -526 |
| MinTestEpRet | -541 |
| EpLen | 1e+03 |
| TestEpLen | 1e+03 |
| TotalEnvInteracts | 1e+03 |
| AverageQVals | 0.508 |
| StdQVals | 1.17 |
| MaxQVals | 5.88 |
| MinQVals | -6.88 |
| LossPi | -1.39 |
| LossQ | 0.964 |
| Time | 16 |
---------------------------------------
Looks like this solves the issue. I'm going to mark this as closed.
Scratch that: on double-checking, it looks like things diverge after Epoch 1. I don't know where this nondeterminism is coming from.
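To locate where two runs part ways, it can help to diff their logged metrics epoch by epoch while ignoring wall-clock columns such as Time. This sketch assumes both runs wrote tab-separated logs with a header row and an Epoch column (the layout of Spinning Up's progress.txt, as far as I know); first_divergent_epoch is a hypothetical helper, not a Spinning Up function.

```python
import csv
import io
from typing import Optional


def first_divergent_epoch(log_a: str, log_b: str,
                          ignore=("Time",)) -> Optional[int]:
    """Return the Epoch of the first row where two tab-separated progress
    logs disagree, skipping columns in `ignore`; None if the overlapping
    epochs all match."""
    rows_a = csv.DictReader(io.StringIO(log_a), delimiter="\t")
    rows_b = csv.DictReader(io.StringIO(log_b), delimiter="\t")
    for row_a, row_b in zip(rows_a, rows_b):
        for key, value in row_a.items():
            if key in ignore:
                continue
            # Compare the logged strings exactly: any change counts.
            if row_b.get(key) != value:
                return int(row_a["Epoch"])
    return None


# Epoch-1 values copied from the two tables above (subset of columns).
run1 = "Epoch\tAverageTestEpRet\tStdTestEpRet\tTime\n1\t-533\t3.39\t6.86\n"
run2 = "Epoch\tAverageTestEpRet\tStdTestEpRet\tTime\n1\t-533\t4.89\t16\n"
print(first_divergent_epoch(run1, run2))  # 1
```

Applied to the epoch-1 rows in the tables above, it already flags a difference in StdTestEpRet (3.39 vs 4.89), even though the training-side metrics match.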
With the env seeds set and export PYTHONHASHSEED=0, TRPO/PPO/VPG are deterministic through at least the first three epochs. I have no idea why DDPG would be different.
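One plausible reason DDPG behaves differently (an assumption, not something confirmed in this thread) is that it consumes randomness at every update: it samples minibatches from a replay buffer and adds exploration noise to actions, on top of network initialization. If any of those draws comes from an unseeded or shared generator, runs could drift apart and the difference could compound over epochs. The sketch below uses a hypothetical ToyReplayBuffer that owns an explicitly seeded generator, so its minibatches are reproducible.

```python
import random


class ToyReplayBuffer:
    """Hypothetical FIFO replay buffer whose sampling uses a private,
    explicitly seeded generator instead of a shared global one."""

    def __init__(self, capacity: int, seed: int):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0
        self.rng = random.Random(seed)  # private, seeded stream

    def store(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition  # overwrite the oldest
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample_batch(self, batch_size: int):
        # Uniform index sampling with replacement.
        idxs = [self.rng.randrange(len(self.storage)) for _ in range(batch_size)]
        return [self.storage[i] for i in idxs]


def run(seed: int, steps: int = 50, updates: int = 5):
    buf = ToyReplayBuffer(capacity=20, seed=seed)
    for t in range(steps):
        buf.store(t)
    return [buf.sample_batch(4) for _ in range(updates)]


print(run(0) == run(0))  # True
```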
Is this issue happening in both TensorFlow and PyTorch? Have the operation-level seeds been properly set?
With the following env seed settings:
env.seed(seed)
env.action_space.seed(seed)
test_env.seed(seed)
test_env.action_space.seed(seed)
the results are the same for the same seed.
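Seeding action_space matters because, as far as I understand the Gym API of that era, env.seed() and env.action_space.seed() seed separate generators, and DDPG-style code typically calls env.action_space.sample() during its initial random-exploration phase. The toy classes below (ToyEnv and ToySpace are hypothetical stand-ins, not Gym classes) reproduce that structure with Python's random module.

```python
import random


class ToySpace:
    """Hypothetical stand-in for a Gym action space with its own RNG."""

    def __init__(self, low: float, high: float):
        self.low, self.high = low, high
        self.rng = random.Random()  # unseeded until seed() is called

    def seed(self, seed: int) -> None:
        self.rng.seed(seed)

    def sample(self) -> float:
        return self.rng.uniform(self.low, self.high)


class ToyEnv:
    """Hypothetical env whose dynamics RNG is separate from the space's."""

    def __init__(self):
        self.rng = random.Random()
        self.action_space = ToySpace(-1.0, 1.0)

    def seed(self, seed: int) -> None:
        self.rng.seed(seed)  # note: does NOT touch self.action_space

    def reset(self) -> float:
        return self.rng.gauss(0.0, 1.0)


def warmup_actions(seed: int, seed_space: bool, n: int = 3):
    env = ToyEnv()
    env.seed(seed)
    if seed_space:
        env.action_space.seed(seed)
    env.reset()
    # Mimics an initial phase of uniformly random exploratory actions.
    return [env.action_space.sample() for _ in range(n)]


print(warmup_actions(0, seed_space=True) == warmup_actions(0, seed_space=True))  # True
```

With seed_space=False the sampled actions come from an OS-seeded generator and will almost always differ between calls, even though env.seed(seed) was called.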
Even when I set the same random seed, the results differ; you can reproduce this with DDPG. I think
tf.set_random_seed(seed)
doesn't work, but I don't know how to fix it.
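Note that tf.set_random_seed only fixes the seeds of TensorFlow's random ops (initializers, random_normal, and so on); it does not control the order in which multithreaded or GPU kernels combine partial sums. Because floating-point addition is not associative, a different reduction order can change results in the last bits even with identical seeds, and a training loop could amplify such tiny differences. Whether this is the cause here is an assumption; the snippet only demonstrates the arithmetic effect, plus an order-independent alternative (math.fsum) for comparison.

```python
import math
import random

# Floating-point addition is not associative: grouping changes the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Summing the same numbers in two different orders, as a parallel
# reduction might, can give results that differ in the last bits.
rng = random.Random(0)
vals = [rng.uniform(-1e3, 1e3) for _ in range(10_000)]
shuffled = vals[:]
rng.shuffle(shuffled)
print(sum(vals) - sum(shuffled))

# math.fsum returns a correctly rounded sum, so it is order-independent.
print(math.fsum(vals) == math.fsum(shuffled))  # True
```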