openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License

The random seed doesn't work #33

Open xffxff opened 5 years ago

xffxff commented 5 years ago

Even when I set the same random seed, the results differ between runs; you can reproduce this with DDPG. I think tf.set_random_seed(seed) doesn't work, but I don't know how to fix it.

machinaut commented 5 years ago

Can you give an example with what you expected to happen and what actually happened?

If TensorFlow is failing to set a seed, maybe you should file a bug on TensorFlow.

Gym environments and NumPy (for example) are seeded separately, and you should not expect seeding TensorFlow to affect either of them.
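Since each library keeps its own random state, each one needs its own seed call. A minimal sketch of that idea follows, assuming a helper named seed_everything (hypothetical, not part of Spinning Up). It seeds Python's random module and, if the calling program has already imported them, NumPy and TensorFlow. Gym environments are not covered here and would still need their own seed calls.

```python
import random
import sys


def seed_everything(seed):
    """Seed every RNG source this process has loaded; return the names seeded.

    Hypothetical helper for illustration, not part of Spinning Up.
    Import numpy / tensorflow before calling this if you want them seeded.
    """
    random.seed(seed)  # Python's built-in generator
    seeded = ["random"]

    np = sys.modules.get("numpy")
    if np is not None:
        np.random.seed(seed)  # NumPy's global generator
        seeded.append("numpy")

    tf = sys.modules.get("tensorflow")
    if tf is not None:
        # TF 1.x exposes tf.set_random_seed; TF 2.x exposes tf.random.set_seed.
        if hasattr(tf, "set_random_seed"):
            tf.set_random_seed(seed)
        else:
            tf.random.set_seed(seed)
        seeded.append("tensorflow")

    return seeded


print(seed_everything(0))
```

The returned list makes it easy to log which sources were actually seeded in a given run.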

jachiam commented 5 years ago

Ooh, I may not have seeded Gym envs. My bad. Will look into getting this working. It's possible that Gym isn't the only other source of nondeterminism (these can be hard to track down).

xffxff commented 5 years ago

@machinaut @jachiam, I ran the command python ddpg.py -s 2. The first time I got AverageTestEpRet=-594 for epoch one, but the second time I got AverageTestEpRet=-314 for epoch one. I then tried setting

env.seed(seed)
test_env.seed(seed)

but the AverageTestEpRet of epoch one was still different.

jachiam commented 5 years ago

@XFFXFF, hmm, that's a bit unfortunate. I appreciate that you tried this out.

I am not fully sure what could be going wrong here. My suspicion is that it might involve the Python hash seed used to prevent dict collision attacks. See here for an explanation of the issue, and see here for more info. Can you try export PYTHONHASHSEED=0 and then try running your experiment again?
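To see what PYTHONHASHSEED changes, here is a small sketch that computes the hash of a string in freshly started interpreters. The helper name str_hash_in_fresh_interpreter and the sample string are illustrative choices, not anything from Spinning Up.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text, hash_seed=None):
    """Return hash(text) as computed by a newly started Python process."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean setting
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = str(hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# With the hash seed fixed, two separate processes agree on the hash.
fixed_a = str_hash_in_fresh_interpreter("spinup", hash_seed=0)
fixed_b = str_hash_in_fresh_interpreter("spinup", hash_seed=0)
print(fixed_a == fixed_b)  # True

# Without it, string hashes are randomized per process and usually differ.
print(str_hash_in_fresh_interpreter("spinup"), str_hash_in_fresh_interpreter("spinup"))
```

Note that the variable has to be set before the interpreter starts, which is why it is exported in the shell; assigning os.environ["PYTHONHASHSEED"] inside a script that is already running does not change hashing in that process.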

jachiam commented 5 years ago

@XFFXFF I tried out export PYTHONHASHSEED=0, in addition to setting env seed and test_env seed, and did two runs of DDPG with python -m spinup.run ddpg --hid [32] --env HalfCheetah-v2 --steps_per_epoch 1000.

Results from first run:

---------------------------------------
|             Epoch |               1 |
|      AverageEpRet |            -260 |
|          StdEpRet |               0 |
|          MaxEpRet |            -260 |
|          MinEpRet |            -260 |
|  AverageTestEpRet |            -533 |
|      StdTestEpRet |            3.39 |
|      MaxTestEpRet |            -525 |
|      MinTestEpRet |            -536 |
|             EpLen |           1e+03 |
|         TestEpLen |           1e+03 |
| TotalEnvInteracts |           1e+03 |
|      AverageQVals |           0.508 |
|          StdQVals |            1.17 |
|          MaxQVals |            5.88 |
|          MinQVals |           -6.88 |
|            LossPi |           -1.39 |
|             LossQ |           0.964 |
|              Time |            6.86 |
---------------------------------------

Results from second run:

---------------------------------------
|             Epoch |               1 |
|      AverageEpRet |            -260 |
|          StdEpRet |               0 |
|          MaxEpRet |            -260 |
|          MinEpRet |            -260 |
|  AverageTestEpRet |            -533 |
|      StdTestEpRet |            4.89 |
|      MaxTestEpRet |            -526 |
|      MinTestEpRet |            -541 |
|             EpLen |           1e+03 |
|         TestEpLen |           1e+03 |
| TotalEnvInteracts |           1e+03 |
|      AverageQVals |           0.508 |
|          StdQVals |            1.17 |
|          MaxQVals |            5.88 |
|          MinQVals |           -6.88 |
|            LossPi |           -1.39 |
|             LossQ |           0.964 |
|              Time |              16 |
---------------------------------------

Looks like this solves the issue. I'm going to mark this as closed.

jachiam commented 5 years ago

Scratch that: on double-checking, it looks like things diverge after Epoch 1. I don't know where this nondeterminism is coming from.
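One way to locate where two runs diverge is to compare their logged metrics key by key rather than by eye. The sketch below does that for a subset of the Epoch 1 values copied from the two tables above; the function name differing_keys is an illustrative choice, and Time is ignored because wall-clock time is expected to vary.

```python
def differing_keys(run_a, run_b, ignore=("Time",)):
    """Return the sorted metric names whose values differ between two log rows."""
    keys = (set(run_a) | set(run_b)) - set(ignore)
    return sorted(k for k in keys if run_a.get(k) != run_b.get(k))


# Epoch 1 values copied from the two result tables above (subset of rows).
first_run = {
    "AverageEpRet": -260, "AverageTestEpRet": -533, "StdTestEpRet": 3.39,
    "MaxTestEpRet": -525, "MinTestEpRet": -536, "LossQ": 0.964, "Time": 6.86,
}
second_run = {
    "AverageEpRet": -260, "AverageTestEpRet": -533, "StdTestEpRet": 4.89,
    "MaxTestEpRet": -526, "MinTestEpRet": -541, "LossQ": 0.964, "Time": 16,
}

print(differing_keys(first_run, second_run))
# ['MaxTestEpRet', 'MinTestEpRet', 'StdTestEpRet']
```

On these numbers the training-side metrics match, while the spread of the test episodes already differs in Epoch 1. The logged values are rounded, so matching entries do not by themselves prove that two runs are identical.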

jachiam commented 5 years ago

With env seed setting and export PYTHONHASHSEED=0, TRPO/PPO/VPG are deterministic through at least the first three epochs. I have no idea why DDPG would be different.

Skalwalker commented 4 years ago

Is this issue happening in both tensorflow and pytorch? Have the operation-level seeds been properly set?

Jianengzhang commented 3 years ago

With the following env seed settings

    env.seed(seed)
    env.action_space.seed(seed)
    test_env.seed(seed)
    test_env.action_space.seed(seed)

the results are the same for the same seed.
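A plausible reason the extra action_space.seed calls matter: in Gym, an action space keeps its own random generator, separate from the environment's, and env.action_space.sample() draws from it. If an algorithm samples random exploratory actions that way, seeding only the env leaves those actions unseeded. The sketch below illustrates the effect with stub classes (StubSpace and StubEnv are hypothetical stand-ins so the example runs without Gym installed); seed_env simply wraps the two calls from the comment above.

```python
import random


class StubSpace:
    """Hypothetical stand-in for a Gym action space with its own RNG."""

    def __init__(self):
        self._rng = random.Random()  # seeded from OS entropy by default

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        return self._rng.uniform(-1.0, 1.0)


class StubEnv:
    """Hypothetical stand-in for a Gym env; its RNG is separate from the space's."""

    def __init__(self):
        self._rng = random.Random()
        self.action_space = StubSpace()

    def seed(self, seed):
        self._rng.seed(seed)


def seed_env(env, seed):
    """Seed both the environment and its action space."""
    env.seed(seed)
    env.action_space.seed(seed)


def first_random_actions(seed, n=3, seed_space=True):
    """Return the first n sampled actions from a freshly built stub env."""
    env = StubEnv()
    if seed_space:
        seed_env(env, seed)
    else:
        env.seed(seed)  # action space left unseeded
    return [env.action_space.sample() for _ in range(n)]


# Seeding the space makes the sampled actions repeat across fresh envs.
print(first_random_actions(0) == first_random_actions(0))  # True

# Seeding only the env leaves the sampled actions to chance.
print(first_random_actions(0, seed_space=False))
```

The env.seed(seed) call matches the Gym API in use at the time of this thread; later Gym releases and Gymnasium pass the seed through env.reset(seed=seed) instead, while action_space.seed(seed) remains available.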