[Bug Report] Possible resolution for random seed setting and non-deterministic training

hojae-io commented 2 weeks ago

Describe the bug

It has been observed that the training results, such as the reward curve, are not the same even if you manually set a random seed (for instance, seed=42).

Several similar issues have been submitted: https://github.com/isaac-sim/IsaacLab/issues/489 https://github.com/isaac-sim/IsaacLab/issues/275

Steps to reproduce

Run a training script such as IsaacLab/source/standalone/workflows/rsl_rl/train.py, or follow the steps in the issues linked above to reproduce the problem.

System Info

Describe the characteristic of your environment:

Commit: [e.g. 8f3b9ca]
Isaac Sim Version: 2024.4.1
OS: Ubuntu 22.04
GPU: Geforce 3090
CUDA: 11.2
GPU Driver: 550

Resolution

Here's my resolution. With it, I no longer see any non-deterministic / stochastic behavior, and the reward curves overlap exactly when I train the same code multiple times.

The problem is coming from "setting the seed after the environment is created"

You can see that the seed is set at line 118, which is after the env is created at line 90.

But if you set the seed before the env is created (like lines 92-102 in the image below), all the behavior becomes deterministic. I don't know why the point at which the seed is set matters, or why setting it late could cause non-deterministic behavior.
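A toy sketch of why the ordering could matter, assuming the environment draws random numbers while it is being constructed. ToyEnv, run_seed_after and run_seed_before are hypothetical names for illustration, not Isaac Lab code, and only Python's built-in random module is used here; a real training script would also need to seed NumPy and PyTorch.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an env that samples during construction."""

    def __init__(self, num_tiles=4):
        # Drawn while the env is being built, before any later seeding.
        self.terrain = [random.random() for _ in range(num_tiles)]

    def rollout(self, steps=3):
        # Each "reward" depends on the terrain and on fresh random noise.
        return [sum(self.terrain) + random.random() for _ in range(steps)]


def run_seed_after(seed):
    env = ToyEnv()       # terrain comes from whatever state the RNG was in
    random.seed(seed)    # too late to affect the terrain
    return env.rollout()


def run_seed_before(seed):
    random.seed(seed)    # seed first, so the terrain is covered as well
    env = ToyEnv()
    return env.rollout()
```

In this sketch, two calls to run_seed_before(42) return identical values, while two calls to run_seed_after(42) generally differ because the terrain was sampled before the seed took effect.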

It would be nice if you could provide an explanation and include a fix for this bug in your next pull request.

Checklist

[ ] I have checked that there is no similar issue in the repo (required)
[x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo

Acceptance Criteria

[ ] The determinacy issue is resolved (potentially with the above fix)
[ ] There are tests that ensure the determinacy

pascal-roth commented 2 weeks ago

Hi, thanks for looking into this issue. I agree; it is strange that the moment at which the seed is set makes a difference. Could you make a PR with the necessary changes and, ideally, a test to prove that the fix is effective?

Mayankm96 commented 2 weeks ago

Thanks @hojae-io for digging deeper into this issue. This is a really great find and I may have an explanation.

I suspect that some random events happen at initialization, for instance the terrain generation (a lot of random sampling there) and possibly the initialization of the PhysX solver and internal buffers (not sure). It would make sense that setting the seed beforehand makes the randomness from these sources reproducible for a fixed seed.

Definitely makes sense to fix this issue. Instead of modifying the play/train scripts, we should set the seed as the first operation when the class is constructed. We can add seed into the configuration of the environment.
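A minimal sketch of that idea, assuming a config object that carries the seed. ToyEnvCfg and SeededToyEnv are hypothetical names, not the actual Isaac Lab classes, and only Python's built-in random module is seeded here; a real environment would also seed NumPy and PyTorch at the same point.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEnvCfg:
    """Hypothetical env config; `seed` is the field being proposed."""

    seed: Optional[int] = None  # None means "do not touch the RNG state"
    num_tiles: int = 4


class SeededToyEnv:
    """Hypothetical env that seeds itself before doing anything else."""

    def __init__(self, cfg):
        # First operation in the constructor: nothing may sample before this.
        if cfg.seed is not None:
            random.seed(cfg.seed)
        self.cfg = cfg
        # Construction-time sampling now happens after the seed is set.
        self.terrain = [random.random() for _ in range(cfg.num_tiles)]
```

With this layout the train/play scripts only need to pass the seed through the config, and they cannot accidentally create the env before seeding.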

amrmousa144 commented 1 week ago

I'm having the same issue and looking forward to a contribution to fix it ASAP!

Mayankm96 commented 1 week ago

Based on the suggestion here, I have made the fixes in #940. In the unit test, which runs a fixed number of env steps, the obtained observations and rewards are the same across runs. Note, however, that the test only checks this within a single process.
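One way a cross-process check could look is sketched below. This is not the test in #940: the child program is a hypothetical placeholder that only seeds Python's random module and prints a few samples, where a real check would step the environment and print observations and rewards instead.

```python
import subprocess
import sys

# Hypothetical child program: seed, then print a few samples.
CHILD = (
    "import random, sys; "
    "random.seed(int(sys.argv[1])); "
    "print([random.random() for _ in range(5)])"
)


def run_in_fresh_process(seed):
    """Run the child program in a new interpreter and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(seed)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def is_reproducible(seed, runs=3):
    """True if every fresh process prints exactly the same output."""
    outputs = {run_in_fresh_process(seed) for _ in range(runs)}
    return len(outputs) == 1
```

Comparing the printed output across fresh interpreters covers state that a single-process test cannot, such as anything initialized at import or start-up time.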

I ran the training for anymal locomotion and it looks promising:

./isaaclab.sh -p source/standalone/workflows/rsl_rl/train.py --task Isaac-Velocity-Rough-Anymal-C-v0 --headless --run_name seed_fix

Results over three runs (top: current main, bottom: after the fix)
