pfnet / pfrl

PFRL: a PyTorch-based deep reinforcement learning library
MIT License

[BUG] lazy execution issue with lambda #156

Closed zhouzypaul closed 3 years ago

zhouzypaul commented 3 years ago

https://github.com/pfnet/pfrl/blob/7b0c7e938ba2c0c56a941c766c68635d0dad43c8/examples/atari/train_ppo_ale.py#L199-L200

The lines linked above have a late-binding (lazy evaluation) issue with lambda.

When the callables are built with a list comprehension like this, the expression (lambda: make_env(idx, test)) does not look up idx until the lambda is actually called. By the time the lambdas are called in pfrl.envs.MultiprocessVectorEnv, the loop has already finished, so idx == args.num_envs - 1 for every lambda in the list.

As a result, all the envs in the MultiprocessVectorEnv are seeded with the same seed.
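A minimal sketch of the behaviour described above, runnable without PFRL. The `make_env` here is a hypothetical stand-in that returns the seed an env would receive rather than building an env, and the contents of `process_seeds` and the value of `num_envs` are assumptions chosen only for illustration:

```python
# Hypothetical stand-ins for the script's names; this is not the actual PFRL code.
num_envs = 4
process_seeds = [100 + i for i in range(num_envs)]  # assumed seed table


def make_env(idx, test):
    """Return the seed an env would receive instead of building a real env."""
    process_seed = int(process_seeds[idx])
    return 2 ** 32 - 1 - process_seed if test else process_seed


# Each lambda closes over the variable `idx`, not its value at creation time.
env_fns = [(lambda: make_env(idx, False)) for idx in range(num_envs)]

# The lambdas run only now, after the comprehension has finished,
# so every one of them sees the final value idx == num_envs - 1.
late_bound_seeds = [fn() for fn in env_fns]
print(late_bound_seeds)  # [103, 103, 103, 103]
```

Every callable returns the seed for the last index, which matches the symptom reported here: distinct env slots, one shared seed.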

zhouzypaul commented 3 years ago

The other scripts in examples/ don't suffer from this issue because they use functools.partial, which binds idx when each callable is created rather than when it is called.
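As a rough illustration of why functools.partial avoids the problem, the sketch below binds the index at creation time in two ways. `make_env` is again a hypothetical stand-in that just reports its arguments, and the default-argument variant is a common Python idiom shown for comparison, not necessarily what any PFRL script uses:

```python
import functools

num_envs = 4  # assumed value for illustration


def make_env(idx, test):
    """Hypothetical stand-in: report the arguments instead of building an env."""
    return (idx, test)


# functools.partial stores the current value of idx when each callable is created.
partial_fns = [functools.partial(make_env, idx, False) for idx in range(num_envs)]

# A default argument is evaluated at definition time, so it captures the value too.
default_arg_fns = [(lambda idx=idx: make_env(idx, False)) for idx in range(num_envs)]

partial_results = [fn() for fn in partial_fns]
default_arg_results = [fn() for fn in default_arg_fns]
print(partial_results)  # [(0, False), (1, False), (2, False), (3, False)]
```

Both variants give each callable its own index, so each env would get its own seed.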

muupan commented 3 years ago

Oh, this is definitely a bug that affects performance. Thanks for pointing it out! I can reproduce:

diff --git a/examples/atari/train_ppo_ale.py b/examples/atari/train_ppo_ale.py
index aa6e107..11d8a17 100644
--- a/examples/atari/train_ppo_ale.py
+++ b/examples/atari/train_ppo_ale.py
@@ -174,6 +174,7 @@ def main():
     print("Output files are saved in {}".format(args.outdir))

     def make_env(idx, test):
+        print(f"make_env called with idx: {idx}")
         # Use different random seeds for train and test envs
         process_seed = int(process_seeds[idx])
         env_seed = 2 ** 32 - 1 - process_seed if test else process_seed

python examples/atari/train_ppo_ale.py --gpu -1
Output files are saved in results/13eb97e0e87096517935edba7673b448b2bec64f-2da9143a-67bc6343
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
Observation space Box(4, 84, 84)
Action space Discrete(4)
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7
make_env called with idx: 7

muupan commented 3 years ago

Resolved by https://github.com/pfnet/pfrl/pull/157