thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

Async test_collector seems to cause exception in offpolicy_trainer #700

Closed: CWHer closed this issue 2 years ago

CWHer commented 2 years ago

I'm not sure whether I'm making a mistake in my code. When I modified the Atari Rainbow DQN example to use an AsyncCollector as the test_collector, it raised the following exception:

Traceback (most recent call last):
  File "xxxxxx/code/tianshou/examples/atari/atari_rainbow.py", line 260, in <module>
    test_rainbow(get_args())
  File "xxxxxx/code/tianshou/examples/atari/atari_rainbow.py", line 237, in test_rainbow
    result = offpolicy_trainer(
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/trainer/offpolicy.py", line 133, in offpolicy_trainer
    return OffpolicyTrainer(*args, **kwargs).run()
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/trainer/base.py", line 439, in run
    deque(self, maxlen=0)  # feed the entire iterator into a zero-length deque
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/trainer/base.py", line 314, in __next__
    test_stat, self.stop_fn_flag = self.test_step()
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/trainer/base.py", line 343, in test_step
    test_result = test_episode(
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/trainer/utils.py", line 24, in test_episode
    collector.reset_env()
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/data/collector.py", line 427, in reset_env
    super().reset_env(gym_reset_kwargs)
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/data/collector.py", line 136, in reset_env
    rval = self.env.reset(**gym_reset_kwargs)
  File ".../miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/env/venvs.py", line 198, in reset
    self._assert_id(id)
  File "xxxxxx/miniconda3/envs/pt/lib/python3.9/site-packages/tianshou/env/venvs.py", line 179, in _assert_id
    assert i not in self.waiting_id, \
AssertionError: Cannot interact with environment 0 which is stepping now.

I believe the following code patches and run command can reproduce my issue.

--- atari_wrapper.py    2022-07-19 17:17:09.104589633 +0800
+++ atari_wrapper_modified.py   2022-07-19 16:19:34.221165479 +0800
@@ -12,6 +12,7 @@

 try:
     import envpool
+    raise ImportError()
 except ImportError:
     envpool = None

@@ -297,17 +298,17 @@
         train_envs = ShmemVectorEnv(
             [
                 lambda:
-                wrap_deepmind(task, episode_life=True, clip_rewards=True, **kwargs)
+                wrap_deepmind(task, episode_life=True,
+                              clip_rewards=True, **kwargs)
                 for _ in range(training_num)
-            ]
-        )
+            ], wait_num=1, timeout=0.1)
         test_envs = ShmemVectorEnv(
             [
                 lambda:
-                wrap_deepmind(task, episode_life=False, clip_rewards=False, **kwargs)
+                wrap_deepmind(task, episode_life=False,
+                              clip_rewards=False, **kwargs)
                 for _ in range(test_num)
-            ]
-        )
+            ], wait_num=1, timeout=0.1)
         env.seed(seed)
         train_envs.seed(seed)
         test_envs.seed(seed)
--- atari_rainbow.py    2022-07-19 17:17:09.040587568 +0800
+++ atari_rainbow_modified.py   2022-07-19 16:19:25.216876056 +0800
@@ -9,7 +9,7 @@
 from atari_wrapper import make_atari_env
 from torch.utils.tensorboard import SummaryWriter

-from tianshou.data import Collector, PrioritizedVectorReplayBuffer, VectorReplayBuffer
+from tianshou.data import AsyncCollector, PrioritizedVectorReplayBuffer, VectorReplayBuffer
 from tianshou.policy import RainbowPolicy
 from tianshou.trainer import offpolicy_trainer
 from tianshou.utils import TensorboardLogger, WandbLogger
@@ -137,8 +137,8 @@
             weight_norm=not args.no_weight_norm
         )
     # collector
-    train_collector = Collector(policy, train_envs, buffer, exploration_noise=True)
-    test_collector = Collector(policy, test_envs, exploration_noise=True)
+    train_collector = AsyncCollector(policy, train_envs, buffer, exploration_noise=True)
+    test_collector = AsyncCollector(policy, test_envs, exploration_noise=True)

     # log
     now = datetime.datetime.now().strftime("%y%m%d-%H%M%S")
python3 atari_rainbow.py \
    --task "BreakoutNoFrameskip-v4" \
    --n-step 1 --step-per-epoch 10 \
    --training-num 4 --test-num 4

I also read the tianshou source code to find a possible cause: the test_episode function (link) is invoked periodically, and it calls collector.reset_env() every time. However, for an async venv, reset_env() needs an explicit id, and there is no such thing as a "force reset all envs" operation for an async venv.
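
A minimal sketch of the failure mode, assuming the check behaves like the _assert_id shown in the traceback (illustrative pseudocode with hypothetical names, not tianshou's actual implementation):

# Illustrative sketch (hypothetical class, not tianshou's code) of why a
# full reset fails on an async venv: reset() with no id touches every
# environment, including ones whose step() results were never fetched.
class AsyncVenvSketch:
    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.waiting_id = []          # env ids with an outstanding step()

    def step(self, actions, id):
        self.waiting_id.extend(id)    # these envs are now "stepping"

    def reset(self, id=None):
        ids = range(self.num_envs) if id is None else id
        for i in ids:
            assert i not in self.waiting_id, \
                f"Cannot interact with environment {i} which is stepping now."

venv = AsyncVenvSketch(num_envs=4)
venv.step(actions=[0], id=[0])        # env 0 now has an outstanding step
venv.reset()                          # AssertionError, same message as above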

If there is no mistake in my code and the above cause is real, I believe there should be a hint in the docs warning users not to use AsyncCollector as the test_collector (or maybe a "force reset all envs" function could be added for async venvs).
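
For illustration, such a helper might look like this (hypothetical, not part of tianshou's API; it relies on the internal waiting_id attribute visible in the traceback, and on the documented async behavior that step(None) only fetches unfinished step results):

# Hypothetical helper, NOT part of tianshou: drain outstanding async steps
# before resetting, so reset() never races with a stepping worker.
def force_reset_all(venv, **gym_reset_kwargs):
    while venv.waiting_id:       # internal attribute seen in the traceback
        venv.step(None)          # async venv: action=None fetches finished
                                 # step() results, emptying waiting_id
    return venv.reset(**gym_reset_kwargs)

Draining waits for in-flight steps to finish; an alternative design would be to cancel and restart the stepping workers instead.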

Trinkle23897 commented 2 years ago

If there is no mistake in my code and the above cause is real, I believe there should be a hint in the docs warning users not to use AsyncCollector as the test_collector (or maybe a "force reset all envs" function could be added for async venvs).

Yes, you are right. Would you like to make a PR to update the docs?

CWHer commented 2 years ago

It would be my pleasure, and I will make a PR ASAP. :smile:

Trinkle23897 commented 2 years ago

~Sorry, I need more time to figure out why: I noticed that if I apply the same changes to a test script, there's no exception.~

I can now successfully reproduce the same error in the test script.