thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

Regarding the error related to SEED when I train in a homebrew environment #1039

Open iamysy opened 7 months ago

iamysy commented 7 months ago

I have visited the source website.
I have searched through the issue tracker for duplicates.
I have mentioned version numbers, operating system and environment, where applicable:

import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
0.5.1 0.29.1 2.1.2+cu121 1.26.3 3.9.0 (default, Nov 15 2020, 14:28:56) 

Sorry to bother you, but I have a few questions and hope you can help.

1. I am using a self-built (custom) environment. I previously trained with Tianshou 0.4.7, saving best_model and checkpoint during training. Training is sometimes interrupted for external reasons. When I then load best_model or checkpoint to resume, training seems to start over: the reward drops back to its initial value.

    def save_best_fn(policy):
        torch.save(
            {
                'model': policy.state_dict(),
                # 'optim': optim.state_dict(),
            },
            os.path.join(log_path, 'best_model.pth')
        )

    def save_checkpoint_fn(epoch, env_step, gradient_step):
        # see also: https://pytorch.org/tutorials/beginner/saving_loading_models.html
        torch.save(
            {
                'model': policy.state_dict(),
                # 'optim': optim.state_dict(),
            },
            os.path.join(log_path, 'checkpoint.pth')
        )
        pickle.dump(
            train_collector.buffer,
            open(os.path.join(log_path, 'train_buffer.pkl'), "wb")
        )

    if args.resume:
        # load from existing checkpoint
        print(f"Loading agent under {log_path}")
        ckpt_path = os.path.join(log_path, 'checkpoint.pth')
        if os.path.exists(ckpt_path):
            checkpoint = torch.load(ckpt_path, map_location=args.device)
            policy.load_state_dict(checkpoint['model'])
            # policy.optim.load_state_dict(checkpoint['optim'])
            print("Successfully restore policy and optim.")
        else:
            print("Fail to restore policy and optim.")
        buffer_path = os.path.join(log_path, 'train_buffer.pkl')
        if os.path.exists(buffer_path):
            train_collector.buffer = pickle.load(open(buffer_path, "rb"))
            print("Successfully restore buffer.")
        else:
            print("Fail to restore buffer.")

2. After reading some related issues, I added get_obs_rms and updated Tianshou to the corresponding version (0.5.1). During training I now get a seed-related error. I looked at the code in the examples, but I don't know how to modify this part:

    parser.add_argument('--seed', type=int, default=10)

    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    train_envs.seed([1,2,3,4,5])

Thank you again!

MischaPanch commented 7 months ago

Sorry, I don't understand your question. What exactly do you want to modify and why?

Note that the master branch is currently (very) far ahead of the last release on PyPI. We plan to release the updated version very soon; we just need to finalize the docs and fix some minor issues.

iamysy commented 7 months ago

I apologize for not explaining my question clearly earlier. What I mean is that during the training process, our program gets interrupted due to some external reasons. I used the following method to reload the previously obtained model, but the reward value returned to the starting point, as if the optimal model had not been saved. Is there a problem with the way I saved it? The code I use for saving and loading the model is as follows:

    def save_best_fn(policy):
        torch.save(
            {
                'model': policy.state_dict(),
                # 'optim': optim.state_dict(),
            },
            os.path.join(log_path, 'best_model.pth')
        )

    def save_checkpoint_fn(epoch, env_step, gradient_step):
        # see also: https://pytorch.org/tutorials/beginner/saving_loading_models.html
        torch.save(
            {
                'model': policy.state_dict(),
                # 'optim': optim.state_dict(),
            },
            os.path.join(log_path, 'checkpoint.pth')
        )
        pickle.dump(
            train_collector.buffer,
            open(os.path.join(log_path, 'train_buffer.pkl'), "wb")
        )

    if args.resume:
        # load from existing checkpoint
        print(f"Loading agent under {log_path}")
        ckpt_path = os.path.join(log_path, 'checkpoint.pth')
        if os.path.exists(ckpt_path):
            checkpoint = torch.load(ckpt_path, map_location=args.device)
            policy.load_state_dict(checkpoint['model'])
            # policy.optim.load_state_dict(checkpoint['optim'])
            print("Successfully restore policy and optim.")
        else:
            print("Fail to restore policy and optim.")
        buffer_path = os.path.join(log_path, 'train_buffer.pkl')
        if os.path.exists(buffer_path):
            train_collector.buffer = pickle.load(open(buffer_path, "rb"))
            print("Successfully restore buffer.")
        else:
            print("Fail to restore buffer.")

MischaPanch commented 7 months ago

So it has nothing to do with seeding? I was a bit confused by the issue's title.

I haven't looked much into persistence yet, but @opcode81 has worked with it. Maybe he can also advise you on how to migrate to the newest code version. You could consider using the new high-level interfaces; they would be especially useful if you don't need to tinker with the algorithms and just want to train on a custom environment.
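
On the persistence question, here is a minimal sketch in plain PyTorch of saving and restoring the optimizer state together with the model weights. The thread does not establish why the reward resets after loading, so this is only one thing to rule out. In the snippets above the 'optim' entries are commented out, which means a freshly constructed optimizer starts with empty state on resume. The nn.Linear model and the single Adam optimizer are stand-ins chosen for illustration, and the helper names save_checkpoint and load_checkpoint are hypothetical. A policy with several optimizers would need each one saved the same way.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-ins for the real policy network and its optimizer (assumptions for illustration).
model = nn.Linear(4, 2)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)


def save_checkpoint(path, model, optim):
    """Save the model weights together with the optimizer state."""
    torch.save({"model": model.state_dict(), "optim": optim.state_dict()}, path)


def load_checkpoint(path, model, optim, device="cpu"):
    """Restore weights and optimizer state in place; return True on success."""
    if not os.path.exists(path):
        return False
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model"])
    optim.load_state_dict(checkpoint["optim"])
    return True


# Take one optimizer step so that Adam has non-empty state to save.
loss = model(torch.ones(1, 4)).sum()
loss.backward()
optim.step()

ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
save_checkpoint(ckpt_path, model, optim)

# A fresh model/optimizer pair, as it would exist right after a restart.
new_model = nn.Linear(4, 2)
new_optim = torch.optim.Adam(new_model.parameters(), lr=1e-3)
restored = load_checkpoint(ckpt_path, new_model, new_optim)
```

Reporting success only when both load_state_dict calls have run also keeps a message such as "Successfully restore policy and optim." accurate.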

iamysy commented 7 months ago

The seed issue relates to the second question in my initial post. To get persistence working, I searched the Tianshou GitHub issues and found an answer to a similar question: also save the result of 'get_obs_rms()'.

    def save_best_fn(policy):
        state = {"model": policy.state_dict(), "obs_rms": train_envs.get_obs_rms()}
        torch.save(state, os.path.join(log_path, 'best_model.pth'))

However, using it requires the latest versions of Tianshou and Gymnasium. I have updated my Tianshou version, but I encountered a seed error as shown below.

/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.seed to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.seed` for environment variables or `env.get_wrapper_attr('seed')` that will search the reminding wrappers.
  logger.warn(
/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/gymnasium/utils/passive_env_checker.py:168: DeprecationWarning: WARN: Current gymnasium version requires that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  logger.deprecation(
/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/gymnasium/utils/passive_env_checker.py:181: DeprecationWarning: WARN: Current gymnasium version requires that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  logger.deprecation(
Process Process-1:
Traceback (most recent call last):
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/tianshou/env/worker/subproc.py", line 111, in _worker
    env.reset(seed=data)
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/gymnasium/wrappers/time_limit.py", line 75, in reset
    return self.env.reset(**kwargs)
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
    return self.env.reset(**kwargs)
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/gymnasium/wrappers/env_checker.py", line 57, in reset
    return env_reset_passive_checker(self.env, **kwargs)
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/gymnasium/utils/passive_env_checker.py", line 186, in env_reset_passive_checker
    result = env.reset(**kwargs)
TypeError: reset() got an unexpected keyword argument 'seed'
Traceback (most recent call last):
  File "/home/dyfluid/drl/10/DRLinFluids_airfoil_sac/launch_multiprocessing_training_airfoil.py", line 404, in <module>
    test_sac_with_il()
  File "/home/dyfluid/drl/10/DRLinFluids_airfoil_sac/launch_multiprocessing_training_airfoil.py", line 229, in test_sac_with_il
    train_envs.seed([1,2,3,4,5])
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/tianshou/env/venvs.py", line 406, in seed
    return [w.seed(s) for w, s in zip(self.workers, seed_list)]
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/tianshou/env/venvs.py", line 406, in <listcomp>
    return [w.seed(s) for w, s in zip(self.workers, seed_list)]
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/site-packages/tianshou/env/worker/subproc.py", line 241, in seed
    return self.parent_remote.recv()
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/dyfluid/anaconda3/envs/tianshou/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

Therefore, I would like to ask how to resolve this issue, or whether there are any reference sources available. Thank you again. >.<

Trinkle23897 commented 7 months ago

Ooh, try installing an editable version of Tianshou. @MischaPanch has changed a lot recently, and 0.5.1 was released a year ago.

Alternatively, you can change the reset call in SubprocVecEnv to remove the seed argument. The newest versions of Gymnasium changed the API in ways that are not compatible with Gym 0.18; one of these changes is reset(seed=seed) to [init(seed=seed), reset(no seed)].
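
Another option, going by the traceback and the Gymnasium warnings quoted earlier in the thread, is to let the custom environment's reset accept the seed and options keywords, since the TypeError is raised when reset(seed=...) reaches an environment whose reset does not take them. The sketch below is a dependency-free illustration: HomebrewEnv, its one-dimensional state, and its reward are hypothetical, and it manages its own random.Random generator. A real environment would subclass gymnasium.Env and call super().reset(seed=seed), which seeds self.np_random.

```python
import random


class HomebrewEnv:
    """Hypothetical custom environment; a real one would subclass gymnasium.Env."""

    def __init__(self):
        self._rng = random.Random()
        self._state = 0.0

    def reset(self, *, seed=None, options=None):
        # Gymnasium-style reset: keyword-only `seed` and `options`.
        # A gymnasium.Env subclass would call super().reset(seed=seed) here
        # instead of managing its own generator (assumption of this sketch).
        if seed is not None:
            self._rng = random.Random(seed)
        self._state = self._rng.uniform(-1.0, 1.0)
        info = {}
        return self._state, info  # Gymnasium's reset returns (observation, info)

    def step(self, action):
        # Gymnasium's step returns (obs, reward, terminated, truncated, info).
        self._state += action
        reward = -abs(self._state)
        terminated = abs(self._state) > 10.0
        truncated = False
        return self._state, reward, terminated, truncated, {}


env = HomebrewEnv()
obs_a, _ = env.reset(seed=1)
obs_b, _ = env.reset(seed=1)  # same seed, same initial observation
obs_c, _ = env.reset()        # no seed: the existing generator continues
```

With this signature, a vectorized wrapper that forwards reset(seed=...) to each worker no longer hits the unexpected keyword error, and resetting twice with the same seed gives the same initial observation.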

MischaPanch commented 7 months ago

Fun and slightly related fact: I'm working on a seed-related issue in Gymnasium right now ^^

https://github.com/Farama-Foundation/Gymnasium/pull/889