thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

Use the trained model with tianshou API #859

Closed · zichunxx closed this issue 1 year ago

zichunxx commented 1 year ago

Hi! Thanks to the team for the excellent work!

I am new to RL and tianshou.

Following the relevant examples, I have trained some models. However, I am not sure how to render the MuJoCo environment with a trained model (.pth) through the tianshou API.

Could you provide some examples that would help me dig deeper?

Help would be appreciated!

Trinkle23897 commented 1 year ago

what's your training command?

zichunxx commented 1 year ago

I'm working with fetch_her_ddpg.py, and I have obtained the trained model (policy.pth).

First, I would like to know whether there is a more lightweight way to initialize a DDPGPolicy and load the trained model. The snippet below, adapted from the example, works, but it feels a bit redundant when all I want to do is load a trained model.

import torch

from tianshou.exploration import GaussianNoise
from tianshou.policy import DDPGPolicy
from tianshou.utils.net.common import Net, get_dict_state_decorator
from tianshou.utils.net.continuous import Actor, Critic

# make_fetch_env is the helper defined in fetch_her_ddpg.py
env = make_fetch_env(args.task)

args.state_shape = {
    'observation': env.observation_space['observation'].shape,
    'achieved_goal': env.observation_space['achieved_goal'].shape,
    'desired_goal': env.observation_space['desired_goal'].shape,
}

args.action_shape = env.action_space.shape or env.action_space.n
args.max_action = env.action_space.high[0]
args.exploration_noise = args.exploration_noise * args.max_action

# model: wrap the networks so they accept dict observations
# (the listed keys are flattened and concatenated into a single vector)
dict_state_dec, flat_state_shape = get_dict_state_decorator(
    state_shape=args.state_shape,
    keys=['observation', 'achieved_goal', 'desired_goal']
)

net_a = dict_state_dec(Net)(
    flat_state_shape, hidden_sizes=args.hidden_sizes, device=args.device
)
actor = dict_state_dec(Actor)(
    net_a, args.action_shape, max_action=args.max_action, device=args.device
).to(args.device)
actor_optim = torch.optim.Adam(actor.parameters(), lr=args.actor_lr)

net_c = dict_state_dec(Net)(
    flat_state_shape,
    action_shape=args.action_shape,
    hidden_sizes=args.hidden_sizes,
    concat=True,
    device=args.device,
)
critic = dict_state_dec(Critic)(net_c, device=args.device).to(args.device)
critic_optim = torch.optim.Adam(critic.parameters(), lr=args.critic_lr)

policy = DDPGPolicy(
    actor,
    actor_optim,
    critic,
    critic_optim,
    tau=args.tau,
    gamma=args.gamma,
    exploration_noise=GaussianNoise(sigma=args.exploration_noise),
    estimation_step=args.n_step,
    action_space=env.action_space,
)

# load the trained policy (only the state_dict was saved, so the policy above
# must be built with the same architecture as during training)
args.resume_path = './log/FetchReach-v3/ddpg/0/pc/policy.pth'
if args.resume_path:
    policy.load_state_dict(torch.load(args.resume_path, map_location=args.device))

Second, I want to use the trained policy to interact with the env and observe the whole process in a MuJoCo window step by step; here is my pseudocode. Since I'm not familiar with the tianshou API, I'm still trying to figure out what kind of Batch to pass to policy.forward(). Perhaps some existing API already provides this capability.

env = TruncatedAsTerminated(gym.make("FetchReach-v3", render_mode="human"))
observation, info = env.reset()
for _ in range(1000):
    env.render()
    action = policy(observation)  # the trained DDPG policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()
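
To make the question more concrete, my rough (and unverified) guess is that policy.forward() wants a Batch that wraps the dict observation with a leading batch dimension, and that the action then has to be read back from the returned Batch, roughly like this:

import numpy as np
import torch

from tianshou.data import Batch, to_numpy

# unverified guess: add a batch dimension to each entry of the dict observation
obs = Batch({k: np.expand_dims(v, 0) for k, v in observation.items()})
with torch.no_grad():
    result = policy(Batch(obs=obs, info={}))      # forward pass of the actor
act = policy.map_action(to_numpy(result.act))[0]  # clip/scale to the action space
observation, reward, terminated, truncated, info = env.step(act)

Is this the intended way, or is there an API that already handles the loop above?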

The above are the main issues I am encountering, and I am trying to read the official documentation and examples to find more information on how to solve them. Maybe my question is dumb, but thanks for taking the time to read it.

Trinkle23897 commented 1 year ago

python fetch_her_ddpg.py --watch --resume-path="./log/FetchReach-v3/ddpg/0/pc/policy.pth" --render=0.03
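
The --watch flag essentially does the following: it rebuilds the policy as in the training script, loads the checkpoint from --resume-path, and then lets a Collector roll the policy out on a test env with rendering. If you want to call the tianshou API yourself instead, a minimal sketch (reusing the policy built and loaded in your snippet above; the episode count and render interval are just placeholders) would be:

import gymnasium as gym

from tianshou.data import Collector
from tianshou.env import TruncatedAsTerminated

# make sure gymnasium-robotics is installed so that FetchReach-v3 is registered
test_env = TruncatedAsTerminated(gym.make("FetchReach-v3", render_mode="human"))

policy.eval()  # evaluation mode: deterministic actor, no exploration noise added
collector = Collector(policy, test_env)
result = collector.collect(n_episode=5, render=0.03)  # pause 0.03 s between frames
print(result)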

zichunxx commented 1 year ago

Thanks

lzl60109 commented 10 months ago

Hello, have you tried training and rendering in the hand_dapg environment? I encountered the same problem as you, but I could not render successfully with the scripts above.