Open surbhi1944 opened 4 years ago
Please tell me how we can evaluate or simulate the trained model to measure its performance.
Thanks
Waiting for an answer. Please reply. This happens with all the models trained with this repository. The training curves are fairly close to those in the research paper, but the test returns, even on the seed used for training, are far too low.
Hi, have you figured out the reason?
No
Subject: Re: [openai/baselines] Trained model not working (#1054)
I had some problems with the argument parser, and it took me an hour to find out that the condition "if load_path is not None" in ppo2.py was not satisfied. I hope you did not make the same mistake I did. :(
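The comment above does not say what went wrong with the parser, so the following is only a sketch of one hypothetical way the condition can fail: a flag typed without its leading "--" is not recognised, and load_path keeps its default of None. The parser here is a stand-in built with argparse, not the one in baselines, and /tmp/ppo2 is a made-up path.

```python
import argparse


def parse_load_path(argv):
    """Parse a hypothetical --load_path flag; unrecognised tokens are returned separately."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--load_path", default=None)
    args, unknown = parser.parse_known_args(argv)
    return args.load_path, unknown


# Correctly spelled flag: the path is picked up.
ok_path, _ = parse_load_path(["--load_path=/tmp/ppo2"])

# Missing leading "--": the token is not recognised and load_path stays None.
bad_path, leftover = parse_load_path(["load_path=/tmp/ppo2"])

for path in (ok_path, bad_path):
    if path is not None:
        print("would load weights from", path)
    else:
        print("load_path is None: no weights would be loaded")
```

Printing or logging the parsed value just before the check is a cheap way to confirm which branch is actually taken.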
I trained a PPO2 model on the Walker2d-v2 environment with nminibatches=64, using the following command:
python -m baselines.run --alg=ppo2 --env=Walker2d-v2 --num_timesteps=1e6 --seed=30 --network=mlp --num_env=1 --save_path="/home/surabhi/Downloads/github/baselines/result/walker2d/30/ppo2" --log_path="/home/surabhi/Downloads/github/baselines/result/walker2d/30/"
But when I run the trained model, it shows a return of ~9:
python -m baselines.run --alg=ppo2 --env=Walker2d-v2 --num_timesteps=0 --seed=30 --network=mlp --num_env=1 --load_path="/home/surabhi/Downloads/github/baselines/result/walker2d/30/ppo2" --play --save_video_interval=1 save_video_length=1000
    if args.play:
        logger.log("Running trained model")
        roll_rew = [0.0]
        state = model.initial_state if hasattr(model, 'initial_state') else None
        for roll in range(10):  # number of rollouts
            obs = env.reset()
            dones = np.zeros((1,))
            episode_rew = [0.0]
            eplen = 0
            for ts in range(1000):  # max rollout length (path len)
                if state is not None:
                    actions, _, state, _ = model.step(obs, S=state, M=dones)
                else:
                    actions, _, _, _ = model.step(obs)

                obs, rew, done, _ = env.step(actions)
                print(rew[0])
                episode_rew[-1] += rew[0]
                eplen += 1
                # env.render()
                done_any = done.any() if isinstance(done, np.ndarray) else done
                if done_any or eplen >= 1000:  # max length of rollout
                    # for i in np.nonzero(done)[0]:
                    #     print('episode_rew={}'.format(episode_rew))
                    #     episode_rew[i] = 0
                    obs = env.reset()
                    episode_rew.append(0.0)
                    eplen = 0
            # print("#mean100ep_reward", episode_rew[-100:])
            print(round(np.mean(episode_rew[-100:]), 1), len(episode_rew))
            roll_rew.append(round(np.mean(episode_rew[-100:]), 1))
        print("rollout avg ", sum(roll_rew)//10)
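One thing worth checking in the loop above is the averaging itself. It appends a fresh 0.0 entry after every finished episode, so the mean also counts a trailing empty or partial episode, and the final `//` is floor division, which drops the fractional part. Below is a minimal sketch of an evaluation loop that averages only completed episodes. It assumes an environment with the same reset/step interface as in the post. ToyEnv and the constant policy are hypothetical stand-ins so the example runs without baselines or MuJoCo.

```python
import numpy as np


def evaluate(step_fn, env, n_episodes=10, max_steps=1000):
    """Run n_episodes rollouts and return one undiscounted return per episode."""
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        total, steps, done = 0.0, 0, False
        while not done and steps < max_steps:
            action = step_fn(obs)
            obs, rew, done, _ = env.step(action)
            # Accept both scalar and length-1 array rewards/dones (num_env=1).
            total += float(np.asarray(rew).ravel()[0])
            done = bool(np.asarray(done).ravel()[0])
            steps += 1
        returns.append(total)
    return returns


class ToyEnv:
    """Hypothetical environment: reward 1.0 per step, episode ends after 5 steps."""

    def reset(self):
        self.t = 0
        return np.zeros(1)

    def step(self, action):
        self.t += 1
        return np.zeros(1), np.array([1.0]), np.array([self.t >= 5]), {}


# Constant policy as a placeholder for something like model.step(obs)[0].
returns = evaluate(lambda obs: 0, ToyEnv(), n_episodes=3)
print("mean return:", np.mean(returns), "episodes:", len(returns))
```

With a real model, step_fn would wrap the policy call and the mean would be taken over whole episodes only, using true division.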