openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License
15.63k stars 4.86k forks

Trained model not working #1054

Open surbhi1944 opened 4 years ago

surbhi1944 commented 4 years ago

I have trained the PPO2 model on the Walker2d-v2 environment with nminibatches=64, using the following command:

python -m baselines.run --alg=ppo2 --env=Walker2d-v2 --num_timesteps=1e6 --seed=30 --network=mlp --num_env=1 --save_path="/home/surabhi/Downloads/github/baselines/result/walker2d/30/ppo2" --log_path="/home/surabhi/Downloads/github/baselines/result/walker2d/30/" 30

But when I run the trained model, it shows a return of only ~9:

python -m baselines.run --alg=ppo2 --env=Walker2d-v2 --num_timesteps=0 --seed=30 --network=mlp --num_env=1 --load_path="/home/surabhi/Downloads/github/baselines/result/walker2d/30/ppo2" --play --save_video_interval=1 --save_video_length=1000

```python
if args.play:
    logger.log("Running trained model")
    roll_rew = []
    state = model.initial_state if hasattr(model, 'initial_state') else None
    for roll in range(10):  # number of rollouts
        obs = env.reset()
        dones = np.zeros((1,))
        episode_rew = [0.0]
        eplen = 0
        for ts in range(1000):  # max rollout (path) length
            if state is not None:
                actions, _, state, _ = model.step(obs, S=state, M=dones)
            else:
                actions, _, _, _ = model.step(obs)

            obs, rew, done, _ = env.step(actions)
            print(rew[0])
            episode_rew[-1] += rew[0]
            eplen += 1
            # env.render()
            done_any = done.any() if isinstance(done, np.ndarray) else done
            if done_any or eplen >= 1000:  # episode ended or max length reached
                obs = env.reset()
                episode_rew.append(0.0)
                eplen = 0
        print(round(np.mean(episode_rew[-100:]), 1), len(episode_rew))
        roll_rew.append(round(np.mean(episode_rew[-100:]), 1))
    print("rollout avg", round(np.mean(roll_rew), 1))
```

surbhi1944 commented 4 years ago

Please tell me how we can evaluate or simulate the trained model to measure its performance.
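As a reference point, the return-averaging structure of such an evaluation loop can be sketched independently of baselines. In this sketch, `ToyEnv` and the lambda policy are stand-ins for the real `env` and `model.step` (they are not part of baselines); only the loop shape is the point:

```python
import numpy as np

def evaluate(policy, env_reset, env_step, n_episodes=10, max_len=1000):
    """Run a policy for n_episodes and return the mean episode return.

    policy:    obs -> action
    env_reset: () -> obs
    env_step:  action -> (obs, reward, done)
    """
    returns = []
    for _ in range(n_episodes):
        obs = env_reset()
        ep_ret = 0.0
        for _ in range(max_len):
            obs, rew, done = env_step(policy(obs))
            ep_ret += rew
            if done:
                break
        returns.append(ep_ret)
    return float(np.mean(returns))

# Toy stand-in environment: pays +1 per step and terminates after 5 steps.
class ToyEnv:
    def reset(self):
        self.t = 0
        return np.zeros(3)

    def step(self, action):
        self.t += 1
        return np.zeros(3), 1.0, self.t >= 5

env = ToyEnv()
mean_ret = evaluate(lambda obs: 0, env.reset, env.step)
print(mean_ret)  # 5.0
```

Averaging complete-episode returns this way (rather than per-step rewards) is what makes the number comparable to the episode-reward curves logged during training.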

Thanks

surbhi1944 commented 4 years ago

Still waiting for an answer. Please reply. This happens with all the models trained with this repository: the training curves are reasonably close to those in the research paper, but the test returns are far too low, even on the same seed used for training.

tongzhoumu commented 4 years ago

> Still waiting for an answer. Please reply. This happens with all the models trained with this repository: the training curves are reasonably close to those in the research paper, but the test returns are far too low, even on the same seed used for training.

Hi, have you figured out the reason?

surbhi1944 commented 4 years ago

No


shiqingw commented 3 years ago

I had some problems with the argument parser, and it took me an hour to find out that the condition `if load_path is not None` in ppo2.py was not being satisfied. I hope you did not make the same mistake I did. :(
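This failure mode is easy to reproduce with a toy parser (this is an illustrative sketch, not the actual baselines argument handling): if the flag loses its leading `--`, argparse leaves the attribute at its default `None`, the stray token is silently set aside, and a restore branch guarded by `if load_path is not None` is skipped without any error:

```python
import argparse

# Hypothetical mini-parser illustrating the silent failure: a mistyped flag
# never fills args.load_path, so the model weights would never be restored.
parser = argparse.ArgumentParser()
parser.add_argument('--load_path', default=None)

bad, extras = parser.parse_known_args(['load_path=/tmp/model'])   # missing "--"
good, _ = parser.parse_known_args(['--load_path=/tmp/model'])     # correct

print(bad.load_path)   # None -> restore branch silently skipped
print(extras)          # ['load_path=/tmp/model'] (the ignored token)
print(good.load_path)  # '/tmp/model'
```

Printing the parsed `load_path` (or the list of unrecognized tokens) right after parsing is a quick way to rule this out before suspecting the trained policy itself.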