lbertge / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

About ppo2 #2

Open Kailiangdong opened 4 years ago

Kailiangdong commented 4 years ago

Hello, thank you for sharing your code. In https://github.com/openai/baselines/pull/1027 you said you did not get much success integrating PPO2 into GAIL. Can you tell me what kind of "not success" you ran into? I find that PPO1 is closer to the original PPO paper, while PPO2 seems to be an updated version of PPO1 without a corresponding paper. I also found answers saying that PPO2 actually clips the value function: https://github.com/openai/baselines/issues/485 and https://github.com/hill-a/stable-baselines/issues/359. Can you share your current integration of PPO2 with GAIL? Thank you.

lbertge commented 4 years ago

Hello @Kailiangdong,

I apologize for the delayed response - I have been busy with other things and have not had a chance to reply. I really appreciate your patience.

It's true I have not been able to integrate PPO2 with GAIL. My definition of "not success" is just that the performance does not match that of TRPO with GAIL.

For PPO2-GAIL, I have some graphs. Here is one result, on the Hopper-v2 environment:

[image: PPO2-GAIL learning curve on Hopper-v2]

For reference, this is TRPO-GAIL's performance on Hopper-v2:

[image: TRPO-GAIL learning curve on Hopper-v2]

I realize the timestep ranges are not the same, but this roughly gives you an idea of the performance: by ~200k timesteps TRPO-GAIL is able to solve the environment (a score of 3500), whereas PPO2-GAIL achieves a bit more than half that (a score of about 2000).

I think you are correct that PPO2 does clip the value function - I mention that briefly in the original PR openai#1027, but I haven't gotten an official confirmation yet.
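
For anyone following along, here is a minimal NumPy sketch of what that value-function clipping looks like. This is my paraphrase of the idea, not the actual TensorFlow code in baselines; the function name and the `clip_range` default are just for illustration.

```python
import numpy as np

def clipped_value_loss(vpred, old_vpred, returns, clip_range=0.2):
    # Keep the new value prediction within clip_range of the old prediction.
    vpred_clipped = old_vpred + np.clip(vpred - old_vpred, -clip_range, clip_range)
    # Squared error for the unclipped and clipped predictions.
    vf_loss_unclipped = np.square(vpred - returns)
    vf_loss_clipped = np.square(vpred_clipped - returns)
    # Take the pessimistic (larger) loss, mirroring the clipped policy objective.
    return 0.5 * np.mean(np.maximum(vf_loss_unclipped, vf_loss_clipped))
```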

On your last point, PPO2-GAIL already exists, here. It is mostly a direct port of the PPO2 codebase and similarly uses a runner to generate the rollouts. I do not believe the code is runnable in its current state, because I configured the GAIL API to use the code in pposgd_mpi.py, which contains the implementation of PPO1.
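
To make the runner idea concrete, here is a hypothetical sketch of the GAIL twist on a PPO2-style rollout: the environment's reward is discarded and the discriminator supplies the reward instead. `policy.step` and `discriminator.get_reward` are assumed interfaces for illustration, not the actual ones in my code.

```python
def collect_gail_rollout(env, policy, discriminator, nsteps):
    """Hypothetical PPO2-style rollout where GAIL supplies the reward."""
    obs = env.reset()
    mb_obs, mb_actions, mb_rewards, mb_dones = [], [], [], []
    for _ in range(nsteps):
        action = policy.step(obs)                       # assumed policy interface
        next_obs, env_reward, done, _ = env.step(action)
        # GAIL: ignore env_reward and use the discriminator's score instead.
        reward = discriminator.get_reward(obs, action)  # assumed interface
        mb_obs.append(obs)
        mb_actions.append(action)
        mb_rewards.append(reward)
        mb_dones.append(done)
        obs = env.reset() if done else next_obs
    return mb_obs, mb_actions, mb_rewards, mb_dones
```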

Let me know if you would like some help setting up PPO2-GAIL to be runnable. I am happy to work on it, given I am familiar with how I wrote my code.

Kailiangdong commented 4 years ago

Thank you for your reply. I have also implemented PPO2 with GAIL, and it is very similar to yours, but I am still debugging. I will try to find out why the reward of PPO2-GAIL is not so high. By the way, may I ask whether your PPO2-GAIL supports MPI? It seems to, right? I haven't tested it.

Kailiangdong commented 4 years ago

[image: screenshot of the ppo2 parameters]

I think you didn't use MPI for ppo2, right? Because here, in the parameters of ppo2, there is no parameter for rank.

lbertge commented 4 years ago

I don't believe it does. I named the file ppo_mpi.py because I eventually wanted to support MPI (since PPO2 does), but the issues regarding its performance with GAIL made me focus on using just one rank.
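
To clarify what "rank" means here, this is a minimal mpi4py sketch (the library baselines uses for MPI), not the contents of ppo_mpi.py: with a single process the rank is always 0, so the MPI machinery is effectively a no-op.

```python
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()   # 0 when running a single process
size = MPI.COMM_WORLD.Get_size()   # number of parallel workers

if rank == 0:
    # Typically only rank 0 handles logging and checkpointing.
    print("running with %d worker(s)" % size)
```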

Kailiangdong commented 4 years ago

I think maybe you should adjust several parameters, because the current parameters are tuned for TRPO-GAIL. Did you do a parameter search for it?

lbertge commented 4 years ago

If you're asking about parameter tuning, I didn't do much of it, because when debugging the performance I found that changing parameters did not really help. In general I tried to keep the same parameters as PPO1.