openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License

The performance of PPO varies greatly #250

Open zhihaocheng opened 4 years ago

zhihaocheng commented 4 years ago

I ran the PyTorch version of PPO to train HalfCheetah-v2 with the following command line:

"python -m spinup.run ppo_pytorch --exp_name ppo_halfcheetah --env HalfCheetah-v2 --hid[h] [64,32] --act torch.nn.ReLU --seed 0 10 20 30 40 50 60 70 80 90 --epochs 700 --cpu 4"

I found that for some seeds the final performance is above 4000, but for other seeds it is only just above 2000.

Is this expected? In my opinion, the variance across different seeds is too large. Could you please help me?
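
For reference, here is a minimal sketch of how the per-seed final returns could be aggregated to quantify the variance. It assumes the default Spinning Up output layout (one directory per seed under data/, each containing a tab-separated progress.txt with an AverageEpRet column); the exact path and column name are assumptions, so adjust them to your setup.

    # Sketch: aggregate final-epoch returns across seeds to measure run-to-run variance.
    # Assumed layout: data/ppo_halfcheetah/ppo_halfcheetah_s<seed>/progress.txt (adjust as needed).
    import glob
    import os

    import numpy as np
    import pandas as pd

    run_dirs = sorted(glob.glob("data/ppo_halfcheetah/ppo_halfcheetah_s*"))  # hypothetical path
    final_returns = []
    for d in run_dirs:
        progress = pd.read_csv(os.path.join(d, "progress.txt"), sep="\t")
        final_returns.append(progress["AverageEpRet"].iloc[-1])  # return at the last logged epoch

    final_returns = np.array(final_returns)
    print("seeds:", len(final_returns))
    print("mean final return: %.1f" % final_returns.mean())
    print("std across seeds:  %.1f" % final_returns.std())
    print("min / max:         %.1f / %.1f" % (final_returns.min(), final_returns.max()))
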

vermouth1992 commented 4 years ago

Take a look at this: https://openreview.net/forum?id=r1etN1rtPB