openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

How to improve the success rate #494

Open huangjiancong1 opened 6 years ago

huangjiancong1 commented 6 years ago

How can I improve the success rate? My goal is to use a Baxter robot to push an object to a target point in MuJoCo. My Gym environment is complete, but the training success rate stays very low (0.0~0.1). The success rate after training 50 times is not far from the success rate after training 200 times.

Saving periodic policy to /tmp/openai-2018-08-06-21-12-50-218152/policy_55.pkl ...
------------------------------------
| epoch              | 56          |
| stats_g/mean       | 0.4804729   |
| stats_g/std        | 0.42982802  |
| stats_o/mean       | 0.109720156 |
| stats_o/std        | 0.21830468  |
| test/episode       | 1140.0      |
| test/mean_Q        | -31.927176  |
| test/success_rate  | 0.075       |
| train/episode      | 5700.0      |
| train/success_rate | 0.01        |
------------------------------------
-----------------------------------
| epoch              | 57         |
| stats_g/mean       | 0.48053482 |
| stats_g/std        | 0.42993855 |
| stats_o/mean       | 0.10971988 |
| stats_o/std        | 0.21833657 |
| test/episode       | 1160.0     |
| test/mean_Q        | -37.463974 |
| test/success_rate  | 0.0        |
| train/episode      | 5800.0     |
| train/success_rate | 0.02       |
-----------------------------------
------------------------------------
| epoch              | 58          |
| stats_g/mean       | 0.48076412  |
| stats_g/std        | 0.43026587  |
| stats_o/mean       | 0.109837644 |
| stats_o/std        | 0.21850404  |
| test/episode       | 1180.0      |
| test/mean_Q        | -31.406279  |
| test/success_rate  | 0.075       |
| train/episode      | 5900.0      |
| train/success_rate | 0.015       |
------------------------------------
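
To compare success rates across epochs without reading the tables by eye, the pasted output can be parsed into dictionaries. The following is a minimal sketch, not part of baselines: `parse_epoch_tables` is a hypothetical helper, and it assumes every table row has the `| key | value |` shape shown above, with numeric values. The sample rows are copied from the log in this comment.

```python
def parse_epoch_tables(log_text):
    """Parse '| key | value |' tables into one dict per epoch (hypothetical helper)."""
    epochs = []
    current = {}
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            # Drop the outer pipes, then split the row into key and value.
            key, value = [cell.strip() for cell in line.strip("|").split("|")]
            current[key] = float(value)
        elif current:
            # Any non-row line (for example a dashed border) closes the table.
            epochs.append(current)
            current = {}
    if current:
        epochs.append(current)
    return epochs


# Rows copied from the log above (epochs 56 to 58, abbreviated).
sample = """
------------------------------------
| epoch              | 56          |
| test/success_rate  | 0.075       |
| train/success_rate | 0.01        |
------------------------------------
-----------------------------------
| epoch              | 57         |
| test/success_rate  | 0.0        |
| train/success_rate | 0.02       |
-----------------------------------
------------------------------------
| epoch              | 58          |
| test/success_rate  | 0.075       |
| train/success_rate | 0.015       |
------------------------------------
"""

epochs = parse_epoch_tables(sample)
mean_test = sum(e["test/success_rate"] for e in epochs) / len(epochs)
print(len(epochs), "epochs parsed, mean test success rate:", mean_test)
```

Averaging over a window of epochs like this makes it easier to judge whether the success rate is actually flat or just noisy from one epoch to the next.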

huangjiancong1 commented 6 years ago

I am training with the DDPG+HER algorithm. The config settings are:

Logging to /tmp/openai-2018-08-06-21-12-50-218152
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_hidden: 256
_layers: 3
_max_u: 1.0
_network_class: baselines.her.actor_critic:ActorCritic
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_relative_goals: False
_scope: ddpg
ddpg_params: {'buffer_size': 1000000, 'hidden': 256, 'layers': 3, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'polyak': 0.95, 'batch_size': 256, 'Q_lr': 0.001, 'pi_lr': 0.001, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'action_l2': 1.0, 'clip_obs': 200.0, 'scope': 'ddpg', 'relative_goals': False}
env_name: FetchSlide-v1
gamma: 0.98
make_env: <function prepare_params.<locals>.make_env at 0x7fd9807e9bf8>
n_batches: 40
n_cycles: 50
n_test_rollouts: 10
noise_eps: 0.2
random_eps: 0.3
replay_k: 4
replay_strategy: future
rollout_batch_size: 2
test_with_polyak: False
Creating a DDPG agent with action space 4 x 1.0...
Training...
-----------------------------------
| epoch              | 0          |
| stats_g/mean       | 0.5070929  |
| stats_g/std        | 0.46622786 |
| stats_o/mean       | 0.11331562 |
| stats_o/std        | 0.22906026 |
| test/episode       | 20.0       |
| test/mean_Q        | -2.963306  |
| test/success_rate  | 0.0        |
| train/episode      | 100.0      |
| train/success_rate | 0.0        |
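
The printed config is enough to sanity-check two numbers in the logs. The sketch below uses only values from the config above. It rests on one loudly stated assumption: the environment gives a sparse reward of -1 on every step where the goal is not reached, as the Fetch tasks do by default. Whether the custom Baxter environment does the same is not stated in this thread. `never_succeed_return` is a hypothetical helper name.

```python
# Values copied from the config printed above.
T = 50                   # steps per episode
gamma = 0.98             # discount factor
n_cycles = 50
rollout_batch_size = 2
n_test_rollouts = 10

# Episodes added to the counters in each epoch.
train_episodes_per_epoch = n_cycles * rollout_batch_size
test_episodes_per_epoch = n_test_rollouts * rollout_batch_size


def never_succeed_return(gamma, horizon, step_reward=-1.0):
    """Discounted return of an episode that never reaches the goal.

    ASSUMPTION: the reward is step_reward on every one of `horizon` steps.
    This is the geometric sum step_reward * (1 - gamma**horizon) / (1 - gamma).
    """
    return step_reward * (1.0 - gamma ** horizon) / (1.0 - gamma)


floor = never_succeed_return(gamma, T)
print("train episodes per epoch:", train_episodes_per_epoch)
print("test episodes per epoch:", test_episodes_per_epoch)
print("return if the goal is never reached:", round(floor, 2))
```

The episode counts agree with the `train/episode` and `test/episode` values logged at epoch 0 (100.0 and 20.0). Under the sparse-reward assumption, the never-succeed return is roughly -31.8, and the `test/mean_Q` values logged around epochs 56 to 58 sit close to that. This suggests, but does not prove, that the critic has learned to expect failure on almost every episode.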

valeriechen commented 5 years ago

Unrelated question, but which MuJoCo and Gym versions are you using?