Valid Gym environments to use! #132

I've used Spinning Up to run many experiments with the different algorithms on different Gym environments. It works well in most of them, such as Atari, Box2D, Classic control and MuJoCo; however, it doesn't work with the new "Robotics" environments.

For example, when I run the following command in a terminal:

    python -m ppo --env FetchReach-v1 --exp_name FetchReach

It shows:

ExperimentGrid [FetchReachExp] runs over parameters:

 env_name                                 [env] 


 Variants, counting seeds:               1
 Variants, not counting seeds:           1


Preparing to run the following experiments...



Launch delayed to give you a few seconds to review your experiments.

To customize or disable this behavior, change WAIT_BEFORE_LAUNCH in

Running experiment:


with kwargs:

    "env_name": "FetchReach-v1",
    "seed": 0

Logging data to /home/sketcher/MachineLearning/DRL/OpenAI/spinningup/data/FetchReachExp/FetchReachExp_s0/progress.txt
Saving config:

    "ac_kwargs":    {},
    "actor_critic": "mlp_actor_critic",
    "clip_ratio":   0.2,
    "env_fn":   "<function call_experiment.<locals>.thunk_plus.<locals>.<lambda> at 0x7f245efad488>",
    "epochs":   100,
    "exp_name": "FetchReachExp",
    "gamma":    0.99,
    "lam":  0.97,
    "logger":   {
        "<spinup.utils.logx.EpochLogger object at 0x7f245efbd9b0>": {
            "epoch_dict":   {},
            "exp_name": "FetchReachExp",
            "first_row":    true,
            "log_current_row":  {},
            "log_headers":  [],
            "output_dir":   "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/data/FetchReachExp/FetchReachExp_s0",
            "output_file":  {
                "<_io.TextIOWrapper name='/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/data/FetchReachExp/FetchReachExp_s0/progress.txt' mode='w' encoding='UTF-8'>":   {
                    "mode": "w"
    "logger_kwargs":    {
        "exp_name": "FetchReachExp",
        "output_dir":   "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/data/FetchReachExp/FetchReachExp_s0"
    "max_ep_len":   1000,
    "pi_lr":    0.0003,
    "save_freq":    10,
    "seed": 0,
    "steps_per_epoch":  4000,
    "target_kl":    0.01,
    "train_pi_iters":   80,
    "train_v_iters":    80,
    "vf_lr":    0.001
Traceback (most recent call last):
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/utils/", line 11, in <module>
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/utils/", line 162, in thunk_plus
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/algos/ppo/", line 183, in ppo
    x_ph, a_ph = core.placeholders_from_spaces(env.observation_space, env.action_space)
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/algos/ppo/", line 27, in placeholders_from_spaces
    return [placeholder_from_space(space) for space in args]
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/algos/ppo/", line 27, in <listcomp>
    return [placeholder_from_space(space) for space in args]
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/algos/ppo/", line 24, in placeholder_from_space
    raise NotImplementedError


There appears to have been an error in your experiment.

Check the traceback above to see what actually went wrong. The 
traceback below, included for completeness (but probably not useful
for diagnosing the error), shows the stack leading up to the 
experiment launch.


Traceback (most recent call last):
  File "/home/sketcher/anaconda3/envs/OpAI-env/lib/python3.7/", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sketcher/anaconda3/envs/OpAI-env/lib/python3.7/", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/", line 230, in <module>
    parse_and_execute_grid_search(cmd, args)
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/", line 162, in parse_and_execute_grid_search, **run_kwargs)
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/utils/", line 546, in run
    data_dir=data_dir, datestamp=datestamp, **var)
  File "/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/utils/", line 171, in call_experiment
    subprocess.check_call(cmd, env=os.environ)
  File "/home/sketcher/anaconda3/envs/OpAI-env/lib/python3.7/", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/sketcher/anaconda3/envs/OpAI-env/bin/python', '/home/sketcher/MachineLearning/DRL/OpenAI/spinningup/spinup/utils/', 'eJydUk1v1DAQdTbbbShqQRSB6JXL9sCmHLgtldBCJZSySAsHLshyY+dD69ghsUt7qIQEbbeSxaUDFy7wTxnvVv24IWLFmXmT8czzmy_dHy4k88c9TKW2vC7TqRSDaza4NZqVUtLMqtSUWsE59Avi7tCKTQVtp2IRQhjRVbpnS2lKRc1hLRBz0Uhz8d47xzCBfkLmK0zWRzs8mHWmvWaLdziR5Aw9HjwiZ+SUnAZZh4e8+y1CbGlIfOQ5McEs/BoEZNbNEPmOFifvCPTHLhJqnypWCUhIcet6kZVZ8JvMyJ8Afxwfg+v6rrGVTTeOC12JGAmYtBBN/IalRanErmCNKlUev5zsxm9roV68jtu6VB6z9dzEjzWlbOPGKjq3BvUhMh1KVu1xtg3JrxGB4rYL88MKTmDTIHP3LGV4i+KgFk1ZCWUGQ6kRarcHprBqSmtp2yvs8izX89wyBQaKdRdVdUkz3Uz9scVdt3KVC8nP0RrpRIFf94JeGIWYPP3MmrwFt6xsRdPagluap8ApNlisn4DvLQmP0J_AEfTdci71HraATrHhHiz4DhYsfXGjNQaLjXNoXcRFxqw0LYxdl5epwSTXqzS3fm5u5l7eFQ4UjldrG0H3mbSihY_Qx8p4X6s7XouJQCWe7D9FsVohOErqVqXOc9HQCza+zxVtTW0N5WUD7sN_acmZYfFVyVcH9U2PtluoKiq2GK3r7WEUrE0Cd_+CJJO5xjGotX_BhX4_B+GiT5bJRfrjfxgAsGYy+AvUx03_']' returned non-zero exit status 1.

Does Spinning Up support these (Robotics) environments, or is this a problem on my side?

The FetchReach environment has a Dict observation space (because it packages not only the arm position but also the target location into the observation), and Spinning Up does not support Dict observation spaces yet. One thing you can do is wrap the environment in gym's FlattenDictWrapper (for an example of its usage, see, for instance, …). Note, however, that by default FetchReach provides sparse rewards (0 if the goal is reached, -1 otherwise), which makes it rather hard for PPO. To make learning easier, you can modify the Spinning Up code slightly to create the environment with the reward_type='dense' kwarg, like this: env = gym.make('FetchReach-v1', reward_type='dense'). Hope this helps!
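To make the sparse/dense distinction concrete: as far as I know, gym's Fetch environments give a sparse reward of 0 when the achieved goal lies within a distance threshold of the desired goal and -1 otherwise, while the dense reward is the negative Euclidean distance between the two. The snippet below is a hypothetical re-implementation for illustration only (not gym's code); `fetch_style_reward` is a made-up name and the 0.05 threshold is an assumption.

```python
import numpy as np

# Assumed success radius (in metres); treat this as an assumption, not a quote of gym.
DISTANCE_THRESHOLD = 0.05


def goal_distance(achieved_goal, desired_goal):
    """Euclidean distance between the achieved and desired goal positions."""
    return np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)


def fetch_style_reward(achieved_goal, desired_goal, reward_type="sparse"):
    """Hypothetical sketch of the two Fetch reward types."""
    d = goal_distance(achieved_goal, desired_goal)
    if reward_type == "sparse":
        # -1 while the goal is not reached, 0 once within the threshold
        return -(d > DISTANCE_THRESHOLD).astype(np.float32)
    # dense: negative distance, so every step says how far off the arm is
    return -d


far = ([0.0, 0.0, 0.0], [0.5, 0.0, 0.0])    # 0.5 m from the goal
near = ([0.0, 0.0, 0.0], [0.03, 0.0, 0.0])  # 0.03 m from the goal
print(fetch_style_reward(*far), fetch_style_reward(*near))
print(fetch_style_reward(*far, "dense"), fetch_style_reward(*near, "dense"))
```

With the sparse reward, every step far from the goal looks equally bad (-1), which is why an on-policy method like PPO gets little signal until it stumbles onto the goal; the dense reward distinguishes "far" from "close".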

Thanks for your response, @pzhokhov.

I went to the following lines in the spinningup/spinup/algos/ code file:

    env = env_fn()
    obs_dim = env.observation_space.shape
    act_dim = env.action_space.shape

and added a FlattenDictWrapper line:

    env = env_fn()
    env = gym.wrappers.FlattenDictWrapper(env, ['observation', 'desired_goal'])
    obs_dim = env.observation_space.shape
    act_dim = env.action_space.shape
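What the wrapper line above does, roughly: FlattenDictWrapper(env, ['observation', 'desired_goal']) turns the Dict observation into a single flat Box vector by concatenating the listed entries in the given order, and a Box space is something Spinning Up's placeholder_from_space can handle. Below is a minimal numpy sketch of that idea; `flatten_dict_obs` is a hypothetical helper, not gym's implementation, and the array shapes are illustrative assumptions.

```python
import numpy as np


def flatten_dict_obs(obs, keys):
    """Concatenate the selected entries of a dict observation into one flat vector.

    Hypothetical helper illustrating the idea behind gym's FlattenDictWrapper.
    """
    return np.concatenate([np.asarray(obs[k], dtype=np.float32).ravel() for k in keys])


# Illustrative dict observation; shapes are assumptions meant to resemble a
# goal-based robotics task (robot state plus 3-D goal positions).
obs = {
    "observation": np.arange(10, dtype=np.float32),
    "achieved_goal": np.array([0.1, 0.2, 0.3], dtype=np.float32),
    "desired_goal": np.array([1.0, 1.1, 1.2], dtype=np.float32),
}

# Keep only the keys used in the wrapper call above; achieved_goal is dropped.
flat = flatten_dict_obs(obs, ["observation", "desired_goal"])
print(flat.shape)  # 10 + 3 entries -> (13,)
```

The key order matters: the policy network sees the observation entries first and the goal last, so the same key list has to be used everywhere the environment is created.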

Then, at the bottom of the same file, the environment is created in this ppo(...) call:
    ppo(lambda : gym.make(args.env), actor_critic=core.mlp_actor_critic,
        ac_kwargs=dict(hidden_sizes=[args.hid]*args.l), gamma=args.gamma, 
        seed=args.seed, steps_per_epoch=args.steps, epochs=args.epochs,
        logger_kwargs=logger_kwargs)

Then I added reward_type='dense' to the gym.make call as follows:

    ppo(lambda : gym.make(args.env, reward_type='dense'), actor_critic=core.mlp_actor_critic,
        ac_kwargs=dict(hidden_sizes=[args.hid]*args.l), gamma=args.gamma, 
        seed=args.seed, steps_per_epoch=args.steps, epochs=args.epochs,
        logger_kwargs=logger_kwargs)

It works fine now. Thank you very much!

@pzhokhov I ran the algorithms on the 'FetchReach-v1' environment, and only the on-policy algorithms (VPG, PPO, TRPO) work.

@RamiSketcher I defer to people with a proper theory background (@jachiam) to answer whether off-policy algorithms should work on FetchReach with dense rewards. I think that should be possible, but it may require some hyperparameter tuning (off-policy methods are more sensitive to hyperparameter settings).

Hi @RamiSketcher! I'm not sure what you mean by "only the on-policy algorithms work": do you mean that only those algorithms reach a level of performance you think is good? Or that the other ones hit some kind of breaking bug?

Hi @jachiam! Sorry for the late reply.

It was actually my mistake: I hadn't noticed that the off-policy codes create a test_env alongside env, so I had to apply the same wrapper to test_env as well. I replaced:

    env, test_env = env_fn(), env_fn()
    obs_dim = env.observation_space.shape[0]
    act_dim = env.action_space.shape[0]

with:

    env, test_env = env_fn(), env_fn()
    # Flatten the Dict observation of both the training and the test environment
    env = gym.wrappers.FlattenDictWrapper(env, ['observation', 'desired_goal'])
    test_env = gym.wrappers.FlattenDictWrapper(test_env, ['observation', 'desired_goal'])
    obs_dim = env.observation_space.shape[0]
    act_dim = env.action_space.shape[0]

and it worked!

However, since you mention performance: I trained 'FetchReach-v1' using PPO and DDPG with the following commands:

    python -m ppo --exp_name PPO_FetchReach_Long --env FetchReach-v1 --clip_ratio 0.1 0.2 --hid[h] [32,32] [64,32] --act tf.nn.tanh --seed 0 10 20

and:

    python -m ddpg --exp_name DDPG_FetchReach_Long --env FetchReach-v1 --hid[h] [32,32] [64,32] --act tf.nn.tanh --seed 0 10 20

My results were:


However, none of these runs showed any improvement, unlike in the other environments (Atari, MuJoCo, etc.).

Maybe plain PPO or DDPG doesn't work well on this type of environment, and they need some additional auxiliary components.
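For reference when reading such results: when a flag is given several values, Spinning Up's ExperimentGrid launches one run per combination of values (the "Variants, counting seeds" line in the log above reports this count). The sketch below mirrors that Cartesian-product expansion for the two commands; `expand` is a hypothetical helper, not spinup's code, and the single-valued --act flag is left out because it does not change the count.

```python
from itertools import product

# Values copied from the two commands above (--act has one value, so it is omitted).
ppo_grid = {
    "clip_ratio": [0.1, 0.2],
    "hid": [(32, 32), (64, 32)],
    "seed": [0, 10, 20],
}
ddpg_grid = {
    "hid": [(32, 32), (64, 32)],
    "seed": [0, 10, 20],
}


def expand(grid):
    """Return every combination of the grid values as a list of kwargs dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


ppo_runs = expand(ppo_grid)
ddpg_runs = expand(ddpg_grid)
print(len(ppo_runs), len(ddpg_runs))  # 12 6
```

So the PPO command trains 4 hyperparameter variants and the DDPG command 2, each averaged over 3 seeds.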

If you haven't done so already, I think you should check out this paper, the original tech report put out by the OpenAI robotics team about these environments. It looks like you should be able to get DDPG with dense rewards to succeed on FetchReach-v1, but you should change the hyperparameters to get as close as possible to what they used (e.g. hid [256,256,256], ReLU activations, and possibly the various other details as well). What's more, you may want to try running for longer than 400k transitions.
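The 400k figure is consistent with the PPO config logged at the top of this thread: 100 epochs × 4000 steps per epoch = 400,000 environment interactions. Training longer therefore means raising epochs (or steps_per_epoch). The sketch below only does that arithmetic; the helper names are hypothetical and the 2M target is an arbitrary example, not a number taken from the paper.

```python
# Defaults logged in the PPO config earlier in this thread.
STEPS_PER_EPOCH = 4000
EPOCHS = 100


def total_transitions(epochs, steps_per_epoch):
    """Environment interactions collected over a whole run."""
    return epochs * steps_per_epoch


def epochs_for_budget(target_transitions, steps_per_epoch):
    """Smallest epoch count whose total reaches target_transitions (integer ceiling)."""
    return (target_transitions + steps_per_epoch - 1) // steps_per_epoch


print(total_transitions(EPOCHS, STEPS_PER_EPOCH))     # 400000
print(epochs_for_budget(2_000_000, STEPS_PER_EPOCH))  # 500
```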

Since this is not a code issue but is a matter of scientific exploration, I'm going to mark this closed. But feel free to continue asking questions here and I'll try to answer them when I can. (Or feel free to email me, jachiam[at]