openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License

ValueError: Input 0 of layer dense_1 is incompatible (Spinning Up implementations do not work for discrete observation spaces) #122

Closed: aaronsnoswell closed this issue 4 years ago

aaronsnoswell commented 5 years ago

Summary: I've noticed that the Spinning Up algorithm implementations don't seem to support discrete observation spaces defined with gym.spaces.Discrete.

Steps to reproduce:

  1. python -m spinup.run ppo --env FrozenLake-v0

Alternatively, try any other algorithm, and/or any other gym environment that uses gym.spaces.Discrete for its observation space (e.g. any of the Algorithmic or Toy Text families).
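For context, a quick check (assuming the standard gym API) confirms that FrozenLake-v0 reports a Discrete observation space and returns bare integers as observations:

import gym

env = gym.make('FrozenLake-v0')
print(env.observation_space)   # Discrete(16) -- a single integer state index
print(env.action_space)        # Discrete(4)
print(env.reset())             # e.g. 0, a plain int rather than a vector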

Expected result: Should run and train a PPO policy for the FrozenLake task.

Observed result:

(python36) E:\Development>python -m spinup.run ppo --env FrozenLake-v0
================================================================================
ExperimentGrid [cmd_ppo] runs over parameters:

 env_name                                 [env]

        FrozenLake-v0

 Variants, counting seeds:               1
 Variants, not counting seeds:           1

================================================================================

Preparing to run the following experiments...

cmd_ppo

================================================================================

Launch delayed to give you a few seconds to review your experiments.

To customize or disable this behavior, change WAIT_BEFORE_LAUNCH in
spinup/user_config.py.

================================================================================
Running experiment:

cmd_ppo

with kwargs:

{
    "env_name": "FrozenLake-v0",
    "seed":     0
}

Warning: Log dir e:\development\spinningup\data\cmd_ppo\cmd_ppo_s0 already exists! Storing info there anyway.
Logging data to e:\development\spinningup\data\cmd_ppo\cmd_ppo_s0\progress.txt
Saving config:

{
    "ac_kwargs":        {},
    "actor_critic":     "mlp_actor_critic",
    "clip_ratio":       0.2,
    "env_fn":   "<function call_experiment.<locals>.thunk_plus.<locals>.<lambda> at 0x000002C008D69D90>",
    "epochs":   50,
    "exp_name": "cmd_ppo",
    "gamma":    0.99,
    "lam":      0.97,
    "logger":   {
        "<spinup.utils.logx.EpochLogger object at 0x000002C008D79A58>": {
            "epoch_dict":       {},
            "exp_name": "cmd_ppo",
            "first_row":        true,
            "log_current_row":  {},
            "log_headers":      [],
            "output_dir":       "e:\\development\\spinningup\\data\\cmd_ppo\\cmd_ppo_s0",
            "output_file":      {
                "<_io.TextIOWrapper name='e:\\\\development\\\\spinningup\\\\data\\\\cmd_ppo\\\\cmd_ppo_s0\\\\progress.txt' mode='w' encoding='cp1252'>":       {
                    "mode":     "w"
                }
            }
        }
    },
    "logger_kwargs":    {
        "exp_name":     "cmd_ppo",
        "output_dir":   "e:\\development\\spinningup\\data\\cmd_ppo\\cmd_ppo_s0"
    },
    "max_ep_len":       1000,
    "pi_lr":    0.0003,
    "save_freq":        10,
    "seed":     0,
    "steps_per_epoch":  4000,
    "target_kl":        0.01,
    "train_pi_iters":   80,
    "train_v_iters":    80,
    "vf_lr":    0.001
}
Traceback (most recent call last):
  File "e:\development\spinningup\spinup\utils\run_entrypoint.py", line 11, in <module>
    thunk()
  File "e:\development\spinningup\spinup\utils\run_utils.py", line 162, in thunk_plus
    thunk(**kwargs)
  File "e:\development\spinningup\spinup\algos\ppo\ppo.py", line 187, in ppo
    pi, logp, logp_pi, v = actor_critic(x_ph, a_ph, **ac_kwargs)
  File "e:\development\spinningup\spinup\algos\ppo\core.py", line 101, in mlp_actor_critic
    pi, logp, logp_pi = policy(x, a, hidden_sizes, activation, output_activation, action_space)
  File "e:\development\spinningup\spinup\algos\ppo\core.py", line 69, in mlp_categorical_policy
    logits = mlp(x, list(hidden_sizes)+[act_dim], activation, None)
  File "e:\development\spinningup\spinup\algos\ppo\core.py", line 31, in mlp
    x = tf.layers.dense(x, units=h, activation=activation)
  File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\python36\lib\site-packages\tensorflow\python\layers\core.py", line 190, in dense
    return layer.apply(inputs)
  File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 774, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\python36\lib\site-packages\tensorflow\python\layers\base.py", line 329, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 688, in __call__
    self._assert_input_compatibility(inputs)
  File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1409, in _assert_input_compatibility
    str(x.get_shape().as_list()))
ValueError: Input 0 of layer dense_1 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: [None]

================================================================================

There appears to have been an error in your experiment.

Check the traceback above to see what actually went wrong. The
traceback below, included for completeness (but probably not useful
for diagnosing the error), shows the stack leading up to the
experiment launch.

================================================================================

This error seems to be caused by the shape of the tf.placeholder used for observations when constructing the policy network.

Note: interestingly, discrete action spaces seem to be fine (e.g. I can train policies for the MountainCar task, which has a discrete action space but a continuous observation space).

My understanding is that policy gradient methods in general should support discrete observation spaces.
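To illustrate what I mean, here is a minimal TF1-style sketch (not the actual Spinning Up code): the observation placeholder built for a Discrete space is rank-1, but tf.layers.dense requires inputs of rank 2 or higher, which matches the "expected min_ndim=2, found ndim=1" message above.

import tensorflow as tf  # TF 1.x

# A Box observation space yields a rank-2 placeholder, which dense layers accept.
x_box = tf.placeholder(dtype=tf.float32, shape=(None, 4))
h_box = tf.layers.dense(x_box, units=64, activation=tf.tanh)   # works

# A Discrete observation is a single integer per step, so the natural
# placeholder is rank-1 with shape (None,).
x_disc = tf.placeholder(dtype=tf.int32, shape=(None,))
# Feeding it straight into a dense layer reproduces the traceback:
# ValueError: ... expected min_ndim=2, found ndim=1. Full shape received: [None]
# h_disc = tf.layers.dense(x_disc, units=64, activation=tf.tanh)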

gcsfred2 commented 5 years ago

I'm using RLlib grid search with TF+Keras and I get a similar problem with the same root cause. I can't use a Tuple containing a Discrete space in my observation space.

jachiam commented 4 years ago

Hello! Sorry for the long delay.

Indeed, there is no support for discrete observation spaces, and there is no plan to implement it. A simple workaround is to wrap your environment with something that converts the integer observation into a one-hot vector; then things should work fine.
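A minimal sketch of such a wrapper (just an illustration, not part of Spinning Up; it assumes the classic gym API and gym.ObservationWrapper):

import gym
import numpy as np
from gym import spaces

class OneHotObsWrapper(gym.ObservationWrapper):
    """Expose a Discrete observation space as a one-hot Box vector."""

    def __init__(self, env):
        super().__init__(env)
        assert isinstance(env.observation_space, spaces.Discrete)
        self.n = env.observation_space.n
        self.observation_space = spaces.Box(low=0.0, high=1.0,
                                            shape=(self.n,), dtype=np.float32)

    def observation(self, obs):
        one_hot = np.zeros(self.n, dtype=np.float32)
        one_hot[obs] = 1.0
        return one_hot

With that, passing env_fn=lambda: OneHotObsWrapper(gym.make('FrozenLake-v0')) to an algorithm function (e.g. spinup.ppo in the TF1 version) should let the MLP policy treat the state as an ordinary feature vector.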