tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Gym wrapper and preprocessing_combiner: KeyError: 0 #432

Open mjlbach opened 4 years ago

mjlbach commented 4 years ago

I'm wrapping a pybullet environment in the gym wrapper, with multiple different observations returned from the environment:

(Pdb) pp tf_env.observation_spec()
OrderedDict([('bounding_box',
              BoundedTensorSpec(shape=(6, 6), dtype=tf.float32, name='observation/bounding_box', minimum=array(-4.2949673e+09, dtype=float32), maximum=array(4.2949673e+09, dtype=float32))),
             ('color',
              BoundedTensorSpec(shape=(6, 3), dtype=tf.float32, name='observation/color', minimum=array(0., dtype=float32), maximum=array(255., dtype=float32))),
             ('mass',
              BoundedTensorSpec(shape=(6, 3), dtype=tf.float32, name='observation/mass', minimum=array(-4.2949673e+09, dtype=float32), maximum=array(4.2949673e+09, dtype=float32))),
             ('intactness',
              BoundedTensorSpec(shape=(6, 3), dtype=tf.float32, name='observation/intactness', minimum=array(-4.2949673e+09, dtype=float32), maximum=array(4.2949673e+09, dtype=float32))),
             ('volume',
              BoundedTensorSpec(shape=(6, 1), dtype=tf.float32, name='observation/volume', minimum=array(-100., dtype=float32), maximum=array(100., dtype=float32)))])

I'm trying to feed only certain observations into an actor network (the rest are withheld), and I assumed I could use a preprocessing_combiner to do this:

      actor_net = actor_distribution_network.ActorDistributionNetwork(
          tf_env.observation_spec(),
          tf_env.action_spec(),
          preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1),
          fc_layer_params=actor_fc_layers,
          activation_fn=tf.keras.activations.tanh)

However, I get a KeyError:

  File "ipsum/train/train_SE_tf2.py", line 450, in <module>
    app.run(main)
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "ipsum/train/train_SE_tf2.py", line 445, in main
    num_eval_episodes=FLAGS.num_eval_episodes)
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "ipsum/train/train_SE_tf2.py", line 219, in train_eval
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/tf_agents/networks/network.py", line 205, in __call__
    outputs, new_state = super(Network, self).__call__(inputs, *args, **kwargs)
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/tf_agents/networks/encoding_network.py", line 313, in call
    self._preprocessing_nest, observation, check_types=False),
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/tensorflow/python/util/nest.py", line 934, in flatten_up_to
    return list(v for _, v in _yield_flat_up_to(shallow_tree, input_tree, is_seq))
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/tensorflow/python/util/nest.py", line 934, in <genexpr>
    return list(v for _, v in _yield_flat_up_to(shallow_tree, input_tree, is_seq))
  File "/home/michael/.cache/pypoetry/virtualenvs/ipsum-ZyrtZG2z-py3.7/lib64/python3.7/site-packages/tensorflow/python/util/nest.py", line 725, in _yield_flat_up_to
    input_subtree = input_tree[shallow_key]
KeyError: 0
  In call to configurable 'train_eval' (<function train_eval at 0x7f98c6344b90>)

This is happening in the call to the preprocessing_combiner in the encoding network. I'm assuming this is because the dict observations are not being flattened appropriately, and at some point the dict is treated as a list and indexed with 0.
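
As a sanity check on that hunch (a rough sketch, just using the keys from my observation spec), tf.nest does consider a list and a dict to be different structures:

      import tensorflow as tf

      layers_structure = [None] * 5  # list-shaped preprocessing_layers
      obs_structure = {k: None for k in
                       ('bounding_box', 'color', 'mass', 'intactness', 'volume')}
      # Raises, because a list and a dict are not the same nested structure.
      tf.nest.assert_same_structure(layers_structure, obs_structure)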

As a simple example, the following fails:

      # Fails: preprocessing_layers given as a list while the spec is a dict
      network = encoding_network.EncodingNetwork(
          tf_env.observation_spec(),
          preprocessing_layers=[tf.keras.layers.Layer()] * 5,
          preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1))
      out = network(tf_env.reset().observation)

      # Also fails: no preprocessing_layers at all, only a combiner
      network = encoding_network.EncodingNetwork(
          tf_env.observation_spec(),
          preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1))
      out = network(tf_env.reset().observation)

It works if I flatten both the observation spec and the observation before passing them in:

      network = encoding_network.EncodingNetwork(
          tf.nest.flatten(tf_env.observation_spec()),
          preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1))
      out = network(tf.nest.flatten(tf_env.reset().observation))

However, this workaround requires modifying every call to the actor/value networks to ensure that the flattened observations are passed. Is there no way to feed in a dictionary of observations?

ebrevdo commented 4 years ago

For now, the simplest solution is probably to pass preprocessing_layers=tf.nest.map_structure(lambda _: tf.keras.layers.Layer(), observation_spec).
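
Applied to the network from the original post, that would look roughly like this (an untested sketch, reusing the tf_env and actor_fc_layers names from above):

      import tensorflow as tf
      from tf_agents.networks import actor_distribution_network

      observation_spec = tf_env.observation_spec()

      actor_net = actor_distribution_network.ActorDistributionNetwork(
          observation_spec,
          tf_env.action_spec(),
          # One pass-through layer per entry of the dict spec, so that the
          # structure of preprocessing_layers matches observation_spec.
          preprocessing_layers=tf.nest.map_structure(
              lambda _: tf.keras.layers.Layer(), observation_spec),
          preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1),
          fc_layer_params=actor_fc_layers,
          activation_fn=tf.keras.activations.tanh)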

We'll soon make this a bit easier with the new Sequential and NestMap networks, which will be supported in addition to ActorDistributionNetwork.

ebrevdo commented 4 years ago

It's unfortunate that the error message is so ugly. Basically, the structure of your preprocessing_layers must match that of observation_spec; in this case, it must be a dict.
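
For example, a dict of pass-through layers keyed the same way as your spec (a sketch along the lines of the map_structure suggestion above) should satisfy the structure check:

      import tensorflow as tf
      from tf_agents.networks import encoding_network

      network = encoding_network.EncodingNetwork(
          tf_env.observation_spec(),
          # Keys mirror the observation_spec dict, one layer per observation.
          preprocessing_layers={
              'bounding_box': tf.keras.layers.Layer(),
              'color': tf.keras.layers.Layer(),
              'mass': tf.keras.layers.Layer(),
              'intactness': tf.keras.layers.Layer(),
              'volume': tf.keras.layers.Layer(),
          },
          preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1))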

ebrevdo commented 4 years ago

(But when you use tf_agents.networks.Sequential, the first layer can just be a tf_agents.networks.NestFlatten() layer.)
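
A rough sketch of what that could look like (exact import paths may vary by tf-agents version):

      import tensorflow as tf
      from tf_agents.networks import nest_map, sequential

      net = sequential.Sequential([
          nest_map.NestFlatten(),                 # dict of observations -> flat list of tensors
          tf.keras.layers.Concatenate(axis=-1),   # combine into a single tensor
          tf.keras.layers.Dense(64, activation='tanh'),
      ])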