DQN loss calculation error when using Dict Action space

JaCoderX commented 4 years ago

Follow-up to #276.

I'm trying to convert a custom Gym project (BTgym) to work as a TF-Agents environment.

As I mentioned in the previous issue, the action space is of type gym.spaces.Dict.

Action Spec:
OrderedDict([('default_asset', BoundedTensorSpec(shape=(), dtype=tf.int64, name='action/default_asset', minimum=array(0), maximum=array(3)))])

Following the DQN tutorial, I reached the point where the agent calculates the loss, but I get an error saying the action spec is missing the shape attribute. Tracing the code back to gym_wrapper.py, it seems a Dict space is converted to an OrderedDict of specs, and that OrderedDict has no shape attribute:

...
elif isinstance(space, gym.spaces.Dict):
   return collections.OrderedDict([
       (key, nested_spec(s, key)) for key, s in space.spaces.items()])
...
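To illustrate the mismatch, here is a minimal sketch in plain Python. FakeBoundedSpec is a hypothetical stand-in for BoundedTensorSpec (it is not the TF-Agents class), so this only mirrors the structure of the action spec shown above: the container has no shape, while the single spec inside it does.

```python
import collections


class FakeBoundedSpec:
    """Hypothetical stand-in for BoundedTensorSpec; not the TF-Agents class."""

    def __init__(self, shape, minimum, maximum):
        self.shape = shape
        self.minimum = minimum
        self.maximum = maximum


# Same structure as the action spec above: one entry keyed by 'default_asset'.
action_spec = collections.OrderedDict(
    [('default_asset', FakeBoundedSpec(shape=(), minimum=0, maximum=3))])

# The container itself has no shape attribute, which is what the agent trips on.
has_shape = hasattr(action_spec, 'shape')

# The spec stored inside the container does have one.
leaf_spec = next(iter(action_spec.values()))

# A scalar action has rank 0, so this is not a multi-dimensional action.
multi_dim_actions = len(leaf_spec.shape) > 0

print(has_shape, leaf_spec.shape, multi_dim_actions)
```

Any code that reads the shape therefore has to reach the spec inside the dict first rather than the dict itself.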

This is the original error:

Traceback (most recent call last):
  File "home/Experimental RL/ResearchTF-Agents/Env/envTest.py", line 260, in <module>
    train_loss = agent.train(experience).loss
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 503, in _call
    self._initialize(args, kwds, add_initializers_to=initializer_map)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 408, in _initialize
    *args, **kwds))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 358, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/TF-Agents/tf_agents/agents/tf_agent.py", line 219, in train
    loss_info = self._train_fn(experience=experience, weights=weights)
  File "/homeTF-Agents/tf_agents/utils/common.py", line 131, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 354, in _train
    training=True)
  File "/home//TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 427, in _loss
    q_values = self._compute_q_values(time_steps, actions, training=training)
  File "/home/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 519, in _compute_q_values
    multi_dim_actions = self._action_spec.shape.rank > 0
AttributeError: 'collections.OrderedDict' object has no attribute 'shape'

How can I resolve this?

kbanoop commented 4 years ago

Thanks for raising this. I think it is a bug.

https://github.com/tensorflow/agents/blob/9057dd66c9dd88c4a4cd9b89d43df10e2740f678/tf_agents/agents/dqn/dqn_agent.py#L519

has to be changed to something like:

https://github.com/tensorflow/agents/blob/9057dd66c9dd88c4a4cd9b89d43df10e2740f678/tf_agents/agents/dqn/dqn_agent.py#L552

Would you like to submit a PR?

JaCoderX commented 4 years ago

@kbanoop, I applied your suggested fix and it seems to work fine. But when I run the code, it crashes on the lines right after it, at the cast operation, again probably because of the Dict action space.

https://github.com/tensorflow/agents/blob/9057dd66c9dd88c4a4cd9b89d43df10e2740f678/tf_agents/agents/dqn/dqn_agent.py#L520-L523

These are the actions to be cast (<class 'dict'>): {'default_asset': <tf.Tensor 'Squeeze_4:0' shape=(64,) dtype=int64>}

This is what I get now:

Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 324, in _AssertCompatible
    fn(values)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 276, in _check_not_tensor
    _ = [_check_failed(v) for v in nest.flatten(values)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 277, in <listcomp>
    if isinstance(v, ops.Tensor)]
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 248, in _check_failed
    raise ValueError(v)
ValueError: Tensor("Squeeze_4:0", shape=(64,), dtype=int64)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 503, in _call
    self._initialize(args, kwds, add_initializers_to=initializer_map)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 408, in _initialize
    *args, **kwds))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 358, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/jack/TF-Agents/tf_agents/agents/tf_agent.py", line 219, in train
    loss_info = self._train_fn(experience=experience, weights=weights)
  File "/home/jack/TF-Agents/tf_agents/utils/common.py", line 131, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/jack/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 354, in _train
    training=True)
  File "/home/jack/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 427, in _loss
    q_values = self._compute_q_values(time_steps, actions, training=training)
  File "/home/jack/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 522, in _compute_q_values
    tf.cast(actions, dtype=tf.int32),
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 702, in cast
    x = ops.convert_to_tensor(x, name="x")
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1296, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 286, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 265, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 449, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 328, in _AssertCompatible
    raise TypeError("List of Tensors when single Tensor expected")
TypeError: List of Tensors when single Tensor expected

kbanoop commented 4 years ago

Yes, that sounds like the same issue. Can you try adding actions = tf.nest.flatten(actions)[0], perhaps at the beginning of the _compute_q_values function?
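As a rough sketch of what that unwrapping does, here is a plain-Python version. The flatten helper below is a simplified stand-in for tf.nest.flatten, and FakeTensor and single_action are hypothetical names used only for illustration; none of this is TF-Agents code.

```python
class FakeTensor:
    """Hypothetical stand-in for a tf.Tensor holding a batch of actions."""

    def __init__(self, values):
        self.values = list(values)


def flatten(nest):
    """Simplified stand-in for tf.nest.flatten: returns the leaves as a list."""
    if isinstance(nest, dict):
        # Dict entries are visited in sorted-key order.
        return [leaf for key in sorted(nest) for leaf in flatten(nest[key])]
    if isinstance(nest, (list, tuple)):
        return [leaf for item in nest for leaf in flatten(item)]
    # Anything else is treated as a leaf.
    return [nest]


def single_action(actions):
    """Return the only leaf of an action nest, or fail if there are several."""
    leaves = flatten(actions)
    if len(leaves) != 1:
        raise ValueError(
            'expected exactly one action tensor, got %d' % len(leaves))
    return leaves[0]


batch = FakeTensor([0, 3, 1, 2])
actions = {'default_asset': batch}

# The dict wrapper is removed and the single tensor-like leaf is returned.
unwrapped = single_action(actions)
print(unwrapped.values)
```

The length check is an addition of this sketch, not part of the suggestion above: indexing with [0] alone would silently ignore any further entries if the dict ever held more than one action.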

JaCoderX commented 4 years ago

@kbanoop, I have tested the solution and it works well. I opened a PR for this issue and #276, since both address the same problem: the unsupported Dict action space.

tensorflow / agents

DQN loss calculation error when using Dict Action space #297