tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

DQN loss calculation error when using Dict Action space #297

Open JaCoderX opened 4 years ago

JaCoderX commented 4 years ago

Follow-up issue to #276.

I'm trying to convert a custom gym project (called BTgym) to work as a TF-Agents environment.

As I mentioned in the previous issue, the action space is of type gym.spaces.Dict.

Action Spec:
OrderedDict([('default_asset', BoundedTensorSpec(shape=(), dtype=tf.int64, name='action/default_asset', minimum=array(0), maximum=array(3)))]) 

Following the DQN tutorial, I reached the point where the agent calculates the loss, but I get an error saying that the action spec is missing the shape attribute. Tracing the code back to gym_wrapper.py, it seems that a Dict space is converted to an OrderedDict of specs, which has no shape attribute:

...
elif isinstance(space, gym.spaces.Dict):
   return collections.OrderedDict([
       (key, nested_spec(s, key)) for key, s in space.spaces.items()])
...

This is the original error:

Traceback (most recent call last):
  File "home/Experimental RL/ResearchTF-Agents/Env/envTest.py", line 260, in <module>
    train_loss = agent.train(experience).loss
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 503, in _call
    self._initialize(args, kwds, add_initializers_to=initializer_map)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 408, in _initialize
    *args, **kwds))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 358, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/TF-Agents/tf_agents/agents/tf_agent.py", line 219, in train
    loss_info = self._train_fn(experience=experience, weights=weights)
  File "/homeTF-Agents/tf_agents/utils/common.py", line 131, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 354, in _train
    training=True)
  File "/home//TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 427, in _loss
    q_values = self._compute_q_values(time_steps, actions, training=training)
  File "/home/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 519, in _compute_q_values
    multi_dim_actions = self._action_spec.shape.rank > 0
AttributeError: 'collections.OrderedDict' object has no attribute 'shape'

How can I resolve this?

kbanoop commented 4 years ago

Thanks for raising this. I think it is a bug.

https://github.com/tensorflow/agents/blob/9057dd66c9dd88c4a4cd9b89d43df10e2740f678/tf_agents/agents/dqn/dqn_agent.py#L519

has to be changed to something like:

https://github.com/tensorflow/agents/blob/9057dd66c9dd88c4a4cd9b89d43df10e2740f678/tf_agents/agents/dqn/dqn_agent.py#L552

Would you like to submit a PR?
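A minimal sketch of what such a change might look like, assuming the idea is to flatten the spec nest and read the shape of its first leaf. The helper name `is_multi_dim` is hypothetical, and `tf.TensorSpec` stands in here for the `BoundedTensorSpec` shown in the issue so that the example needs only TensorFlow:

```python
import collections

import tensorflow as tf

# Stand-in for the action spec from the issue. tf.TensorSpec replaces
# tf_agents' BoundedTensorSpec (an assumption made to keep this self-contained);
# both expose `.shape` on the leaf spec.
action_spec = collections.OrderedDict([
    ("default_asset",
     tf.TensorSpec(shape=(), dtype=tf.int64, name="action/default_asset")),
])


def is_multi_dim(spec_nest):
    """Hypothetical helper: rank check on the first leaf of a spec nest."""
    leaf = tf.nest.flatten(spec_nest)[0]
    return leaf.shape.rank > 0


# The dict itself has no `shape`, which is what triggers the AttributeError.
print(hasattr(action_spec, "shape"))

# Flattening first reaches the leaf spec, so the rank check works.
print(is_multi_dim(action_spec))
```

Because `tf.nest.flatten` also accepts a bare spec, the same check would keep working for agents whose action spec is not wrapped in a dict.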

JaCoderX commented 4 years ago

@kbanoop, I applied your suggested fix and it seems to work fine. But when I run the code, it crashes on the lines right after it, when trying to perform the cast operation, again probably because of the dict action space.

https://github.com/tensorflow/agents/blob/9057dd66c9dd88c4a4cd9b89d43df10e2740f678/tf_agents/agents/dqn/dqn_agent.py#L520-L523

These are the actions to be cast: <class 'dict'>: {'default_asset': <tf.Tensor 'Squeeze_4:0' shape=(64,) dtype=int64>}

This is what I get now:

Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 324, in _AssertCompatible
    fn(values)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 276, in _check_not_tensor
    _ = [_check_failed(v) for v in nest.flatten(values)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 277, in <listcomp>
    if isinstance(v, ops.Tensor)]
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 248, in _check_failed
    raise ValueError(v)
ValueError: Tensor("Squeeze_4:0", shape=(64,), dtype=int64)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 503, in _call
    self._initialize(args, kwds, add_initializers_to=initializer_map)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 408, in _initialize
    *args, **kwds))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 358, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/jack/TF-Agents/tf_agents/agents/tf_agent.py", line 219, in train
    loss_info = self._train_fn(experience=experience, weights=weights)
  File "/home/jack/TF-Agents/tf_agents/utils/common.py", line 131, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/jack/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 354, in _train
    training=True)
  File "/home/jack/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 427, in _loss
    q_values = self._compute_q_values(time_steps, actions, training=training)
  File "/home/jack/TF-Agents/tf_agents/agents/dqn/dqn_agent.py", line 522, in _compute_q_values
    tf.cast(actions, dtype=tf.int32),
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 702, in cast
    x = ops.convert_to_tensor(x, name="x")
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1184, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1242, in convert_to_tensor_v2
    as_ref=False)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1296, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 286, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 265, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 449, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/home/jack/anaconda3/envs/deep/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 328, in _AssertCompatible
    raise TypeError("List of Tensors when single Tensor expected")
TypeError: List of Tensors when single Tensor expected

kbanoop commented 4 years ago

Yes, that sounds like the same issue. Can you try adding actions = tf.nest.flatten(actions)[0], perhaps at the beginning of the _compute_q_values function?
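As a rough illustration of that suggestion outside the agent code, the sketch below builds a dict of actions shaped like the one in the traceback (the zero-valued tensor is made up for the example), pulls out its single leaf, and casts it:

```python
import tensorflow as tf

# A batch of 64 scalar actions keyed by name, mirroring the dict from the
# traceback. The values are placeholders for illustration only.
actions = {"default_asset": tf.zeros([64], dtype=tf.int64)}

# Take the first (here: only) leaf tensor out of the nest, then cast it.
flat_actions = tf.nest.flatten(actions)[0]
int_actions = tf.cast(flat_actions, dtype=tf.int32)

print(int_actions.shape, int_actions.dtype)
```

Note that indexing with `[0]` keeps only the first leaf, so this approach fits a dict with a single action entry like the one here.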

JaCoderX commented 4 years ago

@kbanoop, I have tested the solution and it works well. I made a PR for this issue and #276, as both address the problem of the unsupported Dict action space.