Open JaCoderX opened 4 years ago
I'm still not sure how to add support for a Dict action space.
The error occurs when trying to get the action from greedy_action
(the source is greedy_policy):
action = tf.compat.v1.where(cond, greedy_action.action, random_action.action)
That line needs to be rewritten as:
action = tf.nest.map_structure(lambda g, r: tf.compat.v1.where(cond, g, r), greedy_action.action, random_action.action)
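tf.nest.map_structure applies the lambda to each matching leaf of the two action structures, which is what makes the fix work for Dict actions. A plain-Python sketch of that per-leaf selection (illustrative names only; the real code operates on TF tensors):

```python
# Plain-Python analogue of the nested where above; the actual fix
# uses tf.nest.map_structure over tf tensors.
def map_structure(fn, *structures):
    """Apply fn leaf-by-leaf across parallel nested dicts."""
    first = structures[0]
    if isinstance(first, dict):
        return {k: map_structure(fn, *(s[k] for s in structures))
                for k in first}
    return fn(*structures)

cond = True  # the "be greedy" condition
greedy_action = {"order": 1, "size": 10}  # hypothetical Dict action
random_action = {"order": 0, "size": 3}

action = map_structure(lambda g, r: g if cond else r,
                       greedy_action, random_action)
# With cond=True, every leaf is taken from greedy_action.
```

A flat tf.where cannot do this because a Dict action is not a single tensor; mapping over the structure selects each leaf tensor independently.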
Report back and let us know if this works. We can patch it on our end.
@ebrevdo, I tested it on my end and it works well. Thank you :)
I'm trying to convert a custom gym project (called BTgym) to work as a tf-agents env. The original observation space and action space are both gym.spaces.Dict, but for the moment I have simplified the observation space so I can run the env with the same code as the DQN tutorial example (as a proof of concept). The modified spaces are as follows:

The error occurs under the "Training the agent" section, when performing collect_step(). It seems that the epsilon greedy policy has a problem with the Dict action space when trying to generate an action:
action_step = policy.action(time_step)
Since both the DQN and random agents seem to work fine for producing actions, the Dict action space seems not to be supported in the epsilon greedy policy. Any idea how to resolve this?
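For reference, the selection an epsilon-greedy policy has to perform over a Dict action can be sketched in plain Python. All names here are hypothetical and this only illustrates the per-leaf choice between greedy and random actions, not the TF-Agents implementation:

```python
import random

def select_leafwise(be_greedy, greedy, rand):
    """Choose each leaf from greedy or rand, mirroring a nested where."""
    if isinstance(greedy, dict):
        return {k: select_leafwise(be_greedy, greedy[k], rand[k])
                for k in greedy}
    return greedy if be_greedy else rand

def epsilon_greedy(epsilon, greedy_action, random_action, rng=random):
    # Explore with probability epsilon, otherwise act greedily.
    be_greedy = rng.random() >= epsilon
    return select_leafwise(be_greedy, greedy_action, random_action)

greedy_action = {"order": 1, "size": 10}  # hypothetical Dict action
random_action = {"order": 0, "size": 3}
print(epsilon_greedy(0.0, greedy_action, random_action))
# -> {'order': 1, 'size': 10}
```

The failure reported above comes from doing this choice with a single flat where over the whole action instead of recursing into the Dict structure.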