Closed JustinACoder closed 1 year ago
Although I still wonder why we don't change v1 to v2, I found my problem. The wrong action shape came from my observation_and_action_constraint_splitter function, which I had implemented incorrectly.
Instead of defining the legal moves in my env and simply splitting them off in observation_and_action_constraint_splitter, I tried computing the action mask directly in TensorFlow inside that function. This worked relatively well, except that I was returning the action mask with shape () when it had to have shape (1,). The fix was to change the last line of my observation_and_action_constraint_splitter function from
return observation, action_mask
To
return observation, tf.expand_dims(action_mask, axis=0)
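To make the shape change concrete, here is a minimal sketch of what tf.expand_dims does to a scalar tensor. The constant below is only a stand-in for the mask, not the actual mask computation from my splitter.

```python
import tensorflow as tf

# Stand-in for a mask that came out as a scalar tensor (shape ()).
action_mask = tf.constant(1, dtype=tf.int32)

# Adding a leading axis turns shape () into shape (1,).
expanded_mask = tf.expand_dims(action_mask, axis=0)

print(action_mask.shape.as_list())    # []
print(expanded_mask.shape.as_list())  # [1]
```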
It seems like this issue is already known as it is described in a TODO in the _action method of EpsilonGreedyPolicy (tf_agents/policies/epsilon_greedy_policy.py) :
I'm bringing this up because I've had some issues and noticed that changing tf.compat.v1.where to tf.compat.v2.where solves them.
In my case, the problem was that greedy_action.action had shape (1,) but random_action.action had shape ().
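tf.compat.v1.where requires x and y to have the same shape, so it can't handle this, while tf.compat.v2.where broadcasts its arguments and can.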
Did I do something wrong on my end? Can't we simply change v1 to v2?
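For reference, the mismatch can be reproduced outside of tf_agents with a small sketch like the following. The tensor values and the names cond, greedy and random_action are illustrative only, and I am assuming eager execution (the TF 2.x default) and a condition of shape (1,), i.e. a batch of size 1.

```python
import tensorflow as tf

cond = tf.constant([True])       # shape (1,)
greedy = tf.constant([3])        # shape (1,), like greedy_action.action
random_action = tf.constant(5)   # shape (), like random_action.action

# v2 broadcasts the scalar against the (1,) tensor.
v2_result = tf.compat.v2.where(cond, greedy, random_action)
print(v2_result.shape.as_list())  # [1]

# v1 expects x and y to have the same shape, so this call should fail.
try:
    tf.compat.v1.where(cond, greedy, random_action)
    v1_failed = False
except (ValueError, tf.errors.InvalidArgumentError) as err:
    v1_failed = True
    print("v1 where failed:", type(err).__name__)
```

If both actions already have shape (1,), as they do after the expand_dims fix in the splitter, the v1 call goes through as well.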