gbuonamico opened this issue 4 years ago
Should your observation spec in your environment be TensorShape([960, 18]) instead of TensorShape([1, 960, 18])?
Hello, thank you for replying. No, because using an LSTM in the actor and value networks requires an additional dimension. Training works fine. This is the portion of code I use for training:
def train_step():
    trajectories = replay_buffer.gather_all()
    return tf_agent.train(experience=trajectories)

collect_time = 0
train_time = 0
time_step = None
timed_at_step = global_step.numpy()
while environment_steps_metric.result() < num_environment_steps:
    current_metrics = []
    start_time = time.time()
    collect_driver.run()
    collect_time += time.time() - start_time

    start_time = time.time()
    total_loss, _ = train_step()
    replay_buffer.clear()
    train_time += time.time() - start_time
Hello, any suggestions would be appreciated.
Sorry about the delay. Let me take a closer look at this this afternoon.
Not a problem. I think the problem comes from the use of the wrapper "train_step = common.function(train_step)" in the training phase. Please keep this in mind tomorrow and let me know.
The ValueError you're seeing says that the time_step you pass into policy.action is not aligned with the spec the policy expects. Could you try leaving the additional dimension out of your observation spec, even though the network is an LSTM? I think the agent handles that later on by checking the network.
I also tried running your code on CartPole and it finished successfully, so the issue seems to be with the environment.
Adding @oars, who is more knowledgeable than me on this front, to confirm.
I agree that time_step and policy_state are not aligned: time_step is batched, while policy_state (the initial state) is not.
If I do not include the additional dimension in the observation space, I get the following error while trying to load the checkpoint (which is what I expected): "ValueError: Shapes (960, 18) and (1, 960, 18) are incompatible".
If you look at the error, both have the right shape for the observation (1, 960, 18), but they differ in that time_step is batched (it has shape [1] in the three fields before the observation) and policy_state is not (it has shape [] in the same fields):
ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs. num_outer_dims: 1. Saw tensor_shapes: [TensorShape([1]), TensorShape([1]), TensorShape([1]), TensorShape([1, 960, 18])] And spec_shapes: [TensorShape([]), TensorShape([]), TensorShape([]), TensorShape([1, 960, 18])]
My question is: how can I add this batch dimension to the policy_state observation?
Additional note: in the training phase (which works fine), I get the same error while loading the checkpoint if I do not use common.function for train_step and agent.train.
Could you remove the extra dimension in your spec (not in your observation), so that the spec shape is [960, 18] and your observation is still [1, 960, 18]?
Sorry, but I don't understand what you mean. In my environment, action_spec and observation_spec are defined as follows:
self._action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=2, name='action')
ns = (1, self.shape[0], self.shape[1])
self._observation_spec = array_spec.BoundedArraySpec(
    shape=ns, dtype=np.float32, name='observation')
where self.shape[0] and self.shape[1] are the input dimensions (960, 18).
These are the only "spec" definitions I have in my environment.
Would you mind being a little more specific, please?
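Since the spec above declares (1, 960, 18), one cheap thing to rule out is that the environment actually returns observations of a different shape, such as (960, 18). As far as I know, TFPyEnvironment adds a batch dimension of its own, so an observation that really had shape (1, 960, 18) would reach the policy as [1, 1, 960, 18]; the [1, 960, 18] in the error could therefore mean the environment is emitting (960, 18). The NumPy sketch below is a hypothetical helper (`check_observation_shape` is not a TF-Agents API) that could be called inside `_reset`/`_step` before returning. TF-Agents also provides `tf_agents.environments.utils.validate_py_environment`, which, as far as I know, steps an environment and checks its outputs against its specs.

```python
import numpy as np

# Observation shape declared by the environment's observation_spec in this thread.
DECLARED_OBS_SHAPE = (1, 960, 18)


def check_observation_shape(obs, declared_shape=DECLARED_OBS_SHAPE):
    """Hypothetical helper: raise if an emitted observation does not match the declared shape."""
    obs = np.asarray(obs)
    if obs.shape != tuple(declared_shape):
        raise ValueError(
            f"observation shape {obs.shape} does not match spec shape {tuple(declared_shape)}")
    return obs


# An array without the leading 1 would fail the check; adding the axis fixes it.
unbatched_view = np.zeros((960, 18), dtype=np.float32)  # missing the leading 1
with_leading_axis = unbatched_view[np.newaxis, ...]      # shape (1, 960, 18)
```

If the check fails inside the environment, fixing the observation there is likely cleaner than reshaping it later in the evaluation loop.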
Sure. I was suggesting that you modify your observation spec to:
# Note that we are removing the extra 1 at the front here.
ns=(self.shape[0],self.shape[1])
self._observation_spec = array_spec.BoundedArraySpec(
    shape=ns, dtype=np.float32, name='observation')
And keep the actual observation data as it was before.
That's what I did, as you suggested, and that's where I got the error I mentioned in my previous comment:
"ValueError: Shapes (960, 18) and (1, 960, 18) are incompatible"
Are you able to share the code for your environment, ideally a simplified version? As I cannot reproduce the issue you're seeing in standard environments, it's a bit hard to debug on my end. Thanks!
Well, that's not possible, as the environment needs a database and additional procedures to run. But it's a standard Python environment wrapped into a TFEnvironment, and no changes are made to the action_spec and observation_spec you have seen before. Just for my understanding (then I will stop bothering you): the error message points to a difference in the shapes of the first three fields: Saw tensor_shapes: [TensorShape([1]), TensorShape([1]), TensorShape([1]), TensorShape([1, 960, 18])] And spec_shapes: [TensorShape([]), TensorShape([]), TensorShape([]), TensorShape([1, 960, 18])],
while the observation shape is fine in both tensor_shapes and spec_shapes ([1, 960, 18] in both). To me, this does not seem related to the trained agent, but maybe to the policy saver or a wrapper in use (like common.function); however, my knowledge of these functions is quite limited. Again, thank you for your time.
Sorry that the previous suggestions weren't as helpful as I'd hoped. I think I might understand where the confusion is. Let me try again.
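The error message says that the received tensor shapes and the spec shapes are "not compatible", though the word "compatible" isn't very well defined. If you look closer at the code where it errors out, in nest_utils.is_batched_nested_tensors, you will notice that tensor_shapes and spec_shapes are not required to be exactly the same. Both of the following cases are considered compatible: (1) unbatched, where every tensor shape equals its spec shape, and (2) batched, where every tensor shape equals its spec shape with num_outer_dims extra leading dimensions. The ValueError is raised when some fields fall into one case and others into the other.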
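To make the rule concrete, here is a simplified, pure-Python mimic of the batched/unbatched test, applied to the shapes in the error message. It is a sketch, not the actual `nest_utils.is_batched_nested_tensors` implementation: it assumes fully defined shapes and uses plain tuples instead of `TensorShape`, and `classify` and `is_batched` are hypothetical names.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def classify(tensor_shape: Shape, spec_shape: Shape, num_outer_dims: int = 1) -> str:
    """Label one field as 'unbatched', 'batched', or 'incompatible' (fully known shapes only)."""
    tensor_shape, spec_shape = tuple(tensor_shape), tuple(spec_shape)
    if tensor_shape == spec_shape:
        return "unbatched"
    if (len(tensor_shape) == len(spec_shape) + num_outer_dims
            and tensor_shape[num_outer_dims:] == spec_shape):
        return "batched"
    return "incompatible"


def is_batched(tensor_shapes: Sequence[Shape], spec_shapes: Sequence[Shape],
               num_outer_dims: int = 1) -> bool:
    """All fields unbatched -> False, all batched -> True, anything else -> ValueError."""
    kinds = [classify(t, s, num_outer_dims) for t, s in zip(tensor_shapes, spec_shapes)]
    if all(k == "unbatched" for k in kinds):
        return False
    if all(k == "batched" for k in kinds):
        return True
    raise ValueError(f"Received a mix of batched and unbatched tensors: {kinds}")


# Field order: step_type, reward, discount, observation.
SCALARS = [(1,), (1,), (1,)]
SCALAR_SPECS = [(), (), ()]

# Shapes from the error message: the observation has no extra leading batch axis.
error_kinds = [classify(t, s) for t, s in zip(SCALARS + [(1, 960, 18)],
                                                SCALAR_SPECS + [(1, 960, 18)])]
print(error_kinds)  # ['batched', 'batched', 'batched', 'unbatched']
```

With the error's shapes, step_type, reward and discount classify as batched (a leading [1] on a scalar spec) while the observation classifies as unbatched (identical to its spec), hence the "mix" error. Under this simplified rule, two layouts are consistent: spec (960, 18) with observation (1, 960, 18), or spec (1, 960, 18) with observation (1, 1, 960, 18).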
I can see why you might think it's a policy saver or wrapper issue, and it's possible; maybe I didn't understand your issue very well. To clarify: in your code, before you save and reload the policy, if you just call evaluate on agent.policy right after training, do you see the same issue? If not, that would point to a bug in PolicySaver. I think it is extremely unlikely that the common.function wrapper would change the spec dimensions.
Thank you for your answer. In the end I'm using this workaround (the tf.expand_dims and ts.restart lines in the code; I'm not sure it's great, but it seems to work):
t_step = tf_environment.reset()
t_step = tf.expand_dims(t_step.observation, axis=0)
time_step = ts.restart(t_step, tf_environment.batch_size)
state = policy.get_initial_state(tf_environment.batch_size)
i = 0
while not time_step.is_last():
    policy_step: PolicyStep = policy.action(time_step, state)
    state = policy_step.state
    time_step = tf_environment.step(policy_step.action)
    time_step = tf.expand_dims(time_step.observation, axis=0)
    time_step = ts.restart(time_step, batch_size=tf_environment.batch_size)
    if i % 500 == 0:
        print(py_environment.render(), 'Run:', i, 'Action', policy_step.action.numpy())
    i += 1
But I'm still frustrated that I don't really understand the root problem. Thank you for your patience.
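One caveat about this workaround, hedged since it has not been run against this environment: `ts.restart` builds a brand-new TimeStep whose step_type is always FIRST, with reward 0 and discount 1. That means `time_step.is_last()` can never become true, so the loop will not end on its own; and, as far as I know, TF-Agents' RNN-based networks reset their state on FIRST steps, so the LSTM memory would be cleared on every step. Because TF-Agents' `TimeStep` is a namedtuple, an alternative is to add the batch dimension to the observation only and keep the other fields. The sketch below illustrates the difference with a stand-in `TimeStep` namedtuple and NumPy; `restart_like` and `add_batch_dim_to_observation` are hypothetical helpers, not library functions.

```python
import collections

import numpy as np

# Stand-in for tf_agents.trajectories.time_step.TimeStep (same field names).
TimeStep = collections.namedtuple(
    "TimeStep", ["step_type", "reward", "discount", "observation"])
FIRST, MID, LAST = 0, 1, 2  # StepType values used by TF-Agents


def restart_like(observation, batch_size=1):
    """Mimics what ts.restart produces: step_type FIRST, reward 0, discount 1."""
    return TimeStep(
        step_type=np.full((batch_size,), FIRST, dtype=np.int32),
        reward=np.zeros((batch_size,), dtype=np.float32),
        discount=np.ones((batch_size,), dtype=np.float32),
        observation=observation)


def add_batch_dim_to_observation(step):
    """Adds a leading axis to the observation only; other fields are untouched."""
    return step._replace(observation=np.expand_dims(step.observation, axis=0))


# A batched LAST step as the environment might return it (batch size 1).
env_step = TimeStep(
    step_type=np.array([LAST], dtype=np.int32),
    reward=np.array([1.5], dtype=np.float32),
    discount=np.array([0.0], dtype=np.float32),
    observation=np.zeros((1, 960, 18), dtype=np.float32))

restarted = restart_like(np.expand_dims(env_step.observation, axis=0))  # loses LAST
kept = add_batch_dim_to_observation(env_step)                           # keeps LAST
```

In the evaluation loop this would mean something like `time_step = time_step._replace(observation=tf.expand_dims(time_step.observation, axis=0))` after both `reset()` and `step()`, which keeps `time_step.is_last()` meaningful. If the environment turns out to emit observations that do not match its spec, fixing that at the source would remove the need for any reshaping here.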
Hello, I'm trying to use a PPO agent from TF-Agents with a trained policy, but I get the following error:
ValueError Traceback (most recent call last)