HWerneck opened this issue 3 years ago
Sorry for bumping this up, but can anyone help?
For observations, it seems like the replay buffer is expecting a batch of `[1, 160, 260, 3]` elements with `batch_size = 1`, i.e. a tensor of dims `[1, 1, 160, 260, 3]`. But when you do `trajectory.from_transition(t_step, action_step, next_t_step)` in `collect_step()`, both `t_step` and `next_t_step` have observations of dims `[1, 160, 260, 3]`.
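If you wanted to work around that by hand, my guess (untested, going only by the shapes above; `traj` and `replay_buffer` are the names from the tutorial-style `collect_step`) would be something like:

```python
import tensorflow as tf

# Guess: add the missing outer dimension so observations go from
# [1, 160, 260, 3] to [1, 1, 160, 260, 3] before reaching the buffer.
batched_traj = tf.nest.map_structure(lambda t: tf.expand_dims(t, axis=0), traj)
replay_buffer.add_batch(batched_traj)
```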
As mentioned in the tutorial, using a driver will most likely solve these issues. For example, see how the `DynamicEpisodeDriver` adds the batch dimension at https://github.com/tensorflow/agents/blob/v0.9.0/tf_agents/drivers/dynamic_episode_driver.py#L140. Just a guess, looking at the error message.
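Roughly, the driver-based collection would look like this (a minimal sketch, assuming `train_env`, `collect_policy`, and `replay_buffer` already exist):

```python
from tf_agents.drivers import dynamic_episode_driver

# The driver steps the environment, builds correctly batched trajectories,
# and pushes them to every observer (here, the replay buffer).
collect_driver = dynamic_episode_driver.DynamicEpisodeDriver(
    train_env,
    collect_policy,
    observers=[replay_buffer.add_batch],
    num_episodes=1)
collect_driver.run()
```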
I have this problem right now. Any news?
I was thinking it could be the differing `policy_info` field, but I just ran into more errors later on.
Hi,
In `replay_buffer`, you specified `data_spec=agent.collect_data_spec`. But the PPO agent and a `RandomTFPolicy` have different data specs (the PPO algorithm gathers `n_steps` steps/trajectories at each iteration).
Unlike the DQN agent, the PPO agent doesn't need the replay buffer to be seeded with trajectories built from a random policy.
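A rough sketch of what I mean (names such as `agent` and `train_env` are assumed from your setup): build the buffer from `agent.collect_data_spec` and collect with the agent's own `collect_policy` rather than a `RandomTFPolicy`.

```python
from tf_agents.replay_buffers import tf_uniform_replay_buffer

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=1000)

# PPO's collect_policy fills in the policy_info the spec expects;
# a RandomTFPolicy emits empty info by default, so its trajectories
# fail the buffer's spec check.
collect_policy = agent.collect_policy
```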
Hi, I have the same problem at the moment. Indeed, both HWerneck and I specify `data_spec=agent.collect_data_spec` in the replay buffer, and indeed it seems this is wrong. Shouldn't the PPO agent automatically add the `n_steps` dimension if it needs it? Is there a way to expand `data_spec` by hand?
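For what it's worth, comparing the specs by hand at least makes the mismatch visible (a diagnostic sketch, not a fix; `random_policy` here is whatever policy I collected with):

```python
# What the replay buffer was built to store:
print(agent.collect_data_spec)
# What each policy actually emits per step:
print(agent.collect_policy.trajectory_spec)
print(random_policy.trajectory_spec)
```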
Hi, same problem here. I really have no idea how to fix it; can anyone give some hints?
I am building a PPO agent side by side with the TF-Agents DQN tutorial. The idea was to check the basic structures needed for a simple TF-Agents agent to work, and to adapt them to a PPO agent.
I am also using a custom environment, ViZDoom. It installs and works properly.
I am getting an error when testing the `collect_data` function. This is the code I am running and, following it, the error I get (full code at the bottom):
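The pattern I am adapting is the tutorial's `collect_step`/`collect_data` (sketched here from the DQN tutorial, not my exact code):

```python
from tf_agents.trajectories import trajectory

def collect_step(environment, policy, buffer):
  time_step = environment.current_time_step()
  action_step = policy.action(time_step)
  next_time_step = environment.step(action_step.action)
  # Package the transition as a Trajectory and write it to the buffer.
  traj = trajectory.from_transition(time_step, action_step, next_time_step)
  buffer.add_batch(traj)

def collect_data(env, policy, buffer, steps):
  for _ in range(steps):
    collect_step(env, policy, buffer)
```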
I don't really know how to proceed or what to try; I am really stuck. Does anybody have any idea why the trajectories display different structures? By the way, why are there two trajectories? Is one the trajectory that gets created, and the other the "mould", i.e. what it expects the trajectory structure to be?
Full code:
I do not have further code, since the application breaks here.