JCMiles opened this issue 3 years ago
Any update on this?
Apologies; just saw this. What do you want to happen if your episode length goes over 20?
My episode cannot be longer than 20 steps by environment design, but it can be shorter if the agent reaches its goal before step 20.
Is it possible this is a one-off error? Could you try setting max_sequence_length to exactly 21?
Alternatively, if there's a bug in your env that causes it to sometimes go over 20 steps, one way to enforce the limit is to use a TimeLimit wrapper.
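For reference, a minimal sketch of wrapping a py environment with the TF-Agents TimeLimit wrapper (CartPole is used here only as a stand-in for the custom 20-step environment):

```python
from tf_agents.environments import suite_gym, tf_py_environment, wrappers

# Stand-in environment; in this issue it would be the custom env capped at 20 steps.
py_env = suite_gym.load('CartPole-v0')

# Hard-cap every episode at 20 environment steps, regardless of the env's internals.
py_env = wrappers.TimeLimit(py_env, duration=20)

tf_env = tf_py_environment.TFPyEnvironment(py_env)
```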
No, this is not a one-off error. I've also built a custom rendering script for the environment to have full control over its internal parameters, so I can confirm that every episode is at most 20 steps, and I'm already using TimeLimit to enforce it. I'm still a bit confused about the parameter description I reported above. In the end, in a ReverbAddEpisodeObserver, does 'max_sequence_length' represent the env steps, or "the number of trajectories across all the cached episodes that you are writing into the replay buffer (e.g. number_of_episodes)" as the description says? To me those are completely different things. And last but not least, if it represents the episode steps, why do I have to set it to 20+1 to get things working? Thanks for your time.
I suppose the question I have is whether it's a bug on our end, which would be strange because we have plenty of environments that work just fine.
max_sequence_length is the length of the internal buffer used to send data to reverb, and has nothing to do with the environment.
So the question is: does setting it to 21 get things working or not? That would help us narrow it down. Even more helpful would be a small, self-contained example that reproduces the failure; that would make it much easier to debug!
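For context, a minimal sketch of how the writer side is often wired up for full-episode collection. The table name, port, and table size are arbitrary choices for this sketch; the +1 on max_sequence_length leaves room for the trailing boundary trajectory that the driver emits at the end of each episode:

```python
import reverb
from tf_agents.replay_buffers import reverb_utils

MAX_EPISODE_STEPS = 20
TABLE_NAME = 'episodes'  # arbitrary name; must match the table used for training

table = reverb.Table(
    name=TABLE_NAME,
    max_size=1000,
    sampler=reverb.selectors.Uniform(),
    remover=reverb.selectors.Fifo(),
    rate_limiter=reverb.rate_limiters.MinSize(1))

server = reverb.Server([table], port=8008)
client = reverb.Client(f'localhost:{server.port}')

# The observer's internal buffer must be able to hold every trajectory of one
# episode, including the final boundary step, hence MAX_EPISODE_STEPS + 1.
observer = reverb_utils.ReverbAddEpisodeObserver(
    py_client=client,
    table_name=TABLE_NAME,
    max_sequence_length=MAX_EPISODE_STEPS + 1)
```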
Hi, could you please clarify the description of the param max_sequence_length in ReverbAddEpisodeObserver? The description is a little bit messy.
max_sequence_length: An integer. `max_sequence_length` is used to write to the replay buffer tables. This defines the size of the internal buffer controlling the upper limit of the number of timesteps which can be referenced in a single prioritized item. Note that this is the maximum number of trajectories across all the cached episodes that you are writing into the replay buffer (e.g. `number_of_episodes`). `max_sequence_length` is not a limit of how many timesteps or items can be inserted into the replay buffer. Note that, since `max_sequence_length` controls the size of the internal buffer, it is suggested not to set this value to a very large number. If the number of steps in an episode is more than `max_sequence_length`, only items up to `max_sequence_length` are written into the table.

In my case I have an episode with variable length and a maximum of 20 steps. My setup is:
1) agent_server.py
2) agent_train.py
3A) agent_collect.py
With setup 3A I get this error in agent_collect.py when adding the trajectories.
3B) agent_collect.py with a larger max_sequence_length (e.g. 100)
With setup 3B (any number >= max_ep_length + 1) the data collection runs fine, but the experience in agent_train.py is sampled with the wrong shape, (batch_size, 21) instead of (batch_size, 20), and I get this error:
I previously tested my training pipeline in a non-distributed way with a regular tf_agents StatefulEpisodicReplayBuffer and it worked as expected, so my guess is that something is wrong with the setup of max_sequence_length, or internally with something related to trajectory.is_boundary(), because it seems like I get a trajectory equal to max_sequence_length + 1.
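If the extra item is indeed the boundary trajectory written at the end of each episode, one possible workaround on the training side is to slice the time dimension back to 20 before calling agent.train. A minimal sketch with a stand-in batch (in the real pipeline the batch would come from the Reverb replay buffer's as_dataset with a sequence length of 21):

```python
import tensorflow as tf

MAX_EPISODE_STEPS = 20

def trim_boundary_step(experience):
    """Drop the trailing boundary step from a [batch, time, ...] trajectory batch."""
    return tf.nest.map_structure(lambda t: t[:, :MAX_EPISODE_STEPS, ...], experience)

# Stand-in batch with time dimension 21, matching the shape reported above.
fake_batch = {
    'observation': tf.zeros([8, MAX_EPISODE_STEPS + 1, 4]),
    'reward': tf.zeros([8, MAX_EPISODE_STEPS + 1]),
}

trimmed = trim_boundary_step(fake_batch)
print(tf.nest.map_structure(lambda t: t.shape, trimmed))  # time dimension is now 20
```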