There is a bug in how episodes are saved vs how they are retrieved in the `SeqReplayBuffer` class. Episodes are stored according to their actual length, but are retrieved based on their maximum length.
For example, let's say my environment has a maximal episode length of 100, and terminates upon success. My agent is doing well and ending subsequent episodes after 20 and 30 steps. These episodes are stored in the buffer directly after each other, so the third episode will start at index 50.
However, when sampling episodes, the `random_episodes` method does not account for this, and assumes in its first for loop that every episode has the maximum episode length (`self._sampled_seq_len`). When the first episode is sampled, it will span indices 0-99, and will thus also contain episodes 2 and 3.
I propose this can be fixed by changing the incrementing of `self._top` and `self._size` in the `add_episode` method: in those lines, change `+ seq_len` to `+ self._sampled_seq_len`, so that each episode occupies a fixed-size slot.
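To make the overlap concrete, here is a minimal toy sketch of the behavior described above (a hypothetical simplification, not the actual `SeqReplayBuffer` code; the class name, capacity, and data layout here are assumptions):

```python
import numpy as np

class MiniSeqBuffer:
    """Toy sequence buffer: episodes are stored back-to-back by their
    actual length, but retrieved with a fixed maximum-length window."""

    def __init__(self, capacity, sampled_seq_len):
        self._data = np.zeros(capacity, dtype=int)
        self._valid_starts = []           # start index of each stored episode
        self._top = 0                     # next free slot
        self._sampled_seq_len = sampled_seq_len

    def add_episode(self, obs):
        seq_len = len(obs)
        self._data[self._top:self._top + seq_len] = obs
        self._valid_starts.append(self._top)
        self._top += seq_len              # advances by the *actual* length

    def sample_at(self, start):
        # retrieves by the *maximum* length, so short episodes overlap
        return self._data[start:start + self._sampled_seq_len]

buf = MiniSeqBuffer(capacity=200, sampled_seq_len=100)
buf.add_episode([1] * 20)    # episode 1: indices 0-19
buf.add_episode([2] * 30)    # episode 2: indices 20-49
buf.add_episode([3] * 10)    # episode 3: indices 50-59

sample = buf.sample_at(buf._valid_starts[0])
# the 100-step window for episode 1 spills into episodes 2 and 3
print(np.unique(sample))
```

Running this shows the window sampled for episode 1 contains data from all three episodes, which is the overlap the report describes.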
Hi, I don't think there is a bug here. The sampling method also returns masks that indicate whether each item is valid. Therefore, the agent is only trained on the steps of the first episode, and ignores the others.
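The masking behavior described in this reply can be sketched as follows (again a hypothetical simplification; the function name and signature are assumptions, not the library's API):

```python
import numpy as np

def sample_with_mask(data, start, ep_len, sampled_seq_len):
    """Return a fixed-length window plus a validity mask for one episode.
    Only the episode's own steps are marked valid; anything past its true
    length (including subsequent episodes) is masked out."""
    window = data[start:start + sampled_seq_len]
    mask = np.zeros(sampled_seq_len, dtype=bool)
    mask[:ep_len] = True
    return window, mask

# episodes of length 20 and 30 stored back-to-back, max length 100
data = np.concatenate([np.full(20, 1), np.full(30, 2), np.zeros(50, dtype=int)])
window, mask = sample_with_mask(data, start=0, ep_len=20, sampled_seq_len=100)

valid = window[mask]   # only episode 1's steps survive the mask
```

If the training loss is computed only over masked-in steps, the later episodes present in the window never contribute gradients, which is why the overlap is not a correctness bug under this scheme.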