ychenco closed this issue 4 years ago
@ychenco Hi! Sorry for the late reply. I am not sure I understood your question correctly, but this function makes sure each trajectory is sampled from within a single episode. Crossing an episode boundary would break the sequential state-action relationship. I have not checked whether this affects performance, but my guess is that sampling trajectories that cross episodes would make it worse.
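To make the idea of "sampling only within the same episode" concrete, here is a minimal sketch. It is not the code in `memory.py`: the function name `sample_within_episode`, the per-step `dones` array, and the `max_tries` limit are all assumptions made for illustration. It draws a random start index and rejects any window in which an episode ends before the last step.

```python
import numpy as np


def sample_within_episode(dones, chunk_len, rng, max_tries=1000):
    """Return chunk_len consecutive indices that stay inside one episode.

    Hypothetical helper: `dones[t]` is assumed to be True when step t is
    the last step of an episode.
    """
    dones = np.asarray(dones, dtype=bool)
    n = len(dones)
    if chunk_len > n:
        raise ValueError("chunk_len is longer than the stored data")
    for _ in range(max_tries):
        # Candidate window [start, start + chunk_len)
        start = int(rng.integers(0, n - chunk_len + 1))
        idxs = np.arange(start, start + chunk_len)
        # A terminal flag before the final step means the following step
        # belongs to a new episode, so this window would cross episodes.
        if not dones[idxs[:-1]].any():
            return idxs
    raise RuntimeError("no window that stays inside one episode was found")


rng = np.random.default_rng(0)
dones = [0, 0, 0, 1, 0, 0, 0, 1]  # two episodes of four steps each
print(sample_within_episode(dones, 3, rng))
```

Rejection sampling keeps the sketch short. If episodes are much shorter than the chunk length most draws would be rejected, and precomputing the valid start indices would probably be the better choice.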
https://github.com/yusukeurakami/dreamer-pytorch/blob/7e9050e8c454309de40bd0d1a4ec0256ef600147/memory.py#L33-L39
The sampling function does not seem to consider the case where a sampled sequence crosses episodes. Would that affect performance?