Although I haven't confirmed or tested this, there may be a threading risk. The ReplayBuffer starts a separate thread for prefetching, which calls PrioritizedSliceSampler.sample().
In PrioritizedSliceSampler.sample(), we have:
```python
# force to not sample index at the end of a trajectory
self._sum_tree[preceding_stop_idx] = 0.0
# and no need to update self._min_tree
starts, info = PrioritizedSampler.sample(self, storage=storage, batch_size=batch_size // seq_length)
```
If the main thread calls update_priority() between these two lines, it can write to _sum_tree concurrently, undoing self._sum_tree[preceding_stop_idx] = 0.0 and letting the sampler pick the end of a trajectory as a slice start.
I'm not sure what role buffer._futures_lock plays, but it doesn't seem to prevent this conflict.
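To illustrate the interleaving, here is a minimal, deterministic sketch (ToySampler and its methods are hypothetical stand-ins, not torchrl APIs; a plain list replaces the sum-tree). The unsafe path reproduces the suspected ordering: zero the stop index, let a priority update slip in, then sample. The safe path holds a lock across the zero-then-sample sequence so an update cannot interleave:

```python
import threading


class ToySampler:
    """Illustrative stand-in for the sampler; a list replaces the sum-tree."""

    def __init__(self, priorities):
        self.priorities = list(priorities)
        self.lock = threading.Lock()  # guards the zero-then-sample sequence

    def sample_unsafe(self, stop_idx, interleaved_update=None):
        # Step 1: zero out the end-of-trajectory index...
        self.priorities[stop_idx] = 0.0
        # Step 2: ...but the "main thread" updates priorities in between,
        # which can resurrect stop_idx (simulated deterministically here)...
        if interleaved_update is not None:
            idx, prio = interleaved_update
            self.priorities[idx] = prio
        # Step 3: ...and only now do we sample by priority (argmax for brevity).
        return max(range(len(self.priorities)), key=self.priorities.__getitem__)

    def sample_safe(self, stop_idx):
        # Zeroing and sampling happen atomically under the lock.
        with self.lock:
            self.priorities[stop_idx] = 0.0
            return max(range(len(self.priorities)), key=self.priorities.__getitem__)

    def update_priority(self, idx, prio):
        # A concurrent update must take the same lock, so it can no longer
        # land between the zeroing and the sampling.
        with self.lock:
            self.priorities[idx] = prio
```

With priorities [1.0, 2.0, 3.0] and stop index 2, sample_unsafe(2, interleaved_update=(2, 10.0)) returns 2, i.e. the end of the trajectory is sampled despite having been zeroed; sample_safe(2) returns 1, as intended. This only demonstrates the interleaving, not the actual torchrl fix.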
Additional context
See discussion in the original issue #2205.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
cc @xiaomengy