Open yiwei-prowler opened 6 years ago
Actually, I ran into the same problem. It seems that the PR was never merged into the master branch.
To reproduce the result:
>>> from baselines.deepq.replay_buffer import PrioritizedReplayBuffer
>>> replay_buffer = PrioritizedReplayBuffer(2, alpha=0.6)
>>> replay_buffer.add(*tuple(range(5)))
>>> replay_buffer.sample(1)
>>> replay_buffer.sample(1, 0.4)
This problem is due to this line: https://github.com/openai/baselines/blob/master/baselines/deepq/replay_buffer.py#L109
It seems to me that the buffer never samples the most recently added experience record. The code below shows this: only the index of the first record ends up in the set, while the second record (the one with all values equal to 2) never gets sampled, so the assertion fails:
from baselines.deepq.replay_buffer import PrioritizedReplayBuffer

buf = PrioritizedReplayBuffer(10, 0.5)
buf.add(1, 1, 1, 1, False)
buf.add(2, 2, 2, 2, False)
idx_set = set([])
for _ in range(1000):
    _, _, _, _, _, _, idx = buf.sample(1, .5)
    idx_set.add(idx[0])
assert len(idx_set) == 2
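For anyone who wants to see the effect without installing baselines, here is a minimal standalone sketch of proportional sampling. It assumes (I have not confirmed this against the library) that the cause is the total priority mass being summed over a range that stops one element short. The function `sample_proportional` and its `exclusive_end` parameter are hypothetical names made up for this illustration; they are not part of the baselines API.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def sample_proportional(priorities, exclusive_end):
    """Draw one index with probability proportional to its priority.

    The total mass is summed over priorities[0:exclusive_end] only, so any
    index >= exclusive_end receives no mass. Passing len(priorities) - 1
    mimics the suspected off-by-one; passing len(priorities) covers all records.
    """
    total = sum(priorities[:exclusive_end])
    mass = random.random() * total  # uniform in [0, total)
    running = 0.0
    for idx, p in enumerate(priorities):
        running += p
        if mass < running:
            return idx
    return len(priorities) - 1  # fallback for floating-point edge cases


priorities = [1.0, 1.0]  # two records with equal priority
n = len(priorities)

# Suspected buggy behaviour: the last record is left out of the total mass.
buggy = {sample_proportional(priorities, n - 1) for _ in range(1000)}
# Total mass taken over every stored record.
fixed = {sample_proportional(priorities, n) for _ in range(1000)}

print("indices seen with short range:", buggy)
print("indices seen with full range: ", fixed)
```

With the short range only index 0 can ever be drawn, which matches what the snippet above observes. If this reading of the bug is right, summing over the full range of stored records should let both indices appear.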