openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

Prioritized replay buffer does not sample the last experience record #171

Status: Open. yiwei-prowler opened this issue 6 years ago.

yiwei-prowler commented 6 years ago

It seems to me that the buffer never samples the most recently added experience record. The code below shows this: only the index of the first experience record ends up in the set, while the second record (the one with all values equal to 2) is never sampled:

from baselines.deepq.replay_buffer import PrioritizedReplayBuffer

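buf = PrioritizedReplayBuffer(10, 0.5)
buf.add(1, 1, 1, 1, False)
buf.add(2, 2, 2, 2, False)
idx_set = set()
for _ in range(1000):
    # sample() returns 7 values; the last one is the list of sampled indices
    _, _, _, _, _, _, idx = buf.sample(1, 0.5)
    idx_set.add(idx[0])

assert len(idx_set) == 2  # fails: only index 0 is ever sampled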
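Why the assertion expects both indices: under proportional prioritization, record i is drawn with probability P(i) = p_i^alpha / sum_k p_k^alpha. The sketch below assumes (hypothetically) that both records are stored with the same priority, 1.0. Each then has probability 1/2 per draw, so a correct buffer would miss index 1 in all 1000 draws only with probability 2^-1000. The helper names are illustrative and not part of baselines.

```python
def sampling_probabilities(priorities, alpha):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def prob_index_never_drawn(probs, idx, draws):
    # each sample(1, beta) call is treated as an independent draw with probability probs[idx]
    return (1.0 - probs[idx]) ** draws


# hypothetical: both records stored with priority 1.0, alpha=0.5 as in the repro
probs = sampling_probabilities([1.0, 1.0], alpha=0.5)
print(probs)  # [0.5, 0.5]
print(prob_index_never_drawn(probs, 1, 1000) < 1e-300)  # True
```

A result of "index 1 never appears" is therefore not bad luck; it points to a deterministic exclusion.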

robotcator commented 5 years ago

I actually ran into the same problem. It seems the PR was never merged into the master branch.

To reproduce the result:

>>> from baselines.deepq.replay_buffer import PrioritizedReplayBuffer
>>> replay_buffer = PrioritizedReplayBuffer(2, alpha=0.6)
>>> replay_buffer.add(*tuple(range(5)))
>>> replay_buffer.sample(1)
>>> replay_buffer.sample(1, 0.4)

This problem is caused by this line: https://github.com/openai/baselines/blob/master/baselines/deepq/replay_buffer.py#L109
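The linked line is not quoted in the thread. As a hedged illustration only, assume the sampler draws a mass uniformly from [0, total) and maps it to an index with a prefix-sum search in a sum tree, where `sum(start, end)` treats `end` as exclusive. If the total is computed with `end = len(storage) - 1`, the last stored priority is left out of the mass, so its leaf can never be reached. Passing `len(storage)` would include it. `SumTree` and `sample_from_tree` below are hypothetical stand-ins, not the baselines classes.

```python
import random


class SumTree:
    """Hypothetical array-backed sum tree; capacity must be a power of two.

    sum(start, end) adds leaves in the half-open range [start, end),
    so `end` is exclusive.
    """

    def __init__(self, capacity):
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        self.value = [0.0] * (2 * capacity)  # node 1 is the root; leaves start at `capacity`

    def __setitem__(self, idx, priority):
        node = idx + self.capacity
        self.value[node] = priority
        node //= 2
        while node >= 1:  # refresh every ancestor's subtree sum
            self.value[node] = self.value[2 * node] + self.value[2 * node + 1]
            node //= 2

    def sum(self, start, end):
        total = 0.0
        lo, hi = start + self.capacity, end + self.capacity
        while lo < hi:  # bottom-up walk over the half-open leaf range
            if lo & 1:
                total += self.value[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                total += self.value[hi]
            lo //= 2
            hi //= 2
        return total

    def find_prefixsum_idx(self, mass):
        # descend to the leaf whose cumulative-priority interval contains `mass`
        node = 1
        while node < self.capacity:
            left = 2 * node
            if self.value[left] > mass:
                node = left
            else:
                mass -= self.value[left]
                node = left + 1
        return node - self.capacity


def sample_from_tree(tree, size, batch_size, rng, off_by_one=False):
    # off_by_one=True mimics the suspected bug: end = size - 1 drops the last leaf
    end = size - 1 if off_by_one else size
    segment = tree.sum(0, end) / batch_size
    # stratified draw: one mass per equal-width segment of [0, total)
    return [tree.find_prefixsum_idx((i + rng.random()) * segment)
            for i in range(batch_size)]


tree = SumTree(4)
tree[0] = 1.0  # two stored records with equal priority
tree[1] = 1.0
rng = random.Random(0)
buggy = {sample_from_tree(tree, 2, 1, rng, off_by_one=True)[0] for _ in range(1000)}
fixed = {sample_from_tree(tree, 2, 1, rng)[0] for _ in range(1000)}
print(sorted(buggy), sorted(fixed))  # [0] [0, 1]
```

With a half-open convention, `sum(0, len(storage))` already covers every stored priority, so subtracting 1 at the call site applies the exclusivity twice and silently drops the newest record.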