pfnet / pfrl

PFRL: a PyTorch-based deep reinforcement learning library
MIT License

modular Replay Buffer #160

Closed: HeChengHui closed this issue 3 years ago

HeChengHui commented 3 years ago

Hi, is it possible to use PrioritizedReplayBuffer by itself in my own DQN code? I would like to be able to append my own tuple of tensors to the replay buffer and sample from the buffer. From what I can find, https://github.com/pfnet/pfrl/blob/master/pfrl/replay_buffers/prioritized.py only provides a sample function for the class and not an append function.

prabhatnagarajan commented 3 years ago

The PrioritizedReplayBuffer inherits the append method from ReplayBuffer. This can be seen in the class definition: class PrioritizedReplayBuffer(ReplayBuffer, PriorityWeightError):
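
For illustration, here is a minimal sketch of using the buffer on its own, outside of any agent. The keyword arguments follow the append signature inherited from ReplayBuffer; the transition contents and capacity are placeholders, so verify against the source for your version:

import numpy as np
import pfrl

# Sketch only: append comes from ReplayBuffer, sample is overridden by
# PrioritizedReplayBuffer. The transition contents below are placeholders.
memory = pfrl.replay_buffers.PrioritizedReplayBuffer(capacity=10 ** 5)
for _ in range(100):
    obs = np.random.rand(4).astype(np.float32)
    next_obs = np.random.rand(4).astype(np.float32)
    memory.append(
        state=obs,
        action=0,
        reward=1.0,
        next_state=next_obs,
        is_state_terminal=False,
    )
experiences = memory.sample(32)  # sampled transitions, each tagged with an importance weight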

HeChengHui commented 3 years ago

Thank you for the reply. Do you know why there would be an assertion error when I try to sample from the buffer?

Traceback (most recent call last):
  File "Pytorch_DQN_4frames_adaptiveE.py", line 595, in <module>
    experiences = memory.sample(batch_size)
  File "D:\STUDIES\Anaconda\envs\ESP3201\lib\site-packages\pfrl\replay_buffers\prioritized.py", line 119, in sample
    sampled, probabilities, min_prob = self.memory.sample(n)
  File "D:\STUDIES\Anaconda\envs\ESP3201\lib\site-packages\pfrl\collections\prioritized.py", line 99, in sample
    assert not self.wait_priority_after_sampling or not self.flag_wait_priority
AssertionError
WARNING: 'Pytorch_DQN_4frames_adaptiveE' controller exited with status: 1.

prabhatnagarajan commented 3 years ago

No, I don't. All I can really tell is the same thing as you: that the assertion

assert not self.wait_priority_after_sampling or not self.flag_wait_priority

is failing. Maybe it has to do with the specific way you're using it?

HeChengHui commented 3 years ago

Thank you for your fast response. It works the first time I sample from the buffer, but I get this error on the next episode. The following is how I initialise and use it:


memory = pfrl.replay_buffers.PrioritizedReplayBuffer(capacity=memory_size)

in an episode:
    state = ...
    action = ...
    reward = ...
    next_state = ...
    memory.append(state=state, action=action, reward=reward, next_state=next_state)
    if len(memory) >= batch_size:
        experiences = memory.sample(batch_size)
...

HeChengHui commented 3 years ago

@prabhatnagarajan Any updates?

prabhatnagarajan commented 3 years ago

Unfortunately, it's difficult for me to know what your error/issue is given the information you've given me. A few things to consider. First, I hope you've looked at the file where your assertion apparently fails: https://github.com/pfnet/pfrl/blob/master/pfrl/collections/prioritized.py

Additionally, be sure that the agent you are using is compatible with prioritized experience replay. For example, take a look at the DQN agent (https://github.com/pfnet/pfrl/blob/master/pfrl/agents/dqn.py), which is built to support prioritized buffers. Prioritized buffers can't simply be swapped in for normal replay buffers; the agent has to support them. From your original message it seems like you're implementing your own DQN? If so, I would recommend looking at PFRL's DQN implementation (linked above) to get an idea of how to modify your implementation to support a prioritized buffer, roughly along the lines of the sketch below.
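
To make that concrete, here is a rough, hypothetical sketch of an importance-weighted update step. The "weight" key on sampled transitions and the update_errors call reflect my reading of how PFRL's own agents use the buffer (verify against pfrl/agents/dqn.py), while compute_td_errors, optimizer, and memory are placeholder names standing in for your own code:

import torch

experiences = memory.sample(batch_size)
# Each sampled experience is a list of transition dicts (length 1 for 1-step
# returns); the first dict carries the importance-sampling weight.
weights = torch.tensor(
    [exp[0]["weight"] for exp in experiences], dtype=torch.float32
)
td_errors = compute_td_errors(experiences)   # placeholder: your TD-error computation
loss = (weights * td_errors ** 2).mean()     # importance-weighted loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Feed the new absolute TD errors back so the buffer can refresh priorities.
memory.update_errors(td_errors.abs().detach().tolist())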

HeChengHui commented 3 years ago

Yes, I am implementing my own DQN. Thank you for the suggestions, I shall see what I can do.

muupan commented 3 years ago

The assertion https://github.com/pfnet/pfrl/blob/edc5a35a0d5ffb86ec41f4ae3a9a10c5ab2c6be6/pfrl/collections/prioritized.py#L99 forces users to call set_last_priority after every call of sample, to prevent them from forgetting to update priorities.

This restriction can be turned off by setting wait_priority_after_sampling to False. https://github.com/pfnet/pfrl/blob/edc5a35a0d5ffb86ec41f4ae3a9a10c5ab2c6be6/pfrl/collections/prioritized.py#L26
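
In practice, if you sample from the buffer yourself, you need to hand new priorities back before sampling again. A minimal sketch of one way to do that (assuming PrioritizedReplayBuffer.update_errors, which forwards the errors to set_last_priority on the underlying buffer; compute_td_errors is a placeholder for your own code):

experiences = memory.sample(batch_size)
td_errors = compute_td_errors(experiences)  # placeholder: one absolute TD error per sampled experience
# Reporting the errors sets the pending priorities and clears flag_wait_priority,
# so the next sample() call no longer trips the assertion.
memory.update_errors(td_errors)

Alternatively, as noted above, the check can be turned off by constructing the underlying PrioritizedBuffer with wait_priority_after_sampling=False, at the cost of losing the safeguard against forgetting to update priorities.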