thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

BasePolicy requires replay buffer, restricting modularity. #898

Open llewynS opened 1 year ago

llewynS commented 1 year ago

Not really a coding issue. You assert in your paper that the codebase is highly modular, but the algorithms are very strongly tied to your implementation of the replay buffer: all of the static methods in the base policy require the replay buffer.

It would be non-trivial to decouple the algorithmic implementations from the replay buffer.
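For concreteness, here is a rough sketch of what that coupling looks like from the caller's side. The parameter names are approximate and may differ between versions, and target_q is a hypothetical callable, but the point stands: the return-computation helpers cannot run on a standalone batch, they need the ReplayBuffer and the indices it was sampled from.

# Sketch only (parameter names approximate, target_q is a placeholder callable):
# the static return-computation helpers require the buffer and sampled indices.
batch, indices = buffer.sample(256)
batch = BasePolicy.compute_nstep_return(
    batch, buffer, indices,      # buffer is a mandatory argument here
    target_q_fn=target_q,
    gamma=0.99, n_step=3,
)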

I would like to recommend renaming learn to _learn. learn without the underscore implies that it is a standalone method that can be called publicly, but, at least for DDPG and its child classes, learn requires everything that surrounds it in update:

if buffer is None:
    return {}
# sample a batch together with the buffer indices it came from
batch, indices = buffer.sample(sample_size)
self.updating = True
# pre-processing needs the buffer and indices, e.g. to compute n-step returns
batch = self.process_fn(batch, buffer, indices)
result = self.learn(batch, **kwargs)
# post-processing also needs the buffer, e.g. to update prioritized-replay weights
self.post_process_fn(batch, buffer, indices)
if self.lr_scheduler is not None:
    self.lr_scheduler.step()
self.updating = False
return result

The docstring of learn also does not mention this limitation.
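As a rough illustration (assuming a DDPG-style policy instance named policy and an already-filled buffer), anyone wanting to call learn on its own today has to reproduce the surrounding steps of update themselves, otherwise the targets that learn expects are simply missing from the batch:

# Minimal sketch of what a caller must replicate to invoke learn directly;
# `policy` and `buffer` are assumed to exist already.
batch, indices = buffer.sample(64)
batch = policy.process_fn(batch, buffer, indices)   # fills in the targets learn expects
result = policy.learn(batch)                        # only safe after process_fn
policy.post_process_fn(batch, buffer, indices)      # e.g. prioritized-replay weight update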

MischaPanch commented 1 year ago

The learn method will be changed significantly in the upcoming releases, tightening its interface and simplifying the way in which it is called by the trainer. The docstrings will change accordingly. Thanks for raising the question about modularity and independence from the buffer implementation! I will keep it in mind.