pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[Feature Request] Implement Efficient Ops to Compute Returns #140

Open xiaomengy opened 2 years ago

xiaomengy commented 2 years ago

In RL algorithms, it is very common to compute return-like quantities from trajectories. Computing such returns with plain Python for-loops is inefficient. To improve efficiency, we should abstract some general return ops and implement efficient PyTorch kernels for them. Some examples of such returns are listed below.

  1. TD-lambda return
  2. GAE in PPO
  3. ...
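To make the inefficiency concrete, here is a sketch of the kind of per-step Python loop the issue is referring to, using GAE as the example. The function name and signature are illustrative, not an existing TorchRL API:

```python
import torch

def gae_advantages(rewards, values, next_values, dones, gamma=0.99, lmbda=0.95):
    """Naive backward-loop GAE: one Python iteration per time step,
    which is exactly the pattern that gets slow on long trajectories."""
    T = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros_like(rewards[0])
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lmbda * not_done * gae
        advantages[t] = gae
    return advantages
```

The per-step Python overhead dominates for long trajectories, which is what motivates vectorized or C++ kernels.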
vmoens commented 2 years ago

Great suggestion! So far we have implemented a vectorized version of the lambda return that runs several orders of magnitude faster than regular for-loops, without requiring any C++ code. I'd love to see how much more we can gain by designing a C++ kernel for it, if that's what you have in mind.
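One way such a vectorization can work (a sketch under simplifying assumptions, not necessarily TorchRL's actual implementation): the backward recursion is a discounted suffix sum, which can be expressed as a single matrix-vector product for a single done-free trajectory:

```python
import torch

def vec_discounted_suffix_sum(deltas, gamma, lmbda):
    """Vectorized discounted sum: out[t] = sum_{j >= t} (gamma*lmbda)^(j-t) * deltas[j].
    Replaces the per-step backward Python loop with one matmul
    (O(T^2) memory, but no Python-level iteration).
    Assumes a single trajectory with no episode boundaries."""
    T = deltas.shape[0]
    idx = torch.arange(T, dtype=deltas.dtype)
    # exponents[t, j] = j - t; entries below the diagonal are clamped then masked
    exponents = (idx.unsqueeze(0) - idx.unsqueeze(1)).clamp(min=0)
    decay = torch.triu((gamma * lmbda) ** exponents)
    return decay @ deltas
```

Applied to the TD residuals, this yields the GAE advantages (or, with the appropriate decay, the lambda return) in one shot; the quadratic memory cost is the usual trade-off of this trick.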

Here are the things I would keep in mind in this endeavour:

  1. We'd like the resulting functions to be compatible with functorch. That means the operations should be -- at least -- twice differentiable.
  2. I think it's fine to use C++ to optimize things, but knowing how researchers work, we should keep the non-optimized functions as well: people may want to copy-paste them and tweak them at will.
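Requirement (1) can be sanity-checked directly with autograd: differentiate the op once with `create_graph=True`, then differentiate the result again. The minimal loop-based TD(lambda) return below is a hypothetical helper for illustration, not TorchRL's actual implementation:

```python
import torch

def td_lambda_return(rewards, values, gamma=0.99, lmbda=0.95):
    """Loop-based TD(lambda) return, kept differentiable on purpose.
    `values` has one extra entry for bootstrapping: values[t] ~ V(s_t)."""
    g = values[-1]
    returns = []
    for t in reversed(range(rewards.shape[0])):
        # G_t = r_t + gamma * ((1 - lambda) * V(s_{t+1}) + lambda * G_{t+1})
        g = rewards[t] + gamma * ((1 - lmbda) * values[t + 1] + lmbda * g)
        returns.append(g)
    return torch.stack(returns[::-1])

values = torch.randn(6, requires_grad=True)
rewards = torch.randn(5)
loss = td_lambda_return(rewards, values).pow(2).sum()
# First derivative, keeping the graph so we can differentiate again:
(g1,) = torch.autograd.grad(loss, values, create_graph=True)
# Second derivative: this call must succeed for functorch compatibility.
(g2,) = torch.autograd.grad(g1.sum(), values)
```

If the second `torch.autograd.grad` call raises, the op is not twice differentiable as written, which would rule out some functorch use cases.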