[RLlib] Enable Training from Replay Buffer Larger than Memory

ray-project / ray

[RLlib] Enable Training from Replay Buffer Larger than Memory #23816

Open fwitter opened 2 years ago

fwitter commented 2 years ago

Description

The current implementation of ReplayBuffer uses Python's list to store samples in memory. If the required data does not fit into memory, training is not possible (take CQL as an example). An alternative data structure that stores samples on disk instead of in memory could remove this limitation.

Use case

This might be useful in situations such as:

When using Offline RL with a large dataset
When the use case has large observations
When memory is very limited

I am willing to submit a PR.

ArturNiederfahrenhorst commented 2 years ago

Hi @fwitter,

Wow, really cool idea! If you would like to submit a PR, I suggest the following:

Subclass ReplayBuffer (not any of the old buffers)
Use SimpleQ and put your modified ReplayBuffer inside the experimental replay_buffer_config, along with the needed arguments, to check that everything works

Please contact me if you want help or join the Ray community slack if you want to chat about this.

fwitter commented 2 years ago

@ArturNiederfahrenhorst Great and thanks! I will submit a first draft soon.

My idea is not to subclass from ReplayBuffer but to introduce a new storage interface which is used by ReplayBuffer. This way, the storage location is independent of the sampling strategy used in the ReplayBuffer.
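A minimal sketch of what such a storage interface could look like, assuming a ring buffer with uniform sampling. None of these names come from RLlib: `Storage`, `InMemoryStorage`, `DiskStorage`, and `SketchReplayBuffer` are hypothetical and only illustrate how the storage location can be swapped without touching the sampling logic.

```python
import os
import pickle
import random
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical storage interface: slot-addressable sample store."""

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def __getitem__(self, idx): ...

    @abstractmethod
    def __setitem__(self, idx, item): ...


class InMemoryStorage(Storage):
    """Keeps every sample in a dict (slot -> sample) in RAM."""

    def __init__(self):
        self._items = {}

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]

    def __setitem__(self, idx, item):
        self._items[idx] = item


class DiskStorage(Storage):
    """Pickles each sample to its own file; only requested slots are loaded."""

    def __init__(self, directory):
        self._dir = directory
        os.makedirs(directory, exist_ok=True)
        self._slots = set()  # slots that currently hold a sample

    def _path(self, idx):
        return os.path.join(self._dir, f"{idx}.pkl")

    def __len__(self):
        return len(self._slots)

    def __getitem__(self, idx):
        if idx not in self._slots:
            raise IndexError(idx)
        with open(self._path(idx), "rb") as f:
            return pickle.load(f)

    def __setitem__(self, idx, item):
        with open(self._path(idx), "wb") as f:
            pickle.dump(item, f)
        self._slots.add(idx)


class SketchReplayBuffer:
    """Ring buffer with uniform sampling; all reads/writes go through a Storage."""

    def __init__(self, capacity, storage):
        self.capacity = capacity
        self._storage = storage
        self._next_idx = 0

    def add(self, item):
        # Overwrite the oldest slot once the buffer is full.
        self._storage[self._next_idx] = item
        self._next_idx = (self._next_idx + 1) % self.capacity

    def sample(self, num_items):
        # Slots are filled in order 0..len-1, so any index below len is valid.
        n = len(self._storage)
        return [self._storage[random.randrange(n)] for _ in range(num_items)]
```

Switching from `SketchReplayBuffer(1000, InMemoryStorage())` to `SketchReplayBuffer(1000, DiskStorage("/some/dir"))` changes where samples live while `add` and `sample` stay identical. One file per sample is chosen here for simplicity; a real implementation would likely need batching or memory-mapped files to keep disk I/O acceptable.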

ArturNiederfahrenhorst commented 2 years ago

Under the hood, all replay buffers write to and read from self._storage, which is a simple list. They all inherit from ReplayBuffer, which sets self._storage.

So if you could do this as a MixIn-type of class, from which you inherit to make a ReplayBuffer behave like you describe, that would be awesome!

class DiskExpandableMixIn:
    @property
    def _storage(self):
        (...)
    (...)

class MyCoolReplayBuffer(MultiAgentReplayBuffer, DiskExpandableMixIn):
    (...)

I think this would be a great way to do what you are describing. It would let us extend all existing buffers easily.
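A runnable sketch of the mixin approach, under stated assumptions. `ListBuffer` is a stand-in for a list-backed buffer and is not RLlib's ReplayBuffer; `DiskList` and `DiskBackedBuffer` are likewise hypothetical. The sketch adds a property setter, because a base class that assigns `self._storage = []` in its constructor would otherwise fail against a read-only property.

```python
import os
import pickle
import tempfile


class DiskList:
    """Hypothetical list-like object that pickles each element to its own file."""

    def __init__(self, directory):
        self._dir = directory
        self._len = 0

    def _path(self, idx):
        return os.path.join(self._dir, f"{idx}.pkl")

    def _check(self, idx):
        if not 0 <= idx < self._len:
            raise IndexError(idx)

    def __len__(self):
        return self._len

    def append(self, item):
        with open(self._path(self._len), "wb") as f:
            pickle.dump(item, f)
        self._len += 1

    def __getitem__(self, idx):
        self._check(idx)
        with open(self._path(idx), "rb") as f:
            return pickle.load(f)

    def __setitem__(self, idx, item):
        self._check(idx)
        with open(self._path(idx), "wb") as f:
            pickle.dump(item, f)


class DiskExpandableMixIn:
    """Redirects the `_storage` attribute to a disk-backed list."""

    @property
    def _storage(self):
        return self._disk_storage

    @_storage.setter
    def _storage(self, value):
        # Called when the base class runs `self._storage = []`.
        # Copy whatever was assigned onto disk instead of keeping it in RAM.
        # Note: the temp directory is not cleaned up in this sketch.
        disk = DiskList(tempfile.mkdtemp())
        for item in value:
            disk.append(item)
        self._disk_storage = disk


class ListBuffer:
    """Stand-in for a list-backed ring buffer (NOT RLlib's ReplayBuffer)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._storage = []
        self._next_idx = 0

    def add(self, item):
        if len(self._storage) < self.capacity:
            self._storage.append(item)
        else:
            self._storage[self._next_idx] = item
        self._next_idx = (self._next_idx + 1) % self.capacity


class DiskBackedBuffer(ListBuffer, DiskExpandableMixIn):
    pass
```

The mixin can be listed last because a property is a data descriptor: Python finds it anywhere in the class's MRO before consulting the instance dictionary, so both the assignment in `ListBuffer.__init__` and later reads go through it. The sketch only supports `len`, `append`, and integer indexing; a buffer that slices or deletes from its list would need more of the list API.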

stale[bot] commented 2 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.