pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[BUG] `LazyTensorStorage` allocates storage on wrong device #2187

Closed matteobettini closed 1 month ago

matteobettini commented 1 month ago

https://github.com/pytorch/rl/blob/d59b8105cd7437cccb7c480f15ce6caf67f587c6/torchrl/data/replay_buffers/storages.py#L870-L873

As can be seen in the snippet, the storage first allocates everything on the data device and only later moves it to the proper device.

This is problematic because, if I request

return TensorDictReplayBuffer(
    storage=LazyTensorStorage(
        100_000_000,  # Say this is 30GB
        device="cpu",
    ),
    sampler=sampler,
    batch_size=100,
)

it first allocates the 30GB on the data device (GPU) and only then moves it to the CPU, so the full buffer has to fit in GPU memory even though the storage was requested on the CPU.
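
To make the difference concrete, here is a minimal sketch in plain PyTorch of the two allocation orders. It is not TorchRL's actual implementation: the function names `allocate_then_move` and `allocate_on_target` and the `example` tensor are hypothetical, chosen only to illustrate the pattern described above and one possible alternative.

```python
import torch


def allocate_then_move(example, max_size, device):
    # Pattern described in this issue (hypothetical reconstruction):
    # the full buffer is created on the device of the incoming data
    # and only afterwards copied to the requested device.
    buf = torch.zeros(
        (max_size, *example.shape), dtype=example.dtype, device=example.device
    )
    return buf.to(device)


def allocate_on_target(example, max_size, device):
    # Possible alternative: create the buffer directly on the requested
    # device, so the data device never has to hold the full buffer.
    return torch.zeros(
        (max_size, *example.shape), dtype=example.dtype, device=device
    )


# Stand-in for one stored item; in the issue it would live on the GPU.
example = torch.ones(4, dtype=torch.float32)
storage = allocate_on_target(example, max_size=8, device="cpu")
```

Both functions return a buffer with the same shape, dtype, and device; they differ only in where the large allocation happens first. When `example` lives on a GPU, the first variant briefly needs the whole buffer in GPU memory.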