Closed · opened by matteobettini · closed 1 month ago
https://github.com/pytorch/rl/blob/d59b8105cd7437cccb7c480f15ce6caf67f587c6/torchrl/data/replay_buffers/storages.py#L870-L873
As the snippet shows, the storage first allocates the full buffer on the device of the incoming data and only afterwards moves it to the requested storage device.
This is problematic: if I request
```python
return TensorDictReplayBuffer(
    storage=LazyTensorStorage(
        100_000_000,  # say this is 30 GB
        device="cpu",
    ),
    sampler=sampler,
    batch_size=100,
)
```
the full 30 GB is first allocated on the data device (the GPU) and only then moved to the CPU, which can exhaust GPU memory even though the buffer was requested on `device="cpu"`.
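For clarity, here is a minimal sketch of the behaviour described above; the function names and signatures are illustrative, not the actual torchrl internals. The first function mirrors the reported allocate-then-move pattern, the second the straightforward fix of allocating directly on the target device:

```python
import torch

def init_storage_as_reported(data: torch.Tensor, max_size: int, device: torch.device) -> torch.Tensor:
    # Pattern described above (illustrative, not the exact torchrl code):
    # the full buffer is materialized on data.device first (e.g. the GPU) ...
    out = torch.empty((max_size, *data.shape), dtype=data.dtype, device=data.device)
    # ... and only then moved to the requested storage device, so the data
    # device transiently holds the entire max_size buffer before the copy.
    return out.to(device)

def init_storage_direct(data: torch.Tensor, max_size: int, device: torch.device) -> torch.Tensor:
    # Possible fix: allocate directly on the target storage device, so no
    # intermediate buffer is ever created on the data device.
    return torch.empty((max_size, *data.shape), dtype=data.dtype, device=device)
```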