pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License
1.12k stars 149 forks

Add memmap cache for Tensor #964

Open ejguan opened 1 year ago

ejguan commented 1 year ago

🚀 The feature

Beyond the on-disk cache and in-memory cache, it would be useful and performant to add a memmap cache for Tensor (similar to the one in tensordict: https://github.com/pytorch-labs/tensordict/blob/main/tensordict/memmap.py). It would boost performance by storing tensor data in memory-mapped files that can be shared between processes.

However, there are two major limitations:

Motivation, pitch

Performance

Alternatives

No response

Additional context

No response

lennartclaas commented 1 year ago

It seems that memory mapping can mean many different things. Does the idea behind this issue correspond to https://www.mathworks.com/help/matlab/import_export/overview-of-memory-mapping.html?

Hence, storing tensor data in a (potentially large) file to share it between processes and to improve reading time?
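That is the idea in a nutshell. A minimal sketch of it, using `numpy.memmap` as a stand-in (not the actual torchdata or tensordict API; the file path and shapes here are made up for illustration): tensor data is written to a file once, then mapped back without copying it into process-private memory, and any process mapping the same file shares the same physical pages.

```python
# Hypothetical sketch of a memmap-backed tensor cache.
# Assumes numpy; a torch-based version could use torch.from_file(..., shared=True).
import os
import tempfile

import numpy as np

# Cache-fill step: write the "tensor" to disk once.
path = os.path.join(tempfile.mkdtemp(), "cached_tensor.bin")
data = np.arange(12, dtype=np.float32).reshape(3, 4)
data.tofile(path)

# Cache-hit step: map the file back. The OS pages it in lazily, and
# processes mapping the same file share the underlying pages.
cached = np.memmap(path, dtype=np.float32, mode="r", shape=(3, 4))
assert np.array_equal(cached, data)
```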

ejguan commented 1 year ago

> Hence, storing tensor data in a (potentially large) file to share it between processes and to improve reading time?

Correct. This is inspired by tensordict to help accelerate multiprocessing (MP).
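To illustrate why this helps MP, here is a hedged sketch (using numpy and the stdlib `multiprocessing` with the "fork" start method, so it assumes Linux/macOS; the names `worker_sum` and the cache path are made up): the parent only hands each worker the path to the cached file, so no tensor bytes are pickled or pushed through IPC queues, and each worker maps the same pages.

```python
# Hypothetical sketch: workers read a shared memmap cache by path,
# instead of receiving tensor data through pickling/IPC.
import os
import tempfile
import multiprocessing as mp

import numpy as np

PATH = os.path.join(tempfile.mkdtemp(), "cache.bin")

def worker_sum(path, q):
    # Each worker maps the cached file itself; only the path (a short
    # string) crossed the process boundary.
    arr = np.memmap(path, dtype=np.float32, mode="r", shape=(1000,))
    q.put(float(arr.sum()))

# Cache-fill step: write the tensor data once in the parent.
np.arange(1000, dtype=np.float32).tofile(PATH)

# "fork" start method (assumed available) so the target function is
# inherited rather than pickled.
ctx = mp.get_context("fork")
q = ctx.Queue()
procs = [ctx.Process(target=worker_sum, args=(PATH, q)) for _ in range(2)]
for p in procs:
    p.start()
results = [q.get() for _ in procs]
for p in procs:
    p.join()
assert results == [499500.0, 499500.0]  # sum of 0..999, exact in float32
```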