ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

SharedMemory - is it pinned (non-pageable)? #40256

Open janezales opened 1 year ago

janezales commented 1 year ago

Description

SharedMemory - is it pinned (non-pageable)? If not, add a flag for pinning a tensor held in shared memory in RAM.

When using Ray SharedMemory for pushing/pulling data to/from a GPU (TPU, etc.), the user should have the ability to PIN this memory in RAM, that is, to avoid storing it in pageable memory and then copying it into pinned (non-pageable) memory (which is what CUDA does), thereby speeding up copy operations between RAM and GPUs.
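For reference, a minimal sketch of the pageable vs. pinned copy path in plain PyTorch (independent of Ray); the device and tensor size here are illustrative assumptions:

```python
import torch

size = (4096, 4096)

# Pageable host tensor: the CUDA driver must first stage it into an internal
# pinned buffer before the DMA transfer, roughly doubling the copy cost.
pageable = torch.randn(size)
gpu_from_pageable = pageable.to("cuda")

# Pinned (page-locked) host tensor: the driver can DMA directly from it,
# and the copy can also be issued asynchronously.
pinned = torch.randn(size, pin_memory=True)
gpu_from_pinned = pinned.to("cuda", non_blocking=True)
```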

Use case

If not done yet, this would speed up memory copies from shared memory to GPUs by a factor of about 2. For large model tensors this is worth doing because there are many elements; for training data it is worth doing because the copies are repeated many times...

stephanie-wang commented 11 months ago

I think this is more of a Ray Data issue than a Ray Core one. Ray Data uses Ray Core's shared memory to store outputs (DataIterator/Trainer inputs).

Ray Data currently does not pin memory. It might not make sense to do so for shared memory right now because we store Arrow blocks, not the final tensor batches, in shared memory (although this may change in the future, see #41571). Also, shared-memory blocks are usually read only once (unless ds.materialize() is used).

But it should be easy enough to plug in something here. The way to do it would be to write a custom collate_fn that manages a pinned buffer and allocates the returned tensor batches there. Are you interested in contributing this? We'd be interested in seeing the results!
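A rough sketch of that idea, assuming `Dataset.iter_torch_batches(collate_fn=...)` is used to consume the data; the buffer-reuse strategy and the `PinnedCollate` name are illustrative, not an existing Ray API:

```python
from typing import Dict

import numpy as np
import torch


class PinnedCollate:
    """Collate each batch into reusable pinned (page-locked) staging buffers."""

    def __init__(self, device: str = "cuda"):
        self._device = device
        self._buffers: Dict[str, torch.Tensor] = {}

    def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, torch.Tensor]:
        out = {}
        for name, arr in batch.items():
            src = torch.as_tensor(arr)
            buf = self._buffers.get(name)
            # (Re)allocate the pinned staging buffer if the batch shape or dtype changed.
            if buf is None or buf.shape != src.shape or buf.dtype != src.dtype:
                buf = torch.empty(src.shape, dtype=src.dtype, pin_memory=True)
                self._buffers[name] = buf
            buf.copy_(src)
            # A pinned source allows the host-to-device copy to run asynchronously.
            # Note: reusing the buffer assumes the previous batch's copy has
            # finished before the next call (true if the training step
            # synchronizes on the default CUDA stream).
            out[name] = buf.to(self._device, non_blocking=True)
        return out


# Hypothetical usage with a Ray Dataset `ds`:
# for batch in ds.iter_torch_batches(collate_fn=PinnedCollate()):
#     ...
```

This keeps the pinned allocation out of the hot loop (allocating page-locked memory is itself slow), and it would make a good baseline for measuring the claimed ~2x copy speedup.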