[FEA] Add `host_buffer` class

jrhemstad commented 4 years ago

Is your feature request related to a problem? Please describe.

It has come up in several independent conversations with @jakirkham and others that it would be nice for RMM to provide a counterpart to device_buffer for host-allocated memory. See the conversations in https://github.com/rapidsai/rmm/pull/141 and https://github.com/rapidsai/rmm/issues/216

In short, it would be convenient to have a common abstraction for host memory allocations used by RAPIDS libraries. This would enable things like pinned host memory allocations for more performant device-to-host memory spilling.

Describe the solution you'd like

Add an rmm::host_buffer class to act as the host-memory counterpart to device_buffer, i.e., an untyped, uninitialized host memory allocation.

Additional context

Note that this opens a few sizable cans of worms: questions that will eventually need to be answered. From my comment here:

- Would rmm::alloc be renamed to rmm::device_alloc and we add a rmm::host_alloc?
- Does a host memory resource accept streams for alloc/free? If not, then host/device_memory_resource cannot share the same base class.
- Do we enforce using RMM host memory resources anywhere host memory is being allocated in the same way we do with device memory? (e.g., are we going to provide a rmm::host_vector to replace std::vector?)
- Are there alignment requirements for host allocations?
- Will there be a separate default memory resource for host allocations?
- Do we need host memory pools?
  - If so, can we leverage C++17 memory pool implementations? https://en.cppreference.com/w/cpp/header/memory_resource
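
For reference on that last question, here is a minimal sketch of what the C++17 `<memory_resource>` pool facilities look like in use. It is not RMM code: `allocation_is_aligned` and `host_pool_demo` are hypothetical names made up for illustration, and only the `std::pmr` types come from the standard library. It requires a C++17 compiler and standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>  // C++17

// Hypothetical helper (not an RMM API): allocate `bytes` from `mr`, check the
// alignment of the returned pointer, then return the memory to the resource.
inline bool allocation_is_aligned(std::pmr::memory_resource& mr,
                                  std::size_t bytes,
                                  std::size_t alignment)
{
  assert(alignment != 0);
  void* ptr          = mr.allocate(bytes, alignment);
  bool const aligned = reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
  mr.deallocate(ptr, bytes, alignment);
  return aligned;
}

// Hypothetical demo: a single-threaded pool whose upstream is plain
// operator new/delete. Small requests are served from pooled chunks.
inline bool host_pool_demo()
{
  std::pmr::unsynchronized_pool_resource pool{std::pmr::new_delete_resource()};
  std::size_t const sizes[] = {64, 256, 4096};
  bool ok = true;
  for (std::size_t bytes : sizes) {
    ok = ok && allocation_is_aligned(pool, bytes, alignof(std::max_align_t));
  }
  return ok;  // the pool releases all of its memory when it goes out of scope
}
```

Note that `std::pmr::memory_resource::allocate` takes a size and an alignment but no stream, which bears on the shared-base-class question above.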

Here's what I think the simplest and least-effort path forward is:

- Provide a host_memory_resource base class mirrored from device_memory_resource.
- Provide host_buffer that accepts a host_memory_resource* to use for allocation.
- Do NOT provide mirrors of the default device memory resource infrastructure (e.g., get_default_resource/set_default_resource()).
- Do NOT provide a mirror for rmm::alloc/free for host memory allocations.
- If a host memory pool is required, only support it in C++17 and beyond.
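
A rough sketch of the first two points, to make the proposal concrete. Everything here is an assumption for illustration rather than an actual RMM interface: the member names, the stream-free `allocate`/`deallocate` signatures, and the `new_delete_host_resource` implementation are all hypothetical. It compiles as C++14.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical base class, mirroring the general shape of
// device_memory_resource but without stream arguments.
class host_memory_resource {
 public:
  virtual ~host_memory_resource() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { do_deallocate(ptr, bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes)                     = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes) noexcept = 0;
};

// Hypothetical concrete resource backed by operator new/delete.
class new_delete_host_resource final : public host_memory_resource {
  void* do_allocate(std::size_t bytes) override { return ::operator new(bytes); }
  void do_deallocate(void* ptr, std::size_t) noexcept override { ::operator delete(ptr); }
};

// Untyped, uninitialized host allocation that owns its memory (move-only).
class host_buffer {
 public:
  host_buffer(std::size_t size, host_memory_resource* mr)
    : mr_{mr}, size_{size}, data_{size > 0 ? mr->allocate(size) : nullptr}
  {
  }
  host_buffer(host_buffer&& other) noexcept
    : mr_{other.mr_}, size_{other.size_}, data_{other.data_}
  {
    other.size_ = 0;
    other.data_ = nullptr;  // the moved-from buffer no longer owns anything
  }
  host_buffer(host_buffer const&)            = delete;
  host_buffer& operator=(host_buffer const&) = delete;
  host_buffer& operator=(host_buffer&&)      = delete;  // omitted for brevity
  ~host_buffer()
  {
    if (data_ != nullptr) { mr_->deallocate(data_, size_); }
  }

  void* data() noexcept { return data_; }
  std::size_t size() const noexcept { return size_; }

 private:
  host_memory_resource* mr_;  // not owned; must outlive the buffer
  std::size_t size_;
  void* data_;
};
```

With this shape, the caller always passes the resource explicitly, e.g. `new_delete_host_resource mr; host_buffer buf{64, &mr};`, which is consistent with not providing a default host resource.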

jakirkham commented 4 years ago

cc @kkraus14 @pentschev

kkraus14 commented 4 years ago

> If a host memory pool is required, only support it in C++17 and beyond.

I assume this would be a crazy effort to backport to C++14? We typically get pushback from requiring newer compiler versions / system libraries.

jrhemstad commented 4 years ago

> I assume this would be a crazy effort to backport to C++14? We typically get pushback from requiring newer compiler versions / system libraries.

Yeah, it's not really possible to backport. You'd be better off just re-implementing all of the memory pool logic or using some other open source host memory pool.

That said, we can probably insulate user libraries from needing C++17. We can wrap the C++17 bits in preprocessor guards and throw an error if someone tries to use the host memory pools pre-C++17.

Note that providing a host memory pool is orthogonal to providing a host_buffer.
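
One way the guarding could look, as a sketch only. The macro `HOST_POOL_SKETCH_HAS_PMR` and the function `try_host_pool_allocation` are hypothetical names, and a runtime exception is just one reading of "throw an error"; a compile-time `#error` or `static_assert` would be another option.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Only pull in the C++17 header when the language level allows it.
// (Some compilers need an extra flag before __cplusplus reports the real value.)
#if __cplusplus >= 201703L
#include <memory_resource>
#define HOST_POOL_SKETCH_HAS_PMR 1
#else
#define HOST_POOL_SKETCH_HAS_PMR 0
#endif

// Hypothetical entry point: performs one pooled allocation when C++17 is
// available, and throws std::logic_error otherwise.
inline bool try_host_pool_allocation(std::size_t bytes)
{
  assert(bytes > 0);
#if HOST_POOL_SKETCH_HAS_PMR
  std::pmr::unsynchronized_pool_resource pool;
  void* ptr = pool.allocate(bytes);
  pool.deallocate(ptr, bytes);
  return ptr != nullptr;
#else
  throw std::logic_error{"host memory pools require C++17 or newer"};
#endif
}
```

Because the pool types never appear in the function signature, a C++14 user library can still include and compile this header; it only fails if the pool path is actually invoked.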

harrism commented 4 years ago

I would leave host memory pools to future work.

kkraus14 commented 4 years ago

> Yeah, it's not really possible to backport. You'd be better off just re-implementing all of the memory pool logic or using some other open source host memory pool.
>
> That said, we can probably insulate user libraries from needing C++17. We can wrap the C++17 bits in preprocessor guards and throw an error if someone tries to use the host memory pools pre-C++17.
>
> Note that providing a host memory pool is orthogonal to providing a host_buffer.

Yeah, I understand. It's just that there will likely be enterprise customers who want both the memory pool and C++14 support, but beggars can't be choosers 😄.

jakirkham commented 4 years ago

Would using Boost help?

harrism commented 4 years ago

We don't want RMM to depend on Boost.

jakirkham commented 4 years ago

Yeah that makes sense. Just thinking about what alternatives we might have 🙂

jakirkham commented 4 years ago

What if we take this as an opportunity to start using the Conda compilers? That would put us on GCC 7.3.0, which has C++17 support (unless I've missed something). Besides this was something we were planning to do anyways. ( https://github.com/rapidsai/cudf/issues/1210 )

harrism commented 4 years ago

I think we should just stick to C++14, and leave host memory pools for future work.

kkraus14 commented 4 years ago

> What if we take this as an opportunity to start using the Conda compilers? That would put us on GCC 7.3.0, which has C++17 support (unless I've missed something). Besides this was something we were planning to do anyways. ( rapidsai/cudf#1210 )

That's only for conda packages though. We have people who want to build from source themselves without conda, and we can't necessarily guarantee a new enough compiler for C++17. We've already had pushback for C++14 😄.

jrhemstad commented 4 years ago

Agreed host memory pool is future work, just wanted to bring it up since we're talking about adding host memory management to RMM.

jakirkham commented 4 years ago

Ok makes sense. Thanks for the context and thanks for putting up with suggestions you likely have already considered 🙂

harrism commented 4 years ago

Can this issue be closed now that host_memory_resource exists and rmm::alloc/free are going to be dropped?

jrhemstad commented 4 years ago

We still need a host_buffer analog to device_buffer.

harrism commented 4 years ago

Ah yes, I got that confused.

github-actions[bot] commented 3 years ago

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] commented 3 years ago

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

randerzander commented 3 years ago

Still desired

jakirkham commented 3 years ago

(for context this came up today when discussing how to improve spilling and serialization performance)

cc @quasiben (for awareness)

jrhemstad commented 3 years ago

There's unlikely to be any movement here until https://github.com/NVIDIA/libcudacxx/pull/158 is complete in the next month or so.

jakirkham commented 3 years ago

Thanks for the update Jake 🙂

jrhemstad commented 6 months ago

This conversation is quite stale now, so I'll give a quick update on the status quo:

- The rmm::(host/device)_memory_resource base class interface is on its way out in favor of the cuda::mr functionality.
- cuda::mr should be thought of as taking what we learned over the last several years with RMM, generalizing it, and then centralizing it in libcu++.
- RMM is in the process of migrating to the new cuda::mr interface: https://github.com/rapidsai/rmm/pull/1095
- Today, libcu++ only offers the interface. It doesn't offer any concrete implementations, nor does it offer any data structures or containers that use a memory resource for allocation. This is admittedly a bit limited today.
- Our immediate next steps are to 1) add concrete memory resource implementations, and 2) create allocators, data structures, and containers that use a cuda::mr resource for allocation. Step 2 would ultimately satisfy the original request for simple host_buffer and device_buffer classes, though instead of those names we'd likely have cuda::buffer<device_accessible> and cuda::buffer<host_accessible>.

harrism commented 6 months ago

Would you distinguish between async and synchronous host_buffer and device_buffer classes (so there would be 4 classes)? Or would you have two classes with both sync and async methods? Or would host_buffer always be synchronous and device_buffer always async?

[FEA] Add `host_buffer` class #260