wence- opened this issue 1 month ago
Side thought: Maybe we should experiment with replacing the cuda_memory_resource used for initial_resource with a cuda_async_memory_resource
...
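For concreteness, a minimal sketch of what opting into the async resource looks like from user code today (assuming the per-device resource API; the proposal would effectively make something like this the default behind initial_resource):

```c++
#include <rmm/mr/device/cuda_async_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

int main() {
  // Owning, cudaMallocAsync-backed resource. Today this is opt-in;
  // the suggestion is to make something like it the default.
  rmm::mr::cuda_async_memory_resource async_mr;
  rmm::mr::set_current_device_resource(&async_mr);

  // Subsequent allocations via the current device resource now go
  // through cudaMallocAsync/cudaFreeAsync on async_mr's pool.
  return 0;
}
```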
Maybe, though we'd have the usual static destruction problems, so we'd never explicitly free that memory pool.
It might also be problematic in the multiple-library case: if one library is not configured with a specific pool and makes an allocation from the initial_resource, that now builds a pool, and we're then (potentially) conflicting with other libraries that set up their own pools.
I was thinking by default the async resource uses the default pool, which we would not own. Maybe I'm misremembering how it's implemented.
The async resource will use the pool managed by the CUDA driver, which we do not own and would probably be fine. Ideally everyone would use that and then all pooling would be handled by the driver. If we use the async mr by default and a different library does not but constructs their own pool manually using a different underlying allocation routine (e.g. cudaMalloc instead of cudaMallocAsync), then we could conflict.
In cuda_async_memory_resource we call cudaMemPoolCreate and get a handle to a pool and use that to make our allocations. So that sounds like we own that pool.
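Roughly, that ownership looks like the following at the CUDA runtime level (a simplified sketch, not RMM's actual code): we create a pool, allocate from it, and are then responsible for destroying it, which is exactly what gets awkward at static destruction time.

```c++
#include <cuda_runtime_api.h>

int main() {
  // Create a device-local memory pool that we own.
  cudaMemPoolProps props{};
  props.allocType     = cudaMemAllocationTypePinned;
  props.handleTypes   = cudaMemHandleTypeNone;
  props.location.type = cudaMemLocationTypeDevice;
  props.location.id   = 0;

  cudaMemPool_t pool{};
  cudaMemPoolCreate(&pool, &props);

  cudaStream_t stream{};
  cudaStreamCreate(&stream);

  // Allocate and free from our pool on a stream.
  void* p = nullptr;
  cudaMallocFromPoolAsync(&p, 1 << 20, pool, stream);
  cudaFreeAsync(p, stream);
  cudaStreamSynchronize(stream);

  cudaStreamDestroy(stream);
  cudaMemPoolDestroy(pool);  // owning the pool means we must destroy it
  return 0;
}
```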
My mistake, I didn't realize that we were allocating from a specific pool that we created. The failure mode should still be relatively graceful if two processes both use the async allocation routines and one pool blocks another's growth. I don't think it will be as graceful if you mix and match async with non-async allocation, but I could be wrong there.
I believe the reason that cuda_async_memory_resource owns its pool is because we provide a non-owning MR: cuda_async_view_memory_resource. We could use the latter as the default resource with the default pool. That MR requires passing a cudaMemPool_t pool handle, and the default pool handle can be retrieved using cudaDeviceGetDefaultMemPool.
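A minimal sketch of that variant, assuming we wrap the driver's default pool in the non-owning view resource:

```c++
#include <rmm/mr/device/cuda_async_view_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <cuda_runtime_api.h>

int main() {
  // Fetch the driver-managed default pool for device 0; RMM neither
  // creates nor destroys it, so there is no static destruction problem.
  cudaMemPool_t default_pool{};
  cudaDeviceGetDefaultMemPool(&default_pool, /*device=*/0);

  rmm::mr::cuda_async_view_memory_resource view_mr{default_pool};
  rmm::mr::set_current_device_resource(&view_mr);
  return 0;
}
```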
Perhaps, however, we should wait to make this default change until we can start using the cuda_async_memory_resource from libcu++, which has a different design.
RMM has multiple pool-like allocators:

- pool_memory_resource, which wraps a coalescing best-fit suballocator around an upstream resource;
- arena_memory_resource, which similarly wraps an upstream resource but divides the global allocation into size-binned arenas to mitigate fragmentation when allocating/deallocating;
- cuda_async_memory_resource, which uses the memory pool implementation provided by cudaMallocAsync. This one can avoid fragmentation because it is in control of the virtual address space.

Since these are all composable, one can happily wrap a pool_memory_resource around a cuda_async_memory_resource (or an arena, ...). But should one? (A sketch of such a composition follows below.) It would be useful if the documentation provided some guidance on which combinations make sense, and what typical allocation scenarios best fit a particular pool.
We should also recommend best practices for picking initial pool sizes: a bad choice here can lead to overfragmentation.
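For illustration only, here is what such a composition looks like: a pool_memory_resource suballocating out of a cuda_async_memory_resource, with an arbitrarily chosen 1 GiB initial size. Whether this layering ever beats using the async resource alone, and how to size it, is exactly the guidance being asked for:

```c++
#include <rmm/mr/device/cuda_async_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <cstddef>

int main() {
  // Upstream: cudaMallocAsync-backed resource with its own driver pool.
  rmm::mr::cuda_async_memory_resource async_mr;

  // Coalescing best-fit suballocator layered on top, pre-sized to 1 GiB.
  // A poor initial size here is one source of the fragmentation mentioned above.
  rmm::mr::pool_memory_resource<rmm::mr::cuda_async_memory_resource> pool_mr{
      &async_mr, std::size_t{1} << 30};

  rmm::mr::set_current_device_resource(&pool_mr);
  return 0;
}
```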