rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Reduce page faults when using managed memory #13821

Open GregoryKimball opened 11 months ago

GregoryKimball commented 11 months ago

Is your feature request related to a problem? Please describe.

In cuDF-python and RMM, it's easy to opt into managed memory (also known as Unified Memory, UM, and Unified Virtual Memory, UVM). However, libcudf is not optimized for use with managed memory and encounters many "just too late" page faults when the "oversubscription factor" is >1.

Hinting options and strategies

Implementation ideas for libcudf

Useful reference for cudaMemAdvise


Please note: Using managed memory in libcudf is in early stages of scoping. This issue will improve over time.



Describe the solution you'd like

I would like to add a libcudf benchmark for studying managed memory performance, and then some targeted experiments (with profiling) to observe the impact of different hinting strategies. When we have identified a promising design, we will open a more targeted issue.

Describe alternatives you've considered

Continue to let Dask and Spark-RAPIDS catch and retry when there are device OOM errors.

Additional context

Please note that with managed memory pools, the pool allocation is lazy. This is different from unmanaged memory pools where we allocate the full pool upfront, trading slightly longer startup time for much faster algorithm allocations.

Useful blog posts:

https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
https://developer.nvidia.com/blog/improving-gpu-memory-oversubscription-performance/
https://developer.nvidia.com/blog/maximizing-unified-memory-performance-in-cuda/
https://developer.nvidia.com/blog/beyond-gpu-memory-limits-unified-memory-pascal/

revans2 commented 11 months ago

The other thing to think about is what happens if you are not using UVM. Putting hints in is nice, but are they going to slow down processing when UVM is not being used? If they do, what is the best way to mitigate this?

bdice commented 11 months ago

@revans2 I do not expect that this would affect non-managed allocations. My expectation is that cudf will need to determine (or track) if an allocation is managed before attempting prefetching or giving other advice to the driver. That should be very inexpensive to check/track. Non-UVM cases shouldn’t see regressions as a result.
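To make the "track whether an allocation is managed" idea concrete, here is a minimal host-only sketch. All names (`managed_range_tracker`, `hint_if_managed`) are hypothetical and are not part of libcudf or RMM. The hint itself is passed in as a callback; in real code it might wrap `cudaMemPrefetchAsync`, which is an assumption about how a design could look rather than a proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical registry (not a libcudf/RMM type): records the address ranges
// that were handed out by a managed allocator.
class managed_range_tracker {
 public:
  void record(const void* ptr, std::size_t bytes) {
    assert(ptr != nullptr);
    ranges_[reinterpret_cast<std::uintptr_t>(ptr)] = bytes;
  }

  void erase(const void* ptr) {
    ranges_.erase(reinterpret_cast<std::uintptr_t>(ptr));
  }

  // True if ptr lies inside any recorded managed range.
  bool is_managed(const void* ptr) const {
    auto const addr = reinterpret_cast<std::uintptr_t>(ptr);
    auto it = ranges_.upper_bound(addr);  // first range starting after addr
    if (it == ranges_.begin()) { return false; }
    --it;  // last range starting at or before addr
    return addr - it->first < it->second;
  }

 private:
  std::map<std::uintptr_t, std::size_t> ranges_;  // start address -> size
};

// Issues the hint only for managed pointers; non-managed pointers cost one
// map lookup and nothing else. Returns true if the hint was issued.
template <typename HintFn>
bool hint_if_managed(managed_range_tracker const& tracker,
                     const void* ptr,
                     std::size_t bytes,
                     HintFn&& hint) {
  if (!tracker.is_managed(ptr)) { return false; }
  hint(ptr, bytes);  // assumption: would wrap a prefetch or advise call
  return true;
}
```

The lookup is logarithmic in the number of live managed allocations, so it should stay cheap relative to a kernel launch, though a real implementation would also need to be thread-safe.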

wence- commented 11 months ago

It seems like the right object to offer the ability to hint allocations is the memory resource, in which case non-managed memory resources could provide no-op implementations.
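A rough host-only sketch of what that could look like follows. The interface and class names (`hintable_resource`, `plain_resource`, `managed_resource`, `prefetch_to_device`) are hypothetical and do not claim to match RMM's actual `device_memory_resource` API. `std::malloc` stands in for `cudaMalloc` and `cudaMallocManaged` so the example compiles without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical interface: a memory resource that also accepts hints.
class hintable_resource {
 public:
  virtual ~hintable_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr)        = 0;

  // Default is a no-op, so non-managed resources pay only a virtual call.
  virtual void prefetch_to_device(const void* /*ptr*/, std::size_t /*bytes*/) {}
};

// Non-managed resource: inherits the no-op hint.
class plain_resource final : public hintable_resource {
 public:
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }  // stand-in for cudaMalloc
  void deallocate(void* ptr) override { std::free(ptr); }
};

// Managed resource: overrides the hint. A real version might call
// cudaMemPrefetchAsync here; this sketch only counts the requests.
class managed_resource final : public hintable_resource {
 public:
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }  // stand-in for cudaMallocManaged
  void deallocate(void* ptr) override { std::free(ptr); }
  void prefetch_to_device(const void*, std::size_t) override { ++prefetches_; }
  std::size_t prefetches_issued() const { return prefetches_; }

 private:
  std::size_t prefetches_{0};
};

// Algorithm code hints unconditionally and never asks which kind of
// resource it was given.
void run_step(hintable_resource& mr, std::size_t bytes) {
  void* buf = mr.allocate(bytes);
  assert(buf != nullptr);
  mr.prefetch_to_device(buf, bytes);
  // ... work on buf would be launched here ...
  mr.deallocate(buf);
}
```

With this shape, the decision about whether a hint does anything lives in the resource, so call sites in the library would not need a separate "is this managed?" check.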