Is your feature request related to a problem? Please describe.
In experiments we are seeing value to prefetch-on-allocate for reducing page faults on initial write of an allocation. This is especially important in undersubscribed workloads where the entire pool could be prefetched when it is first allocated and when it grows.
Describe the solution you'd like
Add an adaptor that calls cudaMemPrefetchAsync on each allocation.
Describe alternatives you've considered
We have implemented this using the Python CallbackMemoryResource, but it may be beneficial to have it in C++.
Additional context
Add any other context, code examples, or references to existing implementations about the feature request here.
Is your feature request related to a problem? Please describe. In experiments we are seeing value to prefetch-on-allocate for reducing page faults on initial write of an allocation. This is especially important in undersubscribed workloads where the entire pool could be prefetched when it is first allocated and when it grows.
Describe the solution you'd like Add an adaptor that calls
cudaMemPrefetchAsync
on each allocation.Describe alternatives you've considered We have implemented this using the Python CallbackMemoryResource, but it may be beneficial to have it in C++.
Additional context Add any other context, code examples, or references to existing implementations about the feature request here.