use CUDA zero-copy memory as a functional fallback for temporary buffer allocations

GoogleCodeExporter commented 8 years ago

Some of our algorithms allocate significant amounts of temporary memory.  For 
example, the sorting algorithms require a temporary buffer equal in size to the 
input.

When such allocations fail we currently throw bad_alloc and clean up any other 
temporary allocations that we've made.  In this case we could try to use 
zero-copy memory** on the host instead.  Performance would of course suffer 
greatly, but given the limited memory capacity of many GPU devices this 
fallback mechanism would be valuable.
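A minimal sketch of the proposed fallback, written as plain host-side C++ so it compiles without a GPU. The names TempBuffer, BufferKind, AllocFn and allocate_temporary are hypothetical and are not part of Thrust. The two allocator callbacks are stand-ins: in real CUDA code the first would presumably wrap cudaMalloc, and the second would wrap cudaHostAlloc with the cudaHostAllocMapped flag followed by cudaHostGetDevicePointer.

```cpp
#include <cstddef>
#include <new>

// Records where a temporary buffer ended up, so the caller can release it
// with the matching deallocation routine.
enum class BufferKind { Device, ZeroCopyHost };

struct TempBuffer {
    void*      ptr;
    BufferKind kind;
};

// Hypothetical allocator signature: returns nullptr on failure.
using AllocFn = void* (*)(std::size_t);

// Try device memory first; if that fails, try mapped (zero-copy) host memory.
// Only when both fail do we report the failure with bad_alloc.
TempBuffer allocate_temporary(std::size_t bytes,
                              AllocFn device_alloc,
                              AllocFn zero_copy_alloc) {
    if (void* p = device_alloc(bytes)) {
        return {p, BufferKind::Device};
    }
    if (void* p = zero_copy_alloc(bytes)) {
        return {p, BufferKind::ZeroCopyHost};
    }
    throw std::bad_alloc();
}
```

Returning the buffer kind alongside the pointer is one possible design choice; it keeps the deallocation path unambiguous and lets callers log when the slow path was taken.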

Note that we could do somewhat better in the particular case of radix sort 
since it's safe to copy the keys to host memory, sort there, and then copy the 
sorted keys back (with a permutation vector for key-value sorts with non-POD 
values).  However, in the most general case it is necessary to perform the 
computation on the device with a device-addressable temporary buffer (since the 
operators may involve device pointers).
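The host-side part of that radix sort special case could look roughly like the sketch below. It assumes the keys have already been copied to host memory, and it uses std::stable_sort in place of a radix sort for brevity. The helper names sorted_permutation and gather are hypothetical. In the real path the permutation would be copied back to the device and the values gathered there, so that non-POD values never leave device memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns a permutation perm such that keys[perm[0]], keys[perm[1]], ...
// is in non-decreasing order. Equal keys keep their original relative order.
template <typename Key>
std::vector<std::size_t> sorted_permutation(const std::vector<Key>& keys) {
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) {
                         return keys[a] < keys[b];
                     });
    return perm;
}

// Applies the permutation: out[i] = in[perm[i]].
template <typename T>
std::vector<T> gather(const std::vector<T>& in,
                      const std::vector<std::size_t>& perm) {
    std::vector<T> out;
    out.reserve(in.size());
    for (std::size_t i : perm) {
        out.push_back(in[i]);
    }
    return out;
}
```

Applying the same permutation to both keys and values keeps each value paired with its key.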

** on devices that support it

Original issue reported on code.google.com by wnbell on 13 Oct 2010 at 12:33

GoogleCodeExporter commented 8 years ago

This fallback mechanism should be configurable since some users will prefer to 
be notified of failed device allocations.  For instance, since the performance 
of the fallback path is likely to be dramatically lower, some users will want to 
avoid non-deterministic performance cliffs.

Still, using host memory as virtual device memory is a practical way to 
ensure that algorithms "just work".
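One way the configurability could be expressed is a policy value passed to the allocation routine. This is only a sketch: AllocFailurePolicy, AllocFn and allocate_with_policy are hypothetical names, and the allocator callbacks stand in for the real device and mapped-host allocation calls.

```cpp
#include <cstddef>
#include <new>

// What to do when the device allocation fails.
enum class AllocFailurePolicy { Throw, FallBackToHost };

// Hypothetical allocator signature: returns nullptr on failure.
using AllocFn = void* (*)(std::size_t);

void* allocate_with_policy(std::size_t bytes,
                           AllocFailurePolicy policy,
                           AllocFn device_alloc,
                           AllocFn host_alloc) {
    if (void* p = device_alloc(bytes)) {
        return p;
    }
    // Users who prefer to be notified choose Throw and never reach the
    // slow host path.
    if (policy == AllocFailurePolicy::FallBackToHost) {
        if (void* p = host_alloc(bytes)) {
            return p;
        }
    }
    throw std::bad_alloc();
}
```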

Original comment by wnbell on 13 Oct 2010 at 12:39

GoogleCodeExporter commented 8 years ago

This is probably better handled by a custom allocator, as in the Thrust 1.6 
example.

Original comment by jaredhoberock on 7 May 2012 at 9:34

Changed state: WontFix

lion03 / thrust

use CUDA zero-copy memory as a functional fallback for temporary buffer allocations #250