Execution policy support for all containers

As the containers should mimic their C++ counterparts as closely as possible in terms of functionality, both per-element and iterator-based member functions are provided. The former allow for easy usage in the native context, e.g. in CUDA kernels for the CUDA backend, while the latter iterator-based versions can be considered to follow algorithm semantics. However, the iterator-based versions lack support for an execution_policy, which prevents greater flexibility such as using asynchronous CUDA streams. The affected functionality is listed below:

[x] ~All containers:~
- ~createDeviceObject and destroyDeviceObject~
[x] ~bitset:~
- ~set, reset, flip, count, all, any, none~
[x] ~deque:~
- ~clear, device_range, valid~
[x] ~memory:~
- ~createDeviceArray, destroyDeviceArray, and for symmetry reasons also the respective host versions~
[x] ~mutex:~
- ~valid~
[ ] queue:
- valid
[ ] stack:
- valid
[x] ~unordered_map, unordered_set:~
- ~device_range, insert, erase, clear, valid~
[x] ~vector:~
- ~insert, erase, clear, valid~

Option 1: Add a respective execution_policy parameter to all of these functions. This could either follow algorithm and make the policy the first parameter, in which case each function must be duplicated to keep the policy-free overload. Alternatively, the policy could be passed as the last parameter with a default value, at the cost of an interface that is inconsistent with algorithm.
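
To make the two variants of Option 1 concrete, here is a minimal host-only sketch. Everything in it is hypothetical and chosen for illustration: `default_policy`, `stream_policy`, and `toy_vector` are not stdgpu types, and the member functions only record which policy they received instead of dispatching any work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical policy types for illustration only; not stdgpu's actual policies.
struct default_policy
{
    std::string name() const { return "default"; }
};

struct stream_policy
{
    int stream_id;
    std::string name() const { return "stream " + std::to_string(stream_id); }
};

// Toy host-side container standing in for a real container (assumption).
class toy_vector
{
public:
    // Variant A: policy as the first parameter, as in the algorithm interface.
    template <typename ExecutionPolicy>
    void insert(const ExecutionPolicy& policy, int value)
    {
        _last_policy = policy.name();
        _data.push_back(value);
    }

    // Variant A requires a second, policy-free overload that forwards a default.
    void insert(int value) { insert(default_policy{}, value); }

    // Variant B: policy as the last parameter with a default value,
    // so a single function covers both call styles.
    template <typename ExecutionPolicy = default_policy>
    void clear(const ExecutionPolicy& policy = ExecutionPolicy{})
    {
        _last_policy = policy.name();
        _data.clear();
    }

    std::size_t size() const { return _data.size(); }
    const std::string& last_policy() const { return _last_policy; }

private:
    std::vector<int> _data;
    std::string _last_policy;
};

// Usage sketch: both call styles for each variant.
inline void option1_usage()
{
    toy_vector v;
    v.insert(1);                   // Variant A without a policy
    v.insert(stream_policy{2}, 7); // Variant A with the policy first
    assert(v.size() == 2);
    v.clear(stream_policy{4});     // Variant B with the policy last
    assert(v.size() == 0);
}
```

In this sketch, Variant A doubles the number of declarations but reads like an algorithm call, while Variant B keeps one declaration per function and places the policy after the regular arguments.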

Option 2: Add a scoped_execution_policy class which acts as a customizable default policy for all calls within its scope. While this minimizes the required changes for the containers, proper global management may be hard to implement as the class types of the policies could theoretically be arbitrary.
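
A rough sketch of how Option 2 might look is given below. It deliberately sidesteps the difficulty mentioned above by assuming a single concrete policy type: `policy_handle`, `current_policy`, and `toy_stack` are hypothetical names, and only `scoped_execution_policy` is taken from the proposal. Supporting arbitrary policy class types would need some form of type erasure that this sketch does not attempt.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified policy: one concrete type carrying a stream id.
// Real policies may be arbitrary class types; this sketch fixes one type.
struct policy_handle
{
    int stream_id = 0; // 0 stands for the default stream in this sketch
};

// Per-thread current default policy (assumption: thread-local storage is acceptable).
inline policy_handle& current_policy()
{
    thread_local policy_handle policy{};
    return policy;
}

// RAII guard: installs a policy on construction and restores the previous one on destruction.
class scoped_execution_policy
{
public:
    explicit scoped_execution_policy(policy_handle policy)
      : _previous(current_policy())
    {
        current_policy() = policy;
    }

    ~scoped_execution_policy() { current_policy() = _previous; }

    scoped_execution_policy(const scoped_execution_policy&) = delete;
    scoped_execution_policy& operator=(const scoped_execution_policy&) = delete;

private:
    policy_handle _previous;
};

// Toy container whose member function signature stays unchanged.
class toy_stack
{
public:
    void clear()
    {
        // Pick up whatever policy is active in the enclosing scope.
        _last_stream = current_policy().stream_id;
        _data.clear();
    }

    int last_stream() const { return _last_stream; }

private:
    std::vector<int> _data;
    int _last_stream = -1;
};

// Usage sketch: the guard changes the policy only within its scope.
inline void option2_usage()
{
    toy_stack s;
    {
        scoped_execution_policy guard(policy_handle{3});
        s.clear();
        assert(s.last_stream() == 3);
    }
    s.clear();
    assert(s.last_stream() == 0);
}
```

The container interface is untouched here, which matches the stated advantage of this option; the cost is hidden state that every member function has to consult.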
