habanero-rice / hclib

A C/C++ task-based programming model for shared memory and distributed parallel computing.
http://habanero-rice.github.io/hclib/
BSD 3-Clause "New" or "Revised" License
71 stars 35 forks source link

Add vector of dependencies to tasks. #64

Closed srirajpaul closed 6 years ago

srirajpaul commented 6 years ago

Supports a maximum of 4 futures as dependency. To change maximum limit adjust the MAX_NUM_WAITS macro inside hclib-promise.h and rebuild HClib.

srirajpaul commented 6 years ago

Max, I am not quiet sure how the vector based functions can be implemented by calling the non-vector-based cousins in this same file. Say for example, if we compile with MAX_NUM_WAITS=7 since we had 7 dependencies for our 3D stencil, there are no non-vector based function that accepts 7 futures. Or we need to create Variadic templates, and iterate over them and create an array and so on to support it. Current implementation seems to be much more readable/concise than that. Also getting array pointer directly from vector is efficient/constant time whereas creating array for vardic function is linear and not so efficient.

A) std::vector<hclib_future_t > &futures is lvalue reference and B) std::vector<hclib_future_t > &&futures is rvalue reference. An example would be, vector<hclib_future_t > fut_vec{f1, f2, f3}; async_await(...,fut_vec); will invoke A whereas, async_await(..., vector<hclib_future_t >{f1, f2, f3}); would invoke B

Prior to c++11, there was no rvalue reference and therefore such invocations used call by value and therefore was expensive due to the copy.

There is a way to combine them which is the universal reference which can take both lvalue reference and rvalue reference (similar to T&& lambda). But we need to convert 'vector<hclib_future_t > &futures' to something like 'V &&futures', where V is a template parameter and may be use std::enable_if to tell that V is of type std::vector<hclib_future_t >. Do you think converting to universal reference is a better option?

agrippa commented 6 years ago

Sriraj, so why not just add some non-vector-based routines that both the vector- and non-vector-based routines call (thereby removing the copy-paste effect)?

For example, you've duplicated async_future_await a couple of times. Why not just add:

async_future_await(T&& lambda, hclib_future_t** futures)

and have both the existing and new routines call this directly, basically turning both new and existing into one-liners sharing a common implementation? It seems like you should be able to do this for all of the added code.

Let's first make that change, then see what that collapses the rvalue/lvalue versions down to and see if the code sprawl is sufficiently mitigated to leave them both in place.

srirajpaul commented 6 years ago

Added the helper function that takes an array of dependencies and the new routines go through it. Old routines can also use this helper function, but prefer not to make changes to existing routines in this pull request.

agrippa commented 6 years ago

These changes look good! Just one additional question I added.

budimlic commented 6 years ago

I'm OK with this pull request as a quick fix to allow tasks to wait on more than 4 futures .

However, it seems like we should now (with the APIs added by Sriraj allowing a vector of futures as an argument of async_await) be able to get rid of the MAX_NUM_WAITS completely. It's only used in hclib_task_t struct to allocate a NULL-terminated array of futures on which the task is waiting on. If this is done for performance reasons (I suspect it is), we could keep doing it for a fixed number of futures, then store any additional future pointers in a dynamically allocated array if needed. This would require storing the number of futures on which the task awaits into the hclib_task_t struct, which I would prefer to the NULL-terminated array kludge we have now anyway.

Requiring a user to recompile hclib to allow for more than 4 futures seems very counter-intuitive now that we have a vector-of-futures API.

agrippa commented 6 years ago

@budimlic This change doesn't actually change anything in regards to how many futures we can wait on (that's a compile-time constant before this change, and after). It just adds API wrappers that use std::vector.

Using a dynamically allocated array extension is certainly something someone could submit as the next pull request.

agrippa commented 6 years ago

@budimlic If you're ready, we've both signed off now and this can be merged in.

budimlic commented 6 years ago

@agrippa Unless I'm missing something obvious, there was no way until now to create an async_await that depends on 5 futures, even if I compiled with MAX_NUM_WAITS=5

agrippa commented 6 years ago

@budimlic You would just use the C APIs.

budimlic commented 6 years ago

Good point. So this change brings up the expressiveness of the C++11 API for async_await up to the level of expressiveness of the C API, at the cost of exposing the MAX_NUM_WAITS to the C++11 programmer. Since MAX_NUM_WAITS is already exposed to the C programmer, there is no downside to this change.

Getting rid of MAX_NUM_WAITS would be a topic of separate pull request.