Closed wlruys closed 10 months ago
Ran experiments to measure how much of the total execution time taskspace slicing takes. Slicing a taskspace calls __getitem__ internally: https://github.com/ut-parla/parla-experimental/blob/0361ff0af9a726a2cf8eead125acf3a7bd09c27f/src/python/parla/cython/tasks.pyx#L1629
The numbers are as documented here - https://docs.google.com/document/d/1HDR1CLUGJeTYuCvqlPKpAOLe07Y-egh9v9fYH93MHH4/edit
The results show that __getitem__ does not take a significant amount of time, so no improvements are required.
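For anyone who wants to repeat this kind of measurement, here is a minimal sketch of timing a slice access in isolation. It assumes a toy stand-in class (ToyTaskSpace is hypothetical and is not the Parla taskspace), so the absolute numbers say nothing about Parla itself; only the measurement pattern carries over.

```python
import timeit


class ToyTaskSpace:
    """Hypothetical stand-in for a taskspace; NOT the Parla implementation."""

    def __init__(self, name):
        self.name = name
        self._tasks = {}

    def __getitem__(self, index):
        if isinstance(index, slice):
            # This toy version only supports slices with explicit bounds.
            step = index.step or 1
            return [self[i] for i in range(index.start, index.stop, step)]
        # Create the (toy) task handle on first access, then reuse it.
        return self._tasks.setdefault(index, f"{self.name}_{index}")


T = ToyTaskSpace("T")
calls = 10_000

# Time only the slicing call, repeated many times to average out noise.
total = timeit.timeit(lambda: T[3:10], number=calls)
per_call_us = total / calls * 1e6
print(f"T[3:10]: {per_call_us:.2f} us per slice")
```

Comparing the per-slice cost against the total time to launch the same number of tasks gives the fraction attributable to slicing.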
Accessing a taskspace (for example, T[3:10]) creates a list of Python handles to tasks. If used as a dependency or wait list, this list of Python handles is unwrapped to C++ Tasks and serialized in the backend before being passed to the runtime.
It may be possible to pass this list directly from a C++ backend of the taskspace to the runtime. This would decrease launch overhead and latency.
Complications include making Python task creation lazy.
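A rough sketch of what lazy handle creation could look like, written in pure Python for illustration. All names here (ToyTaskSpace, LazyTaskSlice, backend_ids, materialized) are hypothetical and do not exist in Parla; the real change would live in the Cython/C++ backend. The idea is that slicing returns a lightweight view holding only task indices, which a backend could consume directly, and Python handles are built only if user code iterates over the view.

```python
class LazyTaskSlice:
    """Hypothetical view over a taskspace slice; holds indices, not handles."""

    def __init__(self, space, ids):
        self.space = space
        self.ids = ids  # a range object, so no per-task allocation yet

    def backend_ids(self):
        # What a backend could receive directly, skipping Python handles.
        return list(self.ids)

    def __len__(self):
        return len(self.ids)

    def __iter__(self):
        # Python handles are created only when someone actually iterates.
        for i in self.ids:
            yield self.space.handle(i)


class ToyTaskSpace:
    """Hypothetical stand-in for a taskspace; NOT the Parla implementation."""

    def __init__(self, name):
        self.name = name
        self._handles = {}

    def handle(self, i):
        # Create the Python-side handle on first request, then cache it.
        if i not in self._handles:
            self._handles[i] = (self.name, i)
        return self._handles[i]

    def __getitem__(self, index):
        if isinstance(index, slice):
            if index.stop is None:
                raise ValueError("this sketch needs an explicit stop")
            ids = range(index.start or 0, index.stop, index.step or 1)
            return LazyTaskSlice(self, ids)
        return self.handle(index)

    @property
    def materialized(self):
        # Number of Python handles created so far.
        return len(self._handles)


T = ToyTaskSpace("T")
deps = T[3:10]
ids_for_backend = deps.backend_ids()  # no Python handles created here
created_before = T.materialized
handles = list(deps)                  # iteration forces handle creation
created_after = T.materialized
```

In this sketch the dependency path touches only integer indices, while code that needs real handles still gets them on demand. Whether this maps cleanly onto the existing task lifecycle is exactly the complication noted above.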