ut-parla / parla-experimental


Suggestion: Move Taskspace slicing from Python to C++ #134

Closed wlruys closed 10 months ago

wlruys commented 1 year ago

Accessing a taskspace (for example T[3:10]) creates a list of Python handles to tasks. If used as a dependency or wait list, this list of Python handles is unwrapped to C++ Tasks and serialized in the backend before being passed to the runtime.

It may be possible to pass this list directly from a C++ backend of the taskspace to the runtime itself, without going through Python handles. This would reduce launch overhead and latency.

Complications include making Python task creation lazy.
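To make the "lazy" complication concrete, here is a minimal Python sketch of the difference between eager and lazy slicing. Every name in it (`TaskHandle`, `EagerTaskSpace`, `LazyTaskSpace`, `TaskSlice`, `materialize`) is hypothetical and is not Parla's API. It assumes one-dimensional slices with an explicit stop, and it models only the Python side, not the C++ backend.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskHandle:
    """Hypothetical stand-in for a Python-side task handle."""
    index: int


def _bounds(key: slice) -> tuple[int, int, int]:
    # Assumption: the slice has an explicit stop (e.g. T[3:10]).
    if key.stop is None:
        raise ValueError("this sketch only supports slices with an explicit stop")
    return key.start or 0, key.stop, key.step or 1


@dataclass
class EagerTaskSpace:
    """Creates one Python handle per index as soon as a slice is taken."""
    created: int = 0

    def __getitem__(self, key: slice) -> list[TaskHandle]:
        handles = [TaskHandle(i) for i in range(*_bounds(key))]
        self.created += len(handles)
        return handles


@dataclass(frozen=True)
class TaskSlice:
    """Lightweight description of a slice: index bounds only, no handles."""
    start: int
    stop: int
    step: int = 1

    def indices(self) -> range:
        return range(self.start, self.stop, self.step)


@dataclass
class LazyTaskSpace:
    """Returns a TaskSlice and builds handles only when explicitly asked."""
    created: int = 0

    def __getitem__(self, key: slice) -> TaskSlice:
        return TaskSlice(*_bounds(key))

    def materialize(self, view: TaskSlice) -> list[TaskHandle]:
        handles = [TaskHandle(i) for i in view.indices()]
        self.created += len(handles)
        return handles


if __name__ == "__main__":
    eager = EagerTaskSpace()
    eager[3:10]
    print("eager handles created:", eager.created)

    lazy = LazyTaskSpace()
    view = lazy[3:10]
    print("lazy handles created:", lazy.created, "for indices", list(view.indices()))
```

In a design along these lines, a dependency list could be handed to the runtime as index bounds, and Python handles would only be built when user code actually inspects a task.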

ShreyaTalati commented 10 months ago

I ran experiments to measure how much of the total execution time is spent on taskspace slicing. Slicing calls the function __getitem__ internally: https://github.com/ut-parla/parla-experimental/blob/0361ff0af9a726a2cf8eead125acf3a7bd09c27f/src/python/parla/cython/tasks.pyx#L1629

The numbers are documented here: https://docs.google.com/document/d/1HDR1CLUGJeTYuCvqlPKpAOLe07Y-egh9v9fYH93MHH4/edit

The results show that __getitem__ does not take a significant amount of time, so no improvements are required.
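For anyone who wants to repeat this kind of measurement, the following is a rough sketch of one way to estimate the share of run time spent in `__getitem__`. It is not the procedure used for the numbers above. `TimedTaskSpace` and `getitem_fraction` are hypothetical names, and the summation loop is only a placeholder for real task launch work.

```python
import time


class TimedTaskSpace:
    """Hypothetical taskspace that accumulates time spent in __getitem__."""

    def __init__(self):
        self.getitem_seconds = 0.0
        self.calls = 0

    def __getitem__(self, key):
        start = time.perf_counter()
        try:
            # Placeholder for handle creation; assumes an explicit stop.
            return [("task", i) for i in range(key.start or 0, key.stop, key.step or 1)]
        finally:
            self.getitem_seconds += time.perf_counter() - start
            self.calls += 1


def getitem_fraction(space, n_launches):
    """Return the fraction of wall-clock time spent inside __getitem__."""
    start = time.perf_counter()
    for _ in range(n_launches):
        deps = space[0:8]                 # slicing cost being measured
        sum(j * j for j in range(1000))   # placeholder for launching a task
    total = time.perf_counter() - start
    return space.getitem_seconds / total


if __name__ == "__main__":
    space = TimedTaskSpace()
    fraction = getitem_fraction(space, n_launches=1000)
    print(f"__getitem__ share of total time: {fraction:.2%}")
```

The reported fraction depends entirely on how expensive the placeholder work is relative to slicing, so it only becomes meaningful once the placeholder is replaced by the real workload.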