Open bmyerz opened 12 years ago
Maintaining multiple queues has a lot of additional work and complexity. In particular, finding out what to schedule next gets all messed up. Unless we were doing multiple queues in order to coordinate scheduling differently, it doesn't seem worth it.
I'd say stick with encoding the size of arg elements into the queue. Stealing is relatively infrequent. We may be able to encode the arg size in the fn pointer, possibly by registering the task at compile time as you suggest, but maybe also by exploiting extra bits and doing some standard conversion to turn them back into real fn pointers.
Revising this due to Alexandro's question. I agree with Holt-from-the-past---stealing is infrequent, so adding overhead there should be acceptable.
I think it would be straightforward to modify our current steal code to handle variable-size tasks by traversing the task queue locally during a steal request in order to find the size of the chunk to memcpy. However, supporting variable-size tasks would require changing the local task interface as well
Another option would be to keep 32-byte tasks but add an indirection flag, so that for >32 byte tasks the task struct on the queue would contain a pointer to the actual task struct, allocated locally. Then stealing would have to be two phases---grab the queue region, and then grab any of the overflow structs. The only challenge would be making allocation/deallocation work efficiently.
Some future stealing system that synchronized with memory operations rather than active messages might want to do it transactionally, grabbing a memory region ignoring task sizes, and then detecting whether we chopped off a task at the end and fixing that up.
Currently, tasks are forced to be a fixed size of 64 bytes: 64-bit function pointer, 3 64-bit arguments. This restriction is to make the task queue simple. We could support tasks of different sizes with perhaps little additional complexity. The gain would be flexibility to have larger tasks, space and copying efficiency for small tasks.
In the designs so far, work stealing in chunks efficiently tends to be the main challenge. We would like to do something that looks like a memcpy; on the other hand it's still linear and traversing the queue might not be that much more expensive.
designs:
All the function pointers passed to spawn() functions are known at compile time, and certainly at the start of the program. These can be registered with the task queue to create appropriately sized queues.