We can make the Flowvar result channel intrusive to the task.
Assuming we are in a full message-passing environment, this is equivalent to sending the task back to the waiter once it is finished, hence we stay compatible with one of Weave's design goals.
With this, we might be able to completely remove the memory folder and the two-level caching structure (memory pool + lookaside list). This might accelerate machine-learning algorithms like GEMM / matrix multiplication, as those are very cache-sensitive and our memory pool does trigger many page faults.
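To make the idea concrete, here is a minimal Nim sketch of an intrusive result channel, with a fixed-size result slot and a spin-wait standing in for a real scheduler. The field names and layout are illustrative assumptions, not Weave's actual `Flowvar`:

```nim
import std/atomics

type
  Task = object
    # Sketch only, not Weave's actual task layout: the result slot and
    # completion flag live inside the task itself, so no separate channel
    # needs to be allocated for the future.
    fn: proc (t: ptr Task) {.nimcall.}  # the work to execute
    completed: Atomic[bool]             # published by the worker on completion
    res: array[8, byte]                 # intrusive result slot (here: one int64)

  Flowvar[T] = object
    task: ptr Task                      # the future is just a view over the task

proc run(t: ptr Task) =
  t.fn(t)
  t.completed.store(true, moRelease)    # "send the task back" to the waiter

proc sync[T](fv: Flowvar[T]): T =
  # A real scheduler would steal/run other tasks here instead of spinning.
  while not fv.task.completed.load(moAcquire):
    cpuRelax()
  copyMem(addr result, addr fv.task.res[0], sizeof(T))
  # At this point the waiter owns the task buffer and can recycle it.

when isMainModule:
  proc work(t: ptr Task) {.nimcall.} =
    var x = 42'i64
    copyMem(addr t.res[0], addr x, sizeof(x))

  var t = Task(fn: work)
  let fv = Flowvar[int64](task: addr t)
  run(addr t)        # in a real pool this would run on another worker
  echo fv.sync()     # prints 42; no separately allocated channel
```

The key property is that completing the task and delivering the result are a single release-store: the task object itself is the message sent back to the waiter.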
This is an interesting idea. Any preliminary measurements showing the difference in behavior across all/many tasks from the Weave benchmark set (not just cryptography computations)?
For a fresh implementation of a threadpool for high-speed cryptography (https://github.com/mratsim/constantine/tree/1dfbb8b/constantine/platforms/threadpool), I found a new design for low-overhead memory management along the lines described above: the Flowvar result channel is intrusive to the task.
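For context on what such a design could remove, below is a minimal sketch of a per-thread lookaside list of the kind that typically fronts a shared memory pool. The buffer size, capacity, and names are hypothetical, not Weave's actual memory code:

```nim
type
  TaskBuffer = object
    next: ptr TaskBuffer        # intrusive freelist link
    payload: array[192, byte]   # room for a task and its result slot (assumed size)

  LookasideList = object
    head: ptr TaskBuffer        # thread-local stack of recycled buffers
    len, capacity: int

proc pop(list: var LookasideList): ptr TaskBuffer =
  ## Serve hot allocations from the cache; fall back to the allocator.
  if list.head != nil:
    result = list.head
    list.head = result.next
    dec list.len
  else:
    result = cast[ptr TaskBuffer](alloc0(sizeof(TaskBuffer)))

proc push(list: var LookasideList, buf: ptr TaskBuffer) =
  ## Recycle a finished task's buffer; overflow is returned to the allocator.
  if list.len < list.capacity:
    buf.next = list.head
    list.head = buf
    inc list.len
  else:
    dealloc(buf)

when isMainModule:
  var cache = LookasideList(capacity: 16)
  let a = cache.pop()   # cache empty: falls back to alloc0
  cache.push(a)         # recycled into the thread-local cache
  let b = cache.pop()   # served from the cache, no fresh page touched
  assert a == b
  cache.push(b)
```

With the intrusive design, the task returned to the waiter can be recycled directly, so this per-thread caching layer (and the pool behind it) may become unnecessary.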