Closed thisiscam closed 7 years ago
Every worker must always eventually call the yield macro in Lace.
Both options "should" work imo. But try to avoid these kind of locks, they hurt your parallel performance, especially if you hold them longer. We've had tremendous issues with them in our LTSmin model checker, since one of the language implementations required a spinlock, and this is precisely why we added this to the README.
Small note: I would probably have lace_lock_region as a TASK because you don't need to go through LACE_ME then; LACE_ME is not empty code, it has to search for the head of the queue (binary search).
I'm trying to call this function from C# code, which means that it cannot be a task(I can't think of a better way)
Also, just for confirmation: the reason that "it will hurt performance" is because of more frequent task-switching right? lace_steal should supposedly let the waiting threads do work, so there should not be busy waiting?
The performance is hurt because there are more memory transfers, especially on cachelines that are touched by other caches as well.
Basically for performance you have to look at all the potential costs from memory transfers. There very little computation for the processor to be done, the overhead by the framework to run tasks is mostly due to memory transfers. Basically as soon as we steal a task, the owner may be often inspecting the Task struct to see if the task has been completed, so whenever we change something in the Task struct that is not signalling that the task has been completed, we're wasting time both for the thief and for the owner in the memory transfers...
BTW a TASK is just a macro around some function declarations. You can simply turn your C# function into a "task" by adding two parameters as 1st and 2nd parameter: WorkerP* __lace_worker and Task* __lace_dq_head. This is technically undocumented, though, that's why I use a macro for this, so if I change it in lace.h/lace.sh I don't have to go and change every single task function by hand.
In the README it clearly states that I cannot use BDD operation in critical section because of GC. However, in my application I do need to enter some critical section. I'm trying to write a very simple spin lock specialized for lace to do the work. Consider the following code:
I think both of the above options should work as lace_steal should effectively pick up the GC's barrier of some other thread, who is holding "mutex"? However, it seems to that both of options the above causes deadlock.