We used MicroLocks in a few places (not that many to be honest). The way a MicroLock works is that the two lower bits are used to encode the state of the lock. The lowest one 0x1 is used to say if someone holds the lock or not. The second bit 0x2 is used to say if someone is waiting on that lock, if that is the case, they should be awaken the moment the lock is released.
I would have replaced those locks with std::mutex, but the problem is that there are quite a few places in the runtime (cf, Invocation, Memoize etc ...) making assumptions about the layout of the lock, and expects sizes to be the same etc ...
So what I did instead is that I replaced the MicroLock with a SpinLock. The idea is relatively simple, you only use the lowest bit to say if the lock is held, and then spin while it is not available (with a little help to the scheduler, notifying that we are spinning).
I have not observed any difference in performance on the code of the compiler itself, except for one, the Invocation lock, that causes a small regression (which is not very surprising, an invocation can take time).
Ideally, we should revisit that at some point and replace the SpinLock with a real lock in the places where it matters. But I am going to leave this like that for now and revisit it later. The small regression (~< 5%) is acceptable ...
We used MicroLocks in a few places (not that many to be honest). The way a MicroLock works is that the two lower bits are used to encode the state of the lock. The lowest one 0x1 is used to say if someone holds the lock or not. The second bit 0x2 is used to say if someone is waiting on that lock, if that is the case, they should be awaken the moment the lock is released.
I would have replaced those locks with std::mutex, but the problem is that there are quite a few places in the runtime (cf, Invocation, Memoize etc ...) making assumptions about the layout of the lock, and expects sizes to be the same etc ...
So what I did instead is that I replaced the MicroLock with a SpinLock. The idea is relatively simple, you only use the lowest bit to say if the lock is held, and then spin while it is not available (with a little help to the scheduler, notifying that we are spinning).
I have not observed any difference in performance on the code of the compiler itself, except for one, the Invocation lock, that causes a small regression (which is not very surprising, an invocation can take time).
Ideally, we should revisit that at some point and replace the SpinLock with a real lock in the places where it matters. But I am going to leave this like that for now and revisit it later. The small regression (~< 5%) is acceptable ...