juj opened 7 months ago
Good question. My thinking here was that, unlike an intrinsic in C++, the function call overhead in JS is actually quite large. In the unoptimized execution tier, calling Atomics.microwait() in an exponential backoff loop as you would in C++ would result in much longer-than-intended wait times. In optimized execution tiers, where we can inline known built-in calls like Atomics.microwait(), this cost can go away. Basically, I think encouraging the following loop means a big timing discrepancy between unoptimized and optimized tiers:
do {
  if (TryLock()) return true;
  // Spin for `backoff` iterations, issuing one microwait per iteration.
  for (let yields = 0; yields < backoff; yields++) {
    Atomics.microwait();
    tries++;
  }
  // Exponential backoff, capped at kMaxBackoff.
  backoff = Math.min(kMaxBackoff, backoff << 1);
} while (tries < kSpinCount);
This call overhead discrepancy can be worked around if the API's affordance is not to write an exponential backoff loop manually, with multiple calls to Atomics.microwait() inside the loop, but instead to pass an iteration number hint to the API. So, this loop instead:
do {
  if (TryLock()) return true;
  // Pass the iteration count as a hint; the engine scales the wait.
  Atomics.microwait(tries++);
} while (tries < kMicrowaitCount);
The proposed API reads Atomics.microwait(iterationNumber), and is intended to be used something like this:
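Presumably something along these lines, with the hint being just the running spin iteration count; tryLock and kSpinCount are placeholder names for the caller's own lock fast path and spin bound:

let tries = 0;
do {
  if (tryLock()) return true;    // placeholder fast path for the caller's lock
  Atomics.microwait(tries++);    // iteration number passed as the hint
} while (tries < kSpinCount);    // placeholder spin bound
return false;                    // e.g. fall back to Atomics.wait() here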
The question here is: why should the provided API incorporate this type of wait length integer mechanism into it?

The wait durations are left vague, so developers calling Atomics.microwait() will have a black box understanding of what actually happens. Also, the native instructions do not take any parameters: both Intel _mm_pause() and ARM YIELD are parameterless.
Would it be better to directly expose the identical shape of the wait instruction that exists on native platforms? Multithreaded synchronization performance is most often subject to tuning. If developers only have a black box understanding of the exact details of what happens, they will have a harder time tuning their code (needing more benchmarks to build a mental model of the behavior), and browsers will be locked into their current implementation details, since they cannot see the same project code that developers tuned against in order to verify that a change would not be detrimental.
It feels to me that KISS would apply here, so exposing a direct Atomics.microWait() without arguments, one that maps directly to a _mm_pause/yield, would be simpler and more straightforward? If "hand-hold developers to follow Intel's documentation" is the rationale, I think it would be better to carry that guidance through in documentation examples that highlight the best practices in the Wasm spec?
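For instance, a documentation example could spell out the Intel-recommended exponential backoff pattern against the parameterless shape; this is only a sketch, and acquireWithSpin, tryLock, and the tuning constants are illustrative names rather than part of any proposal:

// Sketch: spinlock acquire with exponential backoff, where each
// parameterless Atomics.microWait() call maps to one pause/yield.
const kSpinCount = 256;   // illustrative spin budget
const kMaxBackoff = 16;   // illustrative backoff cap

function acquireWithSpin(tryLock) {
  let tries = 0;
  let backoff = 1;
  do {
    if (tryLock()) return true;
    for (let yields = 0; yields < backoff; yields++) {
      Atomics.microWait();   // one short pause per call
      tries++;
    }
    backoff = Math.min(kMaxBackoff, backoff << 1);
  } while (tries < kSpinCount);
  return false;  // caller should now fall back to e.g. Atomics.wait()
}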
Also, would it be possible to codify in the spec that Atomics.microWait(0) shall mean the same as if a single _mm_pause/yield instruction call took place? (Maybe using a wording like "the smallest possible atomic yield"?)