The "hotspot bias locks" text explains why CAS is typically fast: it requires no bus activity if this CPU already holds the value in a cache line in the M (Modified) state.
Perhaps CAS would prove faster than atomic inc/dec for cases without contention?
The fib test case may be particularly nasty here (at least 4 refcount ops on fib for 34 bytecode instructions and references on the stack), and I wonder just how well it represents real-world programs?
On one hand, multithreaded CPU-bound programs are very likely to execute exactly the same code in most threads.
On the other hand, useful programs have notably more code, so the ratio of shared (fib) to private references (locals, ephemeral objects) is going to be much lower.
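
One rough way to check how heavily fib leans on its one shared reference is to disassemble it and count the loads of the global name. The sketch below assumes the usual recursive definition of fib (the benchmark's actual source is not shown here), and the helper name shared_global_loads is made up for illustration. Instruction totals differ between CPython versions, so the 34 quoted above should not be expected to reproduce exactly.

```python
import dis


def fib(n):
    # Assumed definition: the plain recursive form, not necessarily
    # the exact benchmark code under discussion.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def shared_global_loads(func, name):
    """Count the instructions in func that load the global called name."""
    return sum(
        1
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_GLOBAL" and ins.argval == name
    )


# Total bytecode instructions in fib (varies by interpreter version).
total = sum(1 for _ in dis.get_instructions(fib))
# Each load of the global takes a new reference to the shared function
# object, which is released again once the call has completed.
loads = shared_global_loads(fib, "fib")
print(f"{loads} loads of the shared global 'fib' out of {total} instructions")
```

Every other reference fib touches (n, the intermediate ints) is private to the calling thread, so the same count run over a larger program would give a rough idea of the shared-to-private ratio.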