[idea] CAS for reference counts

The "hotspot bias locks" text explains why CAS is typically fast: it requires no bus activity if this CPU already holds the value in a cache line in the M (modified) state.

Perhaps CAS would prove faster than atomic inc/dec in the uncontended case?
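To make the comparison concrete, here is a minimal C11 sketch of the two variants. It is an illustration only, not gilectomy's code: `obj_t`, `incref_atomic`, `incref_cas` and `decref_cas` are hypothetical names, and the struct is a stand-in for a real object header. Whether the CAS loop actually beats the plain atomic add would have to be measured.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an object header (assumption, not the
 * real CPython/gilectomy layout). */
typedef struct {
    _Atomic intptr_t refcnt;
} obj_t;

/* Baseline: a single atomic read-modify-write. */
static inline void incref_atomic(obj_t *o) {
    atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* CAS variant: read the count, then try to swap in count + 1.
 * On failure the library stores the current value into `old`,
 * so the loop simply retries with fresh data. */
static inline void incref_cas(obj_t *o) {
    intptr_t old = atomic_load_explicit(&o->refcnt, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &o->refcnt, &old, old + 1,
               memory_order_relaxed, memory_order_relaxed)) {
        /* retry: `old` now holds the value another thread wrote */
    }
}

/* CAS decrement; returns 1 if this call dropped the count to zero. */
static inline int decref_cas(obj_t *o) {
    intptr_t old = atomic_load_explicit(&o->refcnt, memory_order_relaxed);
    assert(old > 0); /* decref on a dead object is a caller bug */
    while (!atomic_compare_exchange_weak_explicit(
               &o->refcnt, &old, old - 1,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* retry with the refreshed `old` */
    }
    return old == 1; /* after success, `old` is the pre-decrement value */
}
```

In the uncontended case the CAS loop runs exactly once, so a benchmark would be comparing one load plus one CAS against one atomic add; under contention the loop may retry, which is where the two are expected to diverge.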

The fib test case may be particularly nasty here (at least 4 refcount ops on fib for 34 bytecode instructions, plus references on the stack), and I wonder how well it represents real-world programs.

On the one hand, multithreaded CPU-bound programs are very likely to execute the exact same code in most threads.

On the other hand, useful programs have notably more code, so the ratio of shared (fib) to private references (locals, ephemeral objects) is going to be much lower.

larryhastings / gilectomy

[idea] CAS for reference counts #42