nim-lang / RFCs

A repository for your Nim proposals.

Add `isAtomic` flag to ARC counters #535

Open elcritch opened 9 months ago

elcritch commented 9 months ago

Abstract

Add an isAtomic flag to ARC counters implemented via a bitmask. Then increment / decrement operations for ARC could check the flag to determine whether an atomic operation is necessary. The stdlib would need to provide a method for user code to set this flag before sharing with another thread. Ideally this could be built into threading primitives as well.
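For concreteness, a minimal sketch of how such a flag might be packed into the refcount word. All names here (`RefHeader`, `rcAtomicFlag`, `incRef`) are hypothetical and not the actual ARC runtime's:

```nim
const
  rcAtomicFlag = 1  # bit 0 marks the object as shared between threads
  rcStep = 2        # the counter lives above the flag bit

type RefHeader = object
  rc: int           # packed: counter shifted left by one, flag in bit 0

proc isAtomicRc(h: RefHeader): bool {.inline.} =
  (h.rc and rcAtomicFlag) != 0

proc incRef(h: var RefHeader) {.inline.} =
  if isAtomicRc(h):
    discard atomicInc(h.rc, rcStep)  # atomic path, only for shared objects
  else:
    inc(h.rc, rcStep)                # plain increment on the fast path
```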

Alternatively, this idea could be morphed into a compile-time flag. Users would use similar mechanisms to annotate which objects they share. It would instead tell the compiler that any object types involved should always use atomic counters. However, this would be trickier to use across shared-library boundaries.

Motivation

Currently we have --mm:atomicArc, which enables sharing ref objects between threads. However, it can have a large performance impact on older machines or in certain hotspots. An optional flag could balance atomic operations for thread safety against performance for non-threaded code.

I've been tossing this possible design around for a week or two, but I'm not convinced it'd be better than a pure atomicArc. However, I think it's worthwhile to check with others.

Additionally, another flag could possibly be added for ORC to "lock" an isAtomic object caught up during a cycle collection. I don't know the details of the ORC cycle detector aside from knowing that marking objects is common in such algorithms.

Description

There are downsides to using an isAtomic flag. The biggest is the potential overhead from branching in the incr / decr operations. It would require benchmarking to determine whether the potential for extra branching outweighs the cost of atomics.

Modern processors have large branch predictors, but these checks would permeate the code and could consume an excessive amount of branch-predictor capacity; on older machines this might create a bigger performance impact than the atomics themselves. However, a back-of-the-napkin comparison of orders of magnitude is favorable: an L1 or L2 cache request to synchronize an atomic often costs 100+ CPU cycles, while recovering from a mispredicted branch usually costs less than tens of cycles.

Though perhaps using `if unlikely(isAtomic(x)):` could mitigate the overhead, at some cost for the threaded path.

Another possible benefit is that code which uses Isolate to move data between threads would continue to avoid atomic operations.
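As a reminder of that pattern, std/isolation transfers unique ownership, so the receiver ends up as the only holder of the ref and no atomic refcounting is needed (sketch):

```nim
import std/isolation

type Data = ref object
  x: int

# `isolate` checks that the value has no external aliases, so ownership
# can move to another thread without atomic refcounting.
var iso: Isolated[Data] = isolate(Data(x: 42))
let received = extract iso   # the receiving side takes sole ownership
echo received.x              # prints 42
```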

Code Examples

Some rough pseudo-code to demonstrate (the `thread1` / `thread2` pragmas are illustrative only):

var sharedObj: SomeRefObject
var chan: Chan[SomeRefObject]

proc doProcess1() {.thread1.} =
  let obj = SomeRefObject()
  setAtomicShared(sharedObj, obj) # could bypass needing `{.cast: gcsafe.}`
  # alternatively
  chan.send(markAtomic(obj))  

proc doProcess2() {.thread2.} =
  while sharedObj == nil:
    os.sleep(10)

  let obj = sharedObj
  echo "got shared object: ", obj
  ...

If the isAtomic flag were instead a compile-time property, the same code would just tell the compiler to use atomic incr / decr mechanisms. This would be more flexible than marking individual types as atomic at declaration, like:

type SomeObject* {.atomic.} = ref object

Marking individual objects as atomic at declaration would prohibit users from marking objects from libraries as thread-safe. However, one could possibly think of something like an alias type, but that feels error-prone to implement at the compiler level:

type SomeObject* {.atomic.} = LibraryObject

Backwards Compatibility

A compiler flag could be added to remove the isAtomic check and make counters either fully atomic or not atomic at all. This could cover cases like embedded devices that don't need atomics at all, or builds that should always use them.
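A sketch of what such a switch could look like, using hypothetical define names (`noAtomicRc`, `fullAtomicRc`) and a toy `RefHeader`:

```nim
type RefHeader = object
  rc: int

when defined(noAtomicRc):
  # e.g. single-threaded embedded targets: never pay for atomics
  proc incRef(h: var RefHeader) {.inline.} = inc h.rc
elif defined(fullAtomicRc):
  # always atomic, equivalent to today's --mm:atomicArc
  proc incRef(h: var RefHeader) {.inline.} = discard atomicInc(h.rc)
else:
  # default: consult the runtime isAtomic bit (branching path elided)
  proc incRef(h: var RefHeader) {.inline.} = inc h.rc
```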

omentic commented 6 months ago

It's maybe worth looking at what Koka does here: see section 2.7.2 "Concurrent Execution" of its technical report. (TL;DR: objects are marked as "shared" and turned atomic when the compiler detects they are passed as an argument when spawning a thread.) To the best of my understanding, Nim's memory management is very very similar, and Nim may be able to build off of similar foundations.

mratsim commented 6 months ago

Already mentioned this here https://forum.nim-lang.org/t/10177#67364, I strongly prefer this to be a compile-time property.

It can be either explicit with aref, {.atomic.} or atomic ref, just like Send in Rust, or it can be inferred with an explanation when there is an issue, just like {.gcsafe.} or {.noSideEffect.} or when a concept doesn't match.

To the best of my understanding, Nim's memory management is very very similar, and Nim may be able to build off of similar foundations.

To cite the paper:

Another recent language that uses reference counting is Nim. The reference counting method is scope-based and uses non-atomic operations (and objects cannot be shared across threads without extra precautions). Nim can be configured to use ORC reference counting which extends the basic ARC collector with a cycle collection. Nim has the acyclic annotation to identify data types that are (co)-inductive, as well as the (unsafe) cursor annotation for variables that should not be reference counted.

I do remember it being discussed on IRC.

Araq commented 6 months ago

It's maybe worth looking at what Koka does here: see section 2.7.2 "Concurrent Execution" of its technical report. (TL;DR: objects are marked as "shared" and turned atomic when the compiler detects they are passed as an argument when spawning a thread.)

That doesn't really work as you need to mark the entire reachable subgraph as "atomic", turning an O(1) operation into an O(n) operation. Been there, done that. ;-)

elcritch commented 6 months ago

It can be either explicit with aref, {.atomic.} or atomic ref, just like Send in Rust, or it can be inferred with an explanation when there is an issue, just like {.gcsafe.} or {.noSideEffect.} or when a concept doesn't match.

@mratsim The biggest problem, IMHO, with most of the static approaches is the limitation of only being able to mark objects you control as shareable. It's the same frustrating limitation of not being able to implement traits including Send in Rust unless you own the type.

I do like the idea of something like concept matching, as it could provide a dynamic / static way of doing this. Though the implicit binding of concepts is annoying. It'd be much nicer to have an explicit way to say "bind this concept to this type". Otherwise you get concept-matching errors in random places in the code, and folks often don't know why.

Actually, just treating this isAtomic as a compile-time flag for specified types would be better than type annotations. It's easy to grep your project or compiler configs to see what's atomic. Code could then contain checks like `when not isAtomic(T): {.error: "requires atomic".}`.

Perhaps {.atomic.} would simply force that flag to always be on, without being the only place it can be turned on? That would actually be nice: some types are built from the ground up as shared types, but users could make other types atomic as desired.
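To illustrate the guard pattern: assuming the compiler exposed a compile-time per-type query (spelled `isAtomic(T)` here, which does not exist today), library code could reject non-atomic types up front:

```nim
# Hypothetical sketch: `isAtomic` as a compile-time per-type query.
proc share[T](x: T) =
  when not isAtomic(T):
    {.error: "sharing " & $T & " requires atomic refcounting".}
  # ... hand `x` off to another thread ...
```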

mratsim commented 6 months ago

The biggest problem, IMHO, with most of the static approaches is the limitation of only being able to mark objects you control as shareable. It's the same frustrating limitation of not being able to implement traits including Send in Rust unless you own the type.

What if you allow

# Package 1
type Foo = ref object

# Package 2
type AFoo = atomic Foo

The Rust limitation follows the principle of least surprise, probably Rust devs got burned with C++ "overrides". Here, deriving from a type to make it Sendable/Atomic is clean and would work like in Rust (I think). My main issue with Rust orphan rules has more to do with the inability to change the implementation with a better one.
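Today's closest approximation is a `distinct` alias, which changes the static type but not the refcounting behavior; the gap between the two is exactly what the proposed `atomic Foo` would fill:

```nim
type
  Foo = ref object
    x: int
  AFoo = distinct Foo   # new name, but same non-atomic refcounting

proc asFoo(a: AFoo): Foo = Foo(a)  # explicit conversion back to the base

let a = AFoo(Foo(x: 1))
echo asFoo(a).x   # usable with Foo procs only after converting; prints 1
```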

elcritch commented 6 months ago

What if you allow

# Package 1
type Foo = ref object

# Package 2
type AFoo = atomic Foo

Yeah, that might work. Though if you have an AFoo, you can't pass it to a proc from the original library, right? And supposing you could pass AFoo directly, it'd use the non-atomic refcount modifications for Foo, right?

The Rust limitation follows the principle of least surprise, probably Rust devs got burned with C++ "overrides". ... My main issue with Rust orphan rules has more to do with the inability to change the implementation with a better one.

Their reasons do make sense, but it's just so limiting to not be able to print a type because it didn't derive Debug or something. :/ It's just the opposite extreme from C++.

Being able to replace an impl is just another aspect of the lack of control for downstream users. Last I was aware, you can't derive through a newtype in Rust yet either. It's a philosophical difference for sure.

In Nim, type aliasing / newtypes could work for atomics, though using that new type with the original procs seems problematic, especially with IC coming up. That's why I think a compile-time flag approach could be useful.

Araq commented 6 months ago

The idea that you can retroactively claim a type with unknown refs inside to be threadsafe is pretty wild... I think we should improve the support for custom smart pointers instead, making them as convenient to reach out for as today's ref.

Background: Recent bug fixes made mm:atomicArc slower than when we did the benchmarking so a distinction between "unique" and "shared" is unavoidable. (That is on top of "unique"'s other benefits like the fact that by construction you cannot create cycles.)

elcritch commented 6 months ago

The idea that you can retroactively claim a type with unknown refs inside to be threadsafe is pretty wild...

That's a big issue with any "selective" way to do atomic refs. I think even if you did the type AFoo = atomic Foo, you'd have to recursively mark any contained ref types as atomic too. Alternatively the compiler could error out on those types. That's a pretty big limitation, but similar to Isolate currently.
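The recursive marking would look roughly like this, with `setRcAtomicBit` standing in for a hypothetical runtime call; note the walk is O(n) in the size of the reachable subgraph, not O(1):

```nim
type
  Bar = ref object
    next: Bar
  Foo = ref object
    bar: Bar

# Marking must visit every ref reachable from the object.
proc markAtomic(b: Bar) =
  if b == nil: return
  # setRcAtomicBit(b)   # hypothetical runtime call
  markAtomic(b.next)

proc markAtomic(f: Foo) =
  if f == nil: return
  # setRcAtomicBit(f)   # hypothetical runtime call
  markAtomic(f.bar)
```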

I think we should improve the support for custom smart pointers instead, making them as convenient to reach out for as today's ref.

That would be nicer. Though types which contain other ref types also struggle with a smart-pointer approach. That's why I sorta liked the runtime-approach idea: the user just marks the objects they want to share as shared. In some ways it's similar to smart pointers, but could be applied recursively to refs inside the object.

What would be really great would be "recursive smart pointers", or smart-pointer view types that enforce atomic semantics recursively:

type
  Bar = ref object
    val*: int
  Foo = ref object
    bar*: Bar

  SharedFoo = shared Foo

let
  obj = SharedFoo(bar: Bar(val: 33))
  b: shared[Bar] = obj.bar

echo "foo:bar: ", b.val ## this would be atomic after the fact but also typed

That'd be cool. Not sure how possible it'd be to achieve. You'd need other view semantics too like not being able to capture a reference in a non-shared ref context.

Would that be feasible? That'd be flexible and user controlled while being typed and explicit.

Background: Recent bug fixes made mm:atomicArc slower than when we did the benchmarking so a distinction between "unique" and "shared" is unavoidable. (That is on top of "unique"'s other benefits like the fact that by construction you cannot create cycles.)

Good points. I think smart pointer wrapper types I described above would work with unique as well.

elcritch commented 6 months ago

What would be really great would be "recursive smart pointers" or smart-pointer view types that enforce atomic semantics recursively:

Ugghh, no, that still wouldn't let you use the "shared" types with existing libraries either. It'd only be slightly better than plain smart pointers. Though potentially still worth it?

So the options really seem to break down into:

  1. full atomics like --mm:atomicArc possibly with full threaded ORC
  2. specialty or custom atomic types, or just smart pointers
  3. runtime marked atomics

Perhaps it'd be best to say hey if you want "easy" multi-threading just use option 1 and take the performance hit. If you don't mind making custom types or using sharedptr's then use option 2, which would also work transparently with option 1. That's not too different than users of ARC currently. I guess option 3 (this RFC) could be combined with option 2 somehow but that seems like no one would be fully happy.

mratsim commented 6 months ago
  1. full atomics like --mm:atomicArc possibly with full threaded ORC

I find this useless except in trivial cases. Case in point:

type Foo = ref object
  buffer: seq[byte]

--mm:atomicArc only solves sharing between threads, but reads and writes are still unsafe. I.e. you solve race conditions on the counters but not data races on the data itself.

You just cannot say "when doing a multithreaded program, just use --mm:atomicarc". People will need to:

A data structure which hasn't been designed for sharing from the get-go will not be safe with {.atomic.} anyway; it has to use isolate and channels to ensure single ownership.

Any other approach is just bandaid.

Araq commented 6 months ago

--mm:atomicArc helps though if your data is immutable. Say you have one of these famous "Configuration" objects that every thread needs access to, and maybe it has a field unmodelledStuff: JsonNode. Things magically work with mm:atomicArc. The proponents of atomicArc know that access to mutable data still requires locks; it's this way in C# and Golang too. But without atomicArc you cannot wrap things in locks at all, as the unsynchronized hidden RC write barriers kill you.
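A sketch of that use case (compile with --threads:on --mm:atomicArc; the `Config` shape is invented for illustration):

```nim
import std/json

type Config = ref object
  name: string
  unmodelledStuff: JsonNode

proc worker(cfg: Config) {.thread.} =
  # Read-only access: the atomic RC makes sharing the ref itself safe.
  # Mutating cfg's fields from here would still need a lock.
  echo cfg.name

var threads: array[2, Thread[Config]]
let cfg = Config(name: "app", unmodelledStuff: %*{"retries": 3})
for t in threads.mitems:
  createThread(t, worker, cfg)
joinThreads(threads)
```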

C++ did this one really well: both unique_ptr and shared_ptr are threadsafe building blocks, and there is no equivalent of Rust's Rc (non-synchronized) type.