rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

`Send` and `Sync` #293

Open Skepfyr opened 3 years ago

Skepfyr commented 3 years ago

See prior discussion on IRLO here.

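Send and Sync are defined in the 'nomicon as:

> A type is Send if it is safe to send it to another thread.
> A type is Sync if it is safe to share between threads (T is Sync if and only if &T is Send).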

I'm reasonably happy that the definition for Sync is clear and unambiguous; however, Send feels less obvious.

I think the definition of Send we had in that thread is:
A type is Send if it is safe to transfer ownership or give an exclusive reference to any other thread.
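To make the two halves of that definition concrete, here is a small std-only sketch (the function names are made up for illustration). The first function transfers ownership of a `Vec<i32>` to a new thread; the second lends an exclusive `&mut Vec<i32>` to a scoped thread, which compiles because `&mut T` is `Send` exactly when `T` is. Swapping `Vec<i32>` for `Rc<Vec<i32>>` in the first function should be rejected at compile time, since `Rc` is not `Send`.

```rust
use std::thread;

/// Transfers ownership of `data` to a new thread.
/// Compiles only because `Vec<i32>: Send`.
pub fn sum_on_other_thread(data: Vec<i32>) -> i32 {
    thread::spawn(move || data.iter().sum::<i32>())
        .join()
        .expect("worker thread panicked")
}

/// Gives a scoped thread an exclusive reference to `log`.
/// `&mut Vec<i32>: Send` holds because `Vec<i32>: Send`.
pub fn push_on_other_thread(log: &mut Vec<i32>, value: i32) {
    thread::scope(|s| {
        s.spawn(|| log.push(value));
    }); // all scoped threads are joined here
}
```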

I ended up changing that sentence less than I expected, because it's technically correct but not very useful. I think explaining what unsafe code is allowed to do would help clarify it.

Some examples of things that this definition allows unsafe code to do with non-Send types:

thomcc commented 3 years ago

Yeah, I've often thought Send was a bit over-broad. I think !Send types fall into a few categories:

  1. Types that really genuinely cannot be sent to another thread, ever ever. This is the rarest kind, IME, and is perfectly reflected by !Send.

  2. Types which need to be destroyed on the thread where they were created (more generally, types with some methods that must be executed on a specific thread; the common case is destruction, e.g. pthread mutex guards).

  3. Types which must not outlive the thread on which they were created, usually because they hold a pointer to some thread-local data (which will be freed when the creating thread exits).
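As a rough illustration of category 2, a type can opt out of `Send` with a raw-pointer `PhantomData` marker and remember which thread created it. `ThreadBoundGuard` is a hypothetical name and the "resource" is only a placeholder comment; the standard library's `std::sync::MutexGuard` is a real `!Send` type in this category.

```rust
use std::marker::PhantomData;
use std::thread::{self, ThreadId};

/// Hypothetical guard for a resource that must be released on the thread
/// that acquired it. `PhantomData<*const ()>` opts the type out of both
/// `Send` and `Sync`, because raw pointers implement neither.
pub struct ThreadBoundGuard {
    owner: ThreadId,
    _not_send: PhantomData<*const ()>,
}

impl ThreadBoundGuard {
    pub fn acquire() -> Self {
        // A real guard would acquire the thread-affine resource here.
        ThreadBoundGuard {
            owner: thread::current().id(),
            _not_send: PhantomData,
        }
    }

    /// True when called on the thread that created the guard.
    pub fn on_owner_thread(&self) -> bool {
        thread::current().id() == self.owner
    }
}

impl Drop for ThreadBoundGuard {
    fn drop(&mut self) {
        // Safe code cannot move a `!Send` value to another thread, so this
        // should always hold; unsafe code that moves it must keep it true.
        debug_assert!(self.on_owner_thread());
        // A real guard would release the resource here.
    }
}
```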

Skepfyr commented 3 years ago
> 1. Types that really genuinely cannot be sent to another thread, ever ever. This is the rarest kind, IME, and is perfectly reflected by !Send.

Are there actually any types that fall into this category? I've been asking around for a while about whether such types exist, as they would make my library unsound, but so far no one has come up with an example.

thomcc commented 3 years ago

I think it would have to include user-created invariants. A common case might be "this is a 'token' type which ensures an extremely thread-unsafe C library is only used from one thread" (because all operations take the token as an argument, etc.).
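A minimal sketch of such a token, assuming a hypothetical C library that must only ever be called from one thread. The token can be obtained once per process, and because it is `!Send` and `!Sync`, neither it nor a `&LibToken` can leave the thread that acquired it. `LibToken`, `acquire` and `lib_do_work` are invented for illustration, and the FFI call is stubbed out.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicBool, Ordering};

static TOKEN_TAKEN: AtomicBool = AtomicBool::new(false);

/// Hypothetical proof that the current thread owns the C library.
/// `!Send` keeps the token on its thread; `!Sync` means `&LibToken`
/// is also `!Send`, so references cannot escape either.
pub struct LibToken {
    _not_send: PhantomData<*mut ()>,
}

impl LibToken {
    /// Returns the token on the first call in the process, `None` afterwards.
    /// The flag is never reset, so the library stays tied to one thread.
    pub fn acquire() -> Option<LibToken> {
        if TOKEN_TAKEN.swap(true, Ordering::AcqRel) {
            None
        } else {
            Some(LibToken { _not_send: PhantomData })
        }
    }
}

/// Hypothetical wrapper around a thread-unsafe C function. Requiring
/// `&LibToken` means it can only be called on the token's thread.
pub fn lib_do_work(_token: &LibToken, x: i32) -> i32 {
    // Real code would make an `unsafe` FFI call here; this stub just doubles `x`.
    x * 2
}
```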

What does your library do? I kind of suspect it is likely unsound.

Skepfyr commented 3 years ago

It's here: https://crates.io/crates/diplomatic-bag

It wraps !Send types in a Send wrapper. All operations on that wrapper are run on a single shared thread, so safe code can never access the inner value except on that shared thread.
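The idea can be sketched in a few dozen lines, assuming one lazily started worker thread fed by `std::sync::mpsc` channels. This is not diplomatic-bag's actual API or implementation; `Bag`, `run_on_worker` and `SendPtr` are hypothetical names. The wrapped value is created, used and dropped only on the worker thread, while the `Bag` handle itself may move freely between threads.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Mutex, OnceLock};
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

/// Sender for the single shared thread on which every bagged value lives.
static WORKER: OnceLock<Mutex<Sender<Job>>> = OnceLock::new();

/// Runs `f` on the worker thread and blocks until it returns.
/// Calling this from inside a job deadlocks, and panics in jobs are not
/// handled gracefully; this sketch guards against neither.
fn run_on_worker<R, F>(f: F) -> R
where
    R: Send + 'static,
    F: FnOnce() -> R + Send + 'static,
{
    let sender = WORKER.get_or_init(|| {
        let (tx, rx) = channel::<Job>();
        thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        Mutex::new(tx)
    });
    let (result_tx, result_rx) = channel();
    let job: Job = Box::new(move || {
        let _ = result_tx.send(f());
    });
    sender.lock().unwrap().send(job).expect("worker thread has stopped");
    result_rx.recv().expect("job panicked on the worker thread")
}

/// Raw pointer that is only ever dereferenced on the worker thread.
struct SendPtr<T>(*mut T);
// SAFETY: the pointee is only accessed from the worker thread.
unsafe impl<T> Send for SendPtr<T> {}
impl<T> Clone for SendPtr<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for SendPtr<T> {}
impl<T> SendPtr<T> {
    fn get(self) -> *mut T {
        self.0
    }
}

/// Hypothetical `Send` handle to a possibly `!Send` value. It is `Send`
/// because its only field is a `SendPtr<T>`.
pub struct Bag<T: 'static> {
    ptr: SendPtr<T>,
}

impl<T: 'static> Bag<T> {
    /// Constructs the value on the worker thread.
    pub fn new(make: impl FnOnce() -> T + Send + 'static) -> Self {
        let ptr = run_on_worker(move || SendPtr(Box::into_raw(Box::new(make()))));
        Bag { ptr }
    }

    /// Runs `f` on the worker thread with exclusive access to the value.
    /// `R: Send` stops `f` from handing `!Send` pieces of `T` back out.
    pub fn run<R, F>(&mut self, f: F) -> R
    where
        R: Send + 'static,
        F: FnOnce(&mut T) -> R + Send + 'static,
    {
        let p = self.ptr;
        // SAFETY: `p` came from `Box::into_raw`, is only dereferenced on the
        // worker thread, and `&mut self` rules out other jobs for this bag.
        run_on_worker(move || unsafe { f(&mut *p.get()) })
    }
}

impl<T: 'static> Drop for Bag<T> {
    fn drop(&mut self) {
        let p = self.ptr;
        // SAFETY: the value is dropped and freed on the worker thread.
        run_on_worker(move || unsafe { drop(Box::from_raw(p.get())) });
    }
}
```

Blocking on every call keeps the sketch simple; the cost is that all bags share one thread and serialize on it.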

thomcc commented 3 years ago

Ah, I withdraw my comment; that seems sound. But I don't understand how types in case 1 would make the library unsound.

Skepfyr commented 3 years ago

My claim is that any type can be sent to any thread as long as safe code can't access it there (and unsafe code doesn't break whatever invariant makes that type !Send). The library stays useful by providing a way to send values back to the thread where they were created.

Lokathor commented 3 years ago

That would be unsound in the presence of some GUI things :/

Skepfyr commented 3 years ago

Oooh, exciting, can you elaborate?

Lokathor commented 3 years ago

On macOS, the GUI must only be altered from the main thread. Specifically, the main thread is magical: you can't run the GUI on some other side thread.

Though, reading the diplomatic-bag docs more closely, I think it wouldn't make things any more UB than normal.

Diggsey commented 3 years ago

@Lokathor "altering the gui" would constitute calling a method on the !Send type, and would therefore fall under category (2) not category (1).

@thomcc I think it's causing some confusion because you focused on destructors in (2), whereas it is very common for all methods to be unsound if called from another thread (e.g. an Rc<T> cannot be cloned from another thread).

By your original definition, (1) is completely impossible in all cases but one that I can think of. The reason is that "moving" a value is always a simple memory copy in Rust, and so can have no side effects. Everything else you could do with a !Send value would involve method calls (even implicit ones, like Drop).

The one case I can think of is if you implemented a Boehm-style conservative garbage collector which uses separate heaps for each thread. In that case, the GC could expect to scan a thread's stack to find all roots into that thread's heap. If the value had been moved to another thread's stack, then the GC would not find it, and could believe its target memory to be unreachable.

Lokathor commented 3 years ago

I agree that type 1 doesn't exist. As you say, raw bytes can be sent anywhere; it's code that can't be executed just anywhere.

thomcc commented 3 years ago

Right, the mere existence of the type elsewhere isn't a problem (admittedly, the CGC case is an interesting counterexample... but it's probably(?) out of scope here).

What I meant is that you can't, in general, know what code will do when it sees a !Send value on a thread other than the one that created it; it could easily be unsound.

RalfJung commented 3 years ago

> I'm reasonably happy that the definition for Sync is clear and unambiguous; however, Send feels less obvious.

That's funny -- their definitions, when captured formally, are almost exactly the same, and in particular there is perfect duality between them. We should not change one of them without also changing the other, to ensure that this symmetry remains intact. :)

I have attempted to give an informal description of our formal model in the IRLO thread.

There also was a comment about validity invariants somewhere -- I'd say that the "validity invariant" that corresponds loosely to these traits is "data races are UB". But other than that, Send and Sync are library-only concepts.

@Skepfyr

> Are there actually any types that fall into this category? I've been asking around for a while about whether such types exist, as they would make my library unsound, but so far no one has come up with an example.

Rc is a perfect example of a !Send type. Its safety invariant only holds in one particular thread (namely the thread that is allowed to non-atomically read and write the reference count).
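To make that invariant concrete, here is a stripped-down, hypothetical `MiniRc` (a sketch, not the standard library's actual `Rc` source). Its count is a plain `Cell<usize>`, so `clone` and `drop` do non-atomic read-modify-writes that are only correct if a single thread touches the count. Holding a `NonNull` makes the type `!Send` and `!Sync` automatically, which is exactly what that invariant requires.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::ptr::NonNull;

struct Inner<T> {
    count: Cell<usize>, // plain, non-atomic reference count
    value: T,
}

/// Hypothetical minimal `Rc`. `NonNull` is a raw pointer, so this type is
/// `!Send` and `!Sync` without any explicit opt-out.
pub struct MiniRc<T> {
    ptr: NonNull<Inner<T>>,
}

impl<T> MiniRc<T> {
    pub fn new(value: T) -> Self {
        let inner = Box::new(Inner { count: Cell::new(1), value });
        MiniRc { ptr: NonNull::from(Box::leak(inner)) }
    }

    fn inner(&self) -> &Inner<T> {
        // SAFETY: the allocation stays alive while any handle points to it.
        unsafe { self.ptr.as_ref() }
    }

    pub fn count(this: &Self) -> usize {
        this.inner().count.get()
    }
}

impl<T> Clone for MiniRc<T> {
    fn clone(&self) -> Self {
        let count = &self.inner().count;
        // Non-atomic read-modify-write: two threads doing this at once could
        // lose an increment, and the value could be freed while still in use.
        count.set(count.get() + 1);
        MiniRc { ptr: self.ptr }
    }
}

impl<T> Drop for MiniRc<T> {
    fn drop(&mut self) {
        let count = &self.inner().count;
        count.set(count.get() - 1);
        if count.get() == 0 {
            // SAFETY: this was the last handle, so nothing else can observe the box.
            unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
        }
    }
}

impl<T> Deref for MiniRc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner().value
    }
}
```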

> My claim is that any type can be sent to any thread as long as safe code can't access it there (and unsafe code doesn't break whatever invariant makes that type !Send). The library stays useful by providing a way to send values back to the thread where they were created.

I mean, bytes are untyped, so you can do whatever you want with these bytes -- including sending them to another thread. It's not the data that cares about the thread, it's the safe typed methods that work on the data -- for example, all methods that take an Rc assume that they are called in the thread that the Rc is "tied to". That is what !Send means.

@thomcc

> the CGC case is an interesting counterexample

I could not figure out what "CGC" refers to here, could you please add a link to that example?

Skepfyr commented 3 years ago

@RalfJung

> That's funny -- their definitions, when captured formally, are almost exactly the same, and in particular there is perfect duality between them. We should not change one of them without also changing the other, to ensure that this symmetry remains intact. :)

The definition "T is Sync iff &T is Send" is probably why I find the Sync definition fine but the Send one not: Sync is currently defined in terms of Send, so changing Send will automatically change Sync.
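The "T is Sync iff &T is Send" rule shows up directly in a sketch like the following (std only; `assert_send` and `count_with_threads` are illustrative names). Every scoped thread gets a copy of `&AtomicUsize`, which is `Send` because `AtomicUsize` is `Sync`. `Cell<usize>` is `Send` but not `Sync`, so using it in place of the atomic should fail to compile.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Compiles only if `T: Send`; does nothing at run time.
fn assert_send<T: Send>() {}

/// Shares one counter between `n` scoped threads by handing each of them
/// a shared reference to it.
pub fn count_with_threads(n: usize) -> usize {
    // `&AtomicUsize: Send` because `AtomicUsize: Sync`.
    assert_send::<&AtomicUsize>();
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| counter.fetch_add(1, Ordering::Relaxed));
        }
    }); // all scoped threads are joined here
    counter.load(Ordering::Relaxed)
}
```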

> I mean, bytes are untyped, so you can do whatever you want with these bytes -- including sending them to another thread. It's not the data that cares about the thread, it's the safe typed methods that work on the data -- for example, all methods that take an Rc assume that they are called in the thread that the Rc is "tied to". That is what !Send means.

That's essentially what I've been driving towards, though I've clearly been a bit unclear about it. There does appear to be a counterexample below, however.

> I could not figure out what "CGC" refers to here, could you please add a link to that example?

This is referring to @Diggsey's counterexample above:

> The one case I can think of is if you implemented a Boehm-style conservative garbage collector which uses separate heaps for each thread. In that case, the GC could expect to scan a thread's stack to find all roots into that thread's heap. If the value had been moved to another thread's stack, then the GC would not find it, and could believe its target memory to be unreachable.

RalfJung commented 3 years ago

A GC with that assumption would be wrong also in every concurrent C program that ever moves any pointer from one thread to another -- so I think that's just a false assumption for the GC to make.

thomcc commented 3 years ago

Yes, in general conservative garbage collectors (CGCs, sorry for the jargon) are pretty dubious at best, especially if they knowingly only scan part of the heap. Also, they pretty much function by exploiting UB, even in the best of cases.

I don't think they're particularly common anymore either, although certainly Boehm's was, for a time.

Diggsey commented 3 years ago

@RalfJung the implication is that the GC would be using its own smart-pointer type, which is !Send, so that can't happen.

bjorn3 commented 3 years ago

The following is sound, but defeats a conservative GC even if the smart-pointer type is !Send:

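```rust
// Move the GC smart pointer into a heap allocation we control.
let ptr = Box::into_raw(Box::new(gc_value));
// Disguise the address: `val` no longer points into the allocation.
let val = ptr as usize - 1000;
gc();
// Recover the original pointer and move the value back out of the box.
let ptr = (val + 1000) as *mut Gc;
let gc_value = unsafe { *Box::from_raw(ptr) };
```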

If the compiler decides to discard gc_value and ptr before the gc() call, the GC will miss the pointer: the only related value it can see is an address 1000 bytes before the heap allocation containing gc_value. Note that the compiler doesn't have to emit an instruction to clear gc_value or ptr; it can simply overwrite their storage with another value, for example when spilling caller-saved registers to the stack during the gc() call.