rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.55k stars 12.74k forks source link

Tracking Issue for SyncUnsafeCell #95439

Open m-ou-se opened 2 years ago

m-ou-se commented 2 years ago

Feature gate: #![feature(sync_unsafe_cell)]

This is a tracking issue for SyncUnsafeCell.

Public API

// core::cell

#[repr(transparent)]
pub struct SyncUnsafeCell<T: ?Sized> {
    value: UnsafeCell<T>,
}

unsafe impl<T: ?Sized + Sync> Sync for SyncUnsafeCell<T> {}

// Identical interface as UnsafeCell:

impl<T> SyncUnsafeCell<T> {
    pub const fn new(value: T) -> Self;
    pub const fn into_inner(self) -> T;
}

impl<T: ?Sized> SyncUnsafeCell<T> {
    pub const fn get(&self) -> *mut T;
    pub const fn get_mut(&mut self) -> &mut T;
    pub const fn raw_get(this: *const Self) -> *mut T;
}

impl<T> From<T> for SyncUnsafeCell<T>;
impl<T: Default> Default for SyncUnsafeCell<T>;
impl<T: ?Sized> Debug for SyncUnsafeCell<T>;

Steps / History

Unresolved Questions

bstrie commented 2 years ago

The arguments in favor of having such a type are contained in the (very long) thread at https://github.com/rust-lang/rust/issues/53639 .

TL;DR: static mut makes it way too easy to get a mutable reference to a static, which is wildly unsound in the face of concurrency, and having users write their own versions of this type is very error prone (see https://github.com/rust-lang/rust/issues/53639#issuecomment-790091647 for one example). This is an unsafe pattern that should be encapsulated by the stdlib, and might someday lead to the ability to deprecate static mut.

josephlr commented 2 years ago

This would be a great addition to the standard library. Most OS-level crates I've worked on eventually have to use static mut or implement an abstraction like this if they need to have static mutable memory (such as a bootstrap stack or GDT). One of the downsides of static mut is that doing pointer math with addresses to static mut variables has to be marked unsafe. Even addr_of!(STATIC_VAR) needs an unsafe annotation.

This abstractions makes it so raw pointer math would be safe, while use of the pointee to would be unsafe.

A concrete example is from @phil-opp's blog "Writing an OS in Rust" where static mut is used for the TSS Double Fault stack.

Pointerbender commented 2 years ago

Unresolved Question: Should it be Sync only if T is Sync, or should it unconditionally be Sync?

I think the former might be the best, if it's unconditionally Sync then it would be possible to wrap !Send types (e.g. Rc or thread local storage values) in it and make it safe to share these between threads. At the very least it should be Sync if T is Send.

Raekye commented 2 years ago
  • Should it be Sync only if T is Sync, or should it unconditionally be Sync?

Conceptually, I think T should be required to be Sync. And while we're at it, shouldn't it also require T: Send at the very least, as Pointerbender mentioned?

In practice, I commonly deal with types containing raw pointers (coming from bindgen), which are (very un-ergonomically) !Send and !Sync. So unfortunately, I'll have to continue using my own implementation which is unconditionally Sync, because of raw pointers being unconditionally !Send and !Sync.

traviscross commented 2 years ago

There has been previous discussion that if a new UnsafeCell type were to be added, that it should be both Sync and Copy. Does that have any bearing on this issue?

In particular, it seems as though stabilizing this without a Copy impl may complicate or preclude adding one later for the reasons that have made #55207 and #25053 challenging [^1]. If we were to want a version of UnsafeCell that implements Copy, is it better to have both SyncUnsafeCell and CopyUnsafeCell or better to have one RawUnsafeCell (a.k.a. (farcically) VeryUnsafeCell, SuperUnsafeCell, etc.) that implements both and from which all others can be derived?

As background:

@SimonSapin proposed this primitive here: https://github.com/rust-lang/rust/pull/55207#issuecomment-435467890 https://github.com/rust-lang/rust/issues/53639#issuecomment-435468228

@aturon discussed adding a separate Copy version of UnsafeCell here: https://github.com/rust-lang/rust/issues/25053#issuecomment-274262179

@RalfJung analyzed the situation in some detail: https://github.com/rust-lang/rust/issues/25053#issuecomment-317099890 https://github.com/rust-lang/rust/pull/55207#issuecomment-435570632

cc #55207 #25053 @jamesmunns

[^1]: That is, UnsafeCell could have been Copy originally, but adding the impl now is hard, just as UnsafeCell could have been Sync originally, but at this point we're instead adding a new struct.

jamesmunns commented 2 years ago

In general, in my uses of UnsafeCell-made-sync, which are typically data structures that I am managing thread safety manually, e.g. for thread safe queues or allocators or such, I very much would not want it to be Copy. I can hunt down some concrete examples if it would be helpful, but I have a lot of UnsafeCell<MaybeUninit<T>>s in my code for reasons.

I'm not certain if the proposal was something like:

impl<T: Copy> Copy for NewUnsafeCell<T> {}

Which might be more palatable, but it would probably make me nervous that I might ACCIDENTALLY make my types copy, when I really don't want them to be, for example if I happened to use some T: Copy data in my structures.

I think a lot of mine (and maybe others?) desire for a Copy version of UnsafeCell has been abated by the fact that UnsafeCell::new() is now const, and Mara's "Const Single" pattern works, e.g.

const SINGLE: UnsafeCell<NonCopyType> = UnsafeCell::new(NonCopyType::const_new());
let arr: [UnsafeCell<NonCopyType>; 64] = [SINGLE; 64];

There might be others who have that desire! But you pinged me, and I can't say I've ever wanted it, outside of initializers, where the "Const Single" pattern works reasonably well, even if it isn't completely widely known.

Edit:

As a very quick example, bbqueue uses an UnsafeCell<MaybeUninit<[u8; N]>> for a backing storage.

c.f.: https://github.com/jamesmunns/bbqueue/blob/f73423c0b1c5fe04723e5b5bd57d1a44ff106473/core/src/bbbuffer.rs#L22-L54

I feel like it would be a footgun to make this structure Copy, even if MaybeUninit<[u8; N]> could be, and I think it would, given that [u8; N] is Copy.

SimonSapin commented 2 years ago

It looks like I did write a few years ago about a possible SomethingUnsafeCell that would both Sync and Copy, but now I agree with @jamesmunns that it’s probably not a good idea.

SyncUnsafeCell exists mostly to use in static to replace static mut where unintentional accesses can cause UB, whereas always manipulating raw pointers makes it easier to see where they are dereferenced. Making it also Copy would reintroduce that risk of unintentional (and not necessarily properly synchronized) access to a shared static.

jamesmunns commented 2 years ago

Oh. I just now realized I opened https://github.com/rust-lang/rust/pull/55207 in 2018, which proposed to make UnsafeCell Copy/Clone (oh how time flies!)

As I mentioned downthread: https://github.com/rust-lang/rust/pull/55207#issuecomment-462147250, I definitely agree that my original goal was mainly around const/array initialization, which is solved by (better IMO) by https://github.com/rust-lang/rust/issues/49147.

RalfJung commented 2 years ago

Which might be more palatable, but it would probably make me nervous that I might ACCIDENTALLY make my types copy, when I really don't want them to be, for example if I happened to use some T: Copy data in my structures.

That would not happen. You still have to add derive(Copy) to your type, which of course you won't.

I really don't understand the problem here.

jamesmunns commented 2 years ago

In the example I gave, MaybeUninit<[u8; 64]> is Copy because [u8; 64] is Copy, and MaybeUninit<T> is Copy when T is.

Therefore without adding derive(Copy), SuperUnsafeCell<MaybeUninit<[u8; 64]>> is now Copy, which was not previously the case for UnsafeCell<MaybeUninit<[u8; 64]>>, which I might never want to be copy, because it serves as an inner-mutability data store, and any unintentional copies would be a logic error.

RalfJung commented 2 years ago

I assumed you were talking about a type you defined yourself.

Interior mutability data stores can be perfectly fine with Copy (making Cell: Copy would be sound), so it seems like a bad idea to disallow this for everyone and make it outright impossible to have an interior mutable type that is Copy. That's just a fundamental expressiveness gap that Rust currently has, and not for any good reason IMO.

jamesmunns commented 2 years ago

That's fair. However some of my worst bugs when working with unsafe are when I accidentally create a copy of some data that happens to be Copy (e.g. accidentally copying to the stack when trying to reborrow from a pointer), then returning a reference to a local rather than the non-local pointer. Since UnsafeCell is likely to be used in pointer form (e.g. through the .get() interface, using it with any T: Copy data makes me wary.

This is probably a niche issue (and almost certainly only a personal experience report), but that's my $0.02. If there are other compelling use cases, I'll just have to be more careful in the future, or stick with the old UnsafeCell with a manual Sync impl.

m-ou-se commented 2 years ago

Interior mutability data stores can be perfectly fine with Copy (making Cell: Copy would be sound), so it seems like a bad idea to disallow this for everyone and make it outright impossible to have an interior mutable type that is Copy. That's just a fundamental expressiveness gap that Rust currently has, and not for any good reason IMO.

I think this is a really important point. I think this is case where Rust accidentally mixes 'unsafe' with 'just a bad idea'. The best example of that is how mem::forget is usually a "bad idea", but not unsafe. From that perspective, UnsafeCell does it wrong: it is not Sync, and requires you to write unsafe impl Sync on your type containing it if you want to share it, even though a Sync UnsafeCell is perfectly sound. Same for Copy for Cell. Same for Send and Sync for raw pointers.

At this point we can't just make these types Sync (or Copy, etc.) without breaking existing code, but it's interesting to think about how to represent a concept like "this type is technically Sync (or Copy), but it's usually not what you want" in the language.

m-ou-se commented 2 years ago

It reminds me also of some arguments for whether to implement Copy for Range or not. It's not unsound, but might lead to mistakes. Whether that means it should or shouldn't be done is debatable.

notgull commented 2 years ago

I support this, although I think it should be explicitly marked as a "static mut killer" in the documentation.

thomcc commented 2 years ago

It's not really a static mut killer as is, since it doesn't relax the Send requirement. When I've wanted to use it this has been an issue (since we have a few types which are !Send & !Sync just as lints).

I do see why this limitation exists, though.

ibraheemdev commented 2 years ago

Would it be better to have a generic AlwaysSync<T> wrapper? Constructing it in general would be unsafe but it could have a safe From<UnsafeCell<T>> implementation.

tgross35 commented 1 year ago

In practice, I commonly deal with types containing raw pointers (coming from bindgen), which are (very un-ergonomically) !Send and !Sync. So unfortunately, I'll have to continue using my own implementation which is unconditionally Sync, because of raw pointers being unconditionally !Send and !Sync.

Would it be better to have a generic AlwaysSync wrapper? Constructing it in general would be unsafe but it could have a safe From<UnsafeCell> implementation.

I would appreciate having something like this. Creating static structs for things like plugins is really quite painful, especially when they contain raw pointers, because you have to wrap everything in a sync-forcing type (e.g. from one of my proc macros, my UnsafeSyncCell is always Sync). This type is just an UnsafeCell transparant wrapper where new is also unsafe.

I don't think an AlwaysSyncCell needs to be the same as SyncUnsafeCell since they have somewhat different goals (SyncUnsafeCell being more useful for existing Rust types, AlwaysSyncCell being useful more for statics/synchronization with the C interface). But it would be good to keep in mind for naming/design decisions that both concepts do need to exist, be it in std itself or just elsewhere in the ecosystem.

Question just since I haven't seen it anywhere - is unsafe impl<T: ?Sized + Sync> Sync for UnsafeCell <T> {} not possible, or would it somehow be breaking? Or is not having UnsafeCell ever be sync just a form of lint against unintended use?

SimonSapin commented 1 year ago

Some existing unsafe code might rely on the auto-trait behavior of UnsafeCell<T>: !Sync (even for T: Sync) to make their own wrappers !Sync.

For example, RefCell doesn’t and has an explicit impl<T: ?Sized> !Sync for RefCell<T> {} but only to provide better error messages. It would still be sound without it in current Rust.

thecaralice commented 10 months ago

Another option which I don't see mentioned here is making SyncUnsafeCell always Sync but introducing a warning-level lint for using SyncUnsafeCell<T> with T: ?Sync

WorldSEnder commented 3 months ago

That's fair. However some of my worst bugs when working with unsafe are when I accidentally create a copy of some data that happens to be Copy (e.g. accidentally copying to the stack when trying to reborrow from a pointer), then returning a reference to a local rather than the non-local pointer.

While I understand the concerns, this is mostly a matter of discipline and letting the compiler figuring out the necessary reborrowing. Since an *UnsafeCell is not Deref, you wouldn't want to *my_cell except to explicitly use the copy impl if it would exist. If you want to get to the data, you would go through my_cell.get() or get_raw().

To show a use of having a Copyable cell, I'm currently writing a datastructure Foo<T> that for ergonomical reasons I want to be Copy if T: Copy since the user should pretend as though it's just a plain old T, but actually keeps track of some "operation"s that are done to that T behind the scenes. Problematically, I wouldn't be able to observe the copy of these Foo<T>, so to fix that, I would include a counter in it that increases with every "operation" and when an implicit copy is made, I can see the old counter value in the backend to rewind to where - logically - a copy should have been performed.

#[derive(Clone, Copy)]
struct Foo<T> { ... }
impl<T> Foo<T> {
    fn operate(&self); // note *NOT* &mut self. All operations are sequenced in the backend.
}

So I would want to increase the counter in Foo::operate but that should not require a &mut self but a &self. Having that counter in a Cell<usize> would be nice, but alas Cell is not Copy for basically historical reasons. I would really want to have an internally-mutable-yet-copy type although I concede that I do not care as much about Sync issues, since you can fix those by a manual unsafe impl.


EDIT: Tl;dr Thinking and reading more about this, I believe that not having a Copy impl on the *UnsafeCell makes a lot of sense, but we need to have a way to have some consuming type opt out of the structural check of Copy and let it implement Copy despite containing an *UnsafeCell. That would open up the path towards impl<T: Copy> Copy for Cell<T> (which is safe and okay as far as I can gather) and solve my issue anway. Sorry, since it seems entirely unrelated to this issue now :)