someguynamedjosh / ouroboros

Easy self-referential struct generation for Rust.
Apache License 2.0
527 stars 33 forks source link

Provide borrow_FIELD_mut as an unsafe method #65

Open SOF3 opened 2 years ago

SOF3 commented 2 years ago

Is it possible to provide borrow_FIELD_mut, provided sufficient documentation on its memory safety constraints? The current documentation does not explain under what circumstances would it be unsound to return the mutable reference directly.

peku33 commented 1 year ago

@joshua-maros Maybe you can provide a bit deeper explaination on this unsoundness? I imagine one of the problems is swapping the contents of mutable reference with something else, however maybe this can be done by returning Pin<&mut T>?

someguynamedjosh commented 1 year ago

Let's say I have a field data: &'this str. The point of a function returning &mut &'this str would be to replace the inner reference with some other reference. But, it is impossible to express the'this lifetime to Rust's borrow checker. The solution employed internally by Ouroboros is to replace 'this with 'static and provide safe wrappers that abstract away this detail. If this were to be exposed, it would be easy to create a reference to dropped data:

#[self_referencing]
struct SelfRef {
    base: String,
    data: &'this str,
}
fn invalid_static_ref() -> &'static str {
    let self_ref = SelfRef::new("Drop Me".to_owned(), |base_ref| base_ref);
    *self_ref.borrow_data_mut()
}

So, maybe we should employ the trick that makes borrow_FIELD work, replacing the 'this lifetime with the lifetime of the self reference used in the method call. But since mutable references are invariant, this would also be unsound:

fn print_invalid_ref() {
    let self_ref = SelfRef::new("Unused".to_owned(), |base_ref| base_ref);
    let new_str = "This will be dropped too early".to_owned();
    'temp: {
        let data: &mut &'temp str = self_ref.borrow_data_mut();
        // Valid, new_str outlives 'temp.
        *data = &new_str;
    }
    drop(new_str);
    println!("{:?}", self_ref.get_data());
}

In general, the property of "invariance" that mutable references have dictates that any lifetimes that appear inside them must be exactly unchanged from their original values. Since Ouroboros is designed to fake a lifetime that Rust cannot express, it is impossible to return a mutable reference referring to types with the appropriate lifetimes. Given this limitation, it is my understanding that the with_mut and with_FIELD_mut methods cover all safe use cases.

morrisonlevi commented 12 months ago

Consider code like this:

struct BorrowedStringTable<'b> {
    /// The arena to store the characters in.
    arena: &'b bumpalo::Bump,

    /// Used to have efficient lookup by index, and to provide an
    /// [Iterator] over the strings.
    vec: Vec<&'b str>,

    /// Used to have efficient lookup by [&str].
    map: HashMap<&'b str, size_t>,
}

#[self_referencing]
struct StringTableCell {
    owner: bumpalo::Bump,

    #[borrows(owner)]
    #[covariant]
    dependent: BorrowedStringTable<'this>,
}

I want to write a clear() method, which basically calls dependent.map.clear(), dependent.vec.clear(), and owner.reset(). I know it's unsafe, but it's sound, at least I think it is. I would need a &mut Bump, and there's no API to do it.

I think the issue is basically asking... can we make something available for these cases, with some documentation about what can or cannot be done?

someguynamedjosh commented 12 months ago

There's a much better way to do what you're asking, assuming you've defined a user-friendly constructor called new_empty:

impl StringTableCell {
    pub fn clear(&mut self) {
        *self = Self::new_empty();
    }
}

It also safeguards against adding future things and forgetting to clear them in the clear method.

morrisonlevi commented 12 months ago

This is not viable for me*. I need to do more work than was posted here after the resets have occurred, so I can't use an empty arena. This means at least one new allocation which definitely goes against the purpose of why I'm adding a clear method in the first place.

* I mean, I could make it work. Coding is all about trade-offs. I'd just like to eat my cake and still have it too.

someguynamedjosh commented 12 months ago

If you need to keep the arena, you could do it with something like this (which is effectively what would be required in vanilla Rust to do something similar):

impl StringTableCell {
    pub fn cleared(self) -> Self {
        let mut arena = self.into_heads().owner;
        arena.clear();
        Self::new(arena, BorrowedStringTable::new)
    }
}

If this is still not acceptable, you would need to use unsafe code, even in the equivalent vanilla Rust scenario. If you really need what you're saying, I'm worried how the surrounding code is structured that that one allocation is a significant performance hit. It feels like being concerned about bounds checking - the vast majority of the time, it's not actually a serious problem, and trying to solve it by diving into unsafe code can end up putting your application in a worse spot than just accepting the tiny performance hit. And if the performance hit isn't tiny, there's probably something you can do to fix that.

My main concern with making a borrow_FIELD_mut method is that it lets you mess up the internals of the struct in any way that regular unsafe code can. It's effectively a switch that says "even though I'm using this library, I'd like to erase all guarantees the library provides me." You would have to keep track of as many guarantees as if you wrote your own unsafe wrapper from scratch. Providing a method connotates that there's something the library is taking care of for you, when in reality it's providing you total control over the internals of the struct. If you need total control, there's not really a point to using an existing library.

SOF3 commented 12 months ago

My original motivation for borrow_FIELD_mut was to implement something similar to lock_api::ArcMutexGuard:

#[self_referencing]
struct MappedArcMutexGuard<T, U> {
    owned: Arc<Mutex<T>>,
    #[borrows(owned)]
    guard: MappedMutexGuard<'this, RawLock, U>,
}

In this case, guard is only meaningful to use when it is borrowed mutably.

It seems the only soundness issue mentioned above is when I do something like this:

let guard = MappedArcMutexGuard::new(
    arc.clone(),
    |owned| MutexGuard::map(owned.lock(), |t| U::from(t)),
);
'outer: {
    let new = some_other_mapped_mutex_guard();
    'inner: {
        let guard_inner = guard.borrow_guard_mut();
        *guard_inner = new;
    }
}
guard.borrow_guard() // dangling reference to `new`

But trying to swap the referenced object here seems really uncommon.

someguynamedjosh commented 12 months ago

Is with_FIELD_mut not acceptable for your use case?

SOF3 commented 12 months ago

The issue was created one year ago and I have refactored my code to avoid using lock_arc since then, so I can't remember the details of my original use case, but I suppose there were indeed difficulties with using the with style.

gyscos commented 9 months ago

ArcMutexGuard implements DerefMut, which is, if I understand correctly, currently impossible to replicate with the current API (safe or unsafe). Having an unsafe (but provably sound depending on the types involved) way to reach API-parity could be beneficial.