rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.6k stars 12.74k forks source link

The safety requirements of CStr::from_ptr have been made more strict than they used to be #100493

Open SoniEx2 opened 2 years ago

SoniEx2 commented 2 years ago

Location

CStr::from_ptr

Summary

95948 introduced incompatible changes to the CStr docs. Namely, the docs now require immutability for the whole lifetime of 'a, instead of just while it's borrowed. See also this discussion on urlo. This is relevant to our crate, ltptr, which provides bound lifetimes for raw pointers and uses those in conversions: https://docs.rs/ltptr/0.1.4/ltptr/trait.FromLtPtr.html

RalfJung commented 2 years ago

Your example seems to rely on not using the CStr any more before the lifetime ends, and exploiting that for other code. The intention is that this is indeed allowed under some conditions, but those conditions are subtle and the precise aliasing model that defines them has not been decided yet. (Stacked Borrows is the current proposal for such a model, but not its final form.)

For example, the following is considered UB:

unsafe fn foo<'a>(ptr: *const c_char, bogus: PhantomData<&'a ()>) {
  let first = unsafe { CStr::from_ptr::<'a>(ptr) }.to_str();
  bar(first);
}

fn bar(s: &str) {
  // we stop using `s` here
  unsafe { mutate_ffi_string(); }
}

This is UB because of a special treatment of function arguments, which means their aliasing rules must be upheld until the end of the function, no matter whether they are used or not. Inlining bar makes the UB go away.

So, until we have decided what exactly the rules and exceptions are for "ending borrows while their lifetime is still ongoing", the documentation cannot bless code that does that, since we might rule out such code later.

SoniEx2 commented 2 years ago

Sure, but we'd argue that clearly maintains the borrow - it's still alive in the parent function and whatnot, as first is Copy. (also to_str returns a Result but ok.)

But this is still a backwards compatibility hazard, and there's no reason FromLtPtr should be considered unsound because of it.

(also, it*)

RalfJung commented 2 years ago

Depending on unspecified details of the aliasing model is a backwards compatibility hazard. The documentation is written in a way such that if you follow it, you avoid backwards compatibility hazards.

https://github.com/rust-lang/rust/pull/95948 made the situation more clear to avoid steering people into relying on unspecified parts of the model. The previous documentation was considered insufficiently precise by users, which is why we changed it.

SoniEx2 commented 2 years ago

we mean this is sound:

let mut x = &mut whatever;
let y = &x.foo;
drop(y);
x.mutate();

and the rest of the rules basically follow from there.

it's more like "while 'a is borrowed" rather than "the whole 'a". the difference is one of those aligns with safe code and the other one is overly restrictive even for 100% unsafe-free code.

SoniEx2 commented 2 years ago

e.g.

use std::ffi::CStr;
use std::os::raw::c_char;

fn main() {
    fn read<'a>(ptr: *const c_char, bogus: &'a ()) -> String {
        unsafe { CStr::from_ptr::<'a>(ptr) }.to_str().unwrap().to_owned()
    }
    fn write<'a>(ptr: *mut c_char, bogus: &'a mut ()) {
        // whatever
    }
    fn foo<'a>(ptr: *mut c_char, bogus: &'a mut ()) {
        let first = read(ptr, &*bogus);
        write(ptr, bogus);
        let second = read(ptr, &*bogus);
    }
}
RalfJung commented 2 years ago

we mean this is sound:

The drop does nothing here.

Also in that code the lifetime of the &x.foo borrow does end when x stops being used, so this is perfectly fine by the CStr docs.

And FWIW it's not just CStr; slice::from_raw_parts has been documented like this for quite a while. CStr was just recently fixed to be consistent with everything else.

it's more like "while 'a is borrowed" rather than "the whole 'a". the difference is one of those aligns with safe code and the other one is overly restrictive even for 100% unsafe-free code.

"while 'a is borrowed" is not a defined term that we can use in documentation.

let first = read(ptr, &*bogus);

here you are reborrowing bogus, so the lifetime passed to read will actually be a short one.

SoniEx2 commented 2 years ago

well yes it needs to be reborrowed because it can't be coerced but reborrows are kinda weird. like, this is a reborrow:

fn foo<'a>(&'a self) -> &'a Foo {
  &self.inner
}

so it makes sense that whatever these unsafe functions do is also a reborrow.

but how do you talk about reborrows in docs. we should be able to talk about reborrows in docs. the docs for CStr and slice::from_raw_parts should reflect the true borrow semantics of safe rust, instead of being more restrictive than them. (unsafe is literally never more restrictive than safe rust - like, that's the whole point of unsafe.)

so: how do we make this better? because what we have today just isn't good enough.

(also, again, it*)

RalfJung commented 2 years ago

but how do you talk about reborrows in docs. we should be able to talk about reborrows in docs.

Indeed. But we'll need to figure out the precise rules first, and we don't want to document anything that might end up not working with those precise rules.

the docs for CStr and slice::from_raw_parts should reflect the true borrow semantics of safe rust

They do. The borrowing in safe Rust is entirely defined by lifetimes. Anything you could do in safe code is allowed by these docs.

The tricky part is defining the precise borrow semantics in unsafe Rust, and that is work that has started but is not finished yet.

(also, again, it*)

Sorry, but I have no idea what you mean by this comment.

SoniEx2 commented 2 years ago

we mean we use it instead of you (2nd person neopronouns).

so uh, would you be willing to consider ltptr as part of that work?

RalfJung commented 2 years ago

(I have only been using gender-neutral pronouns, and I am happy to stick to that.)

The example you gave here is something we hope to allow. I don't know which other patterns ltptr is intended to support, but here are some important factors:

If these are the only rules, the example is fine. The thing is, we don't know yet if that will be all the rules. There might be more. (Though to be honest, that seems unlikely.)

That said, the example does not use ltptr, so it is hard to tell what these rules mean for ltptr, or whether other uses of ltptr would be a problem even with these rules (that I am fairly sure we will need).

SoniEx2 commented 2 years ago

we ask you to rephrase these:

Your example seems to rely on not using the CStr any more before the lifetime ends, and exploiting that for other code.

here you are reborrowing bogus, so the lifetime passed to read will actually be a short one.

Sorry, but I have no idea what you mean by this comment.

The example you gave here is something we hope to allow.

to use "it" instead of "you", like so:

Its example seems to rely on not using the CStr any more before the lifetime ends, and exploiting that for other code.

here it is reborrowing bogus, so the lifetime passed to read will actually be a short one.

Sorry, but I have no idea what it means by this comment.

The example it gave here is something we hope to allow.


The same example, using ltptr, looks like so:

unsafe fn foo<'a>(ptr: ConstLtPtr<'a, c_char>) {
  let first = unsafe { CStr::from_lt_ptr(ptr) }.to_str().map(|x| x.to_owned());
  unsafe { mutate_ffi_string(); }
  let second = unsafe { CStr::from_lt_ptr(ptr) }.to_str().map(|x| x.to_owned());
}

Where from_lt_ptr is defined here: https://docs.rs/ltptr/0.1.4/ltptr/trait.FromLtPtr.html#impl-FromLtPtr%3Ci8%3E-for-CStr

And we believe this should be okay, even tho the lifetime is clearly 'a. (Tho, it is covariant we guess.)

(Note also that ltptr is still a work in progress, so it's missing things like casts and const<->mut conversions. We also can't have explicit unsafe deref like pointers can, sadly. Ideally it'd just be part of the language.)