rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
99.21k stars 12.81k forks source link

Tracking Issue for `str_from_raw_parts` #119206

Open Sky9x opened 11 months ago

Sky9x commented 11 months ago

Feature gate: #![feature(str_from_raw_parts)]

This is a tracking issue for ACP#167, which adds core::str::from_raw_parts[_mut].

Public API

// core::str

pub const unsafe fn from_raw_parts<'a>(ptr: *const u8, len: usize) -> &'a str

pub const unsafe fn from_raw_parts_mut<'a>(ptr: *mut u8, len: usize) -> &'a mut str

Steps / History

Unresolved Questions

robertbastian commented 5 months ago

I'm not a fan of this. This saves a negligible amount of code but bunches up a whole lot of safety considerations. slice::from_raw_parts has language-level memory safety invariants, and str::from_utf8_unchecked has a single library-level safety invariant (that the slice contains valid UTF-8). In code, these should be justified separately. The example in https://github.com/rust-lang/libs-team/issues/167 was

unsafe {
    let bytes = slice::from_raw_parts(ptr, len);
    str::from_utf8_unchecked(bytes)
}

but it should have really been

// SAFETY: <explain why ptr and len make a valid slice>
let bytes = unsafe { slice::from_raw_parts(ptr, len) };
// SAFETY: <explain why this slice is UTF-8>
unsafe { str::from_utf8_unchecked(bytes) }

I don't see the value of combining these to

// SAFETY: <explain why ptr and len make a valid slice and why this slice is UTF-8>
unsafe { str::from_raw_parts(ptr, len) }

The proposed documentation also says

The pointed-to bytes must be valid UTF-8. If this might not be the case, use str::from_utf8(slice::from_raw_parts(ptr, len)), which will return an Err if the data isn’t valid UTF-8.

which kind of admits that there are two safety considerations that should be handled by two function calls. In the end, docs could just recommend str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)).

Sky9x commented 5 months ago

This API intends to be the borrowed equivalent the already-stable String::from_raw_parts, and fill in the two missing methods in the table below, bringing the string API more in line with the slice API.

String Slice
& str::from_raw_parts slice::from_raw_parts
&mut str::from_raw_parts_mut slice::from_raw_parts_mut
Owned String::from_raw_parts Vec::from_raw_parts

The documentation could be improved to properly express the safety concerns.

Also note that the ACP was accepted because of the existence of String::from_raw_parts, which by the same argument could be written as String::from_utf8_unchecked(Vec::from_raw_parts(ptr, len, cap)).

If you really don't want these functions to appear in your codebase, consider the clippy::disallowed_methods lint:

# clippy.toml
disallowed-methods = [
    "core::str::from_raw_parts",
    "core::str::from_raw_parts_mut",
    "alloc::string::String::from_raw_parts",
    # ...
]
robertbastian commented 5 months ago

It looks like String::from_raw_parts's safety documentation was at some point copy-pasted from Vec::from_raw_parts's, but it has since diverged and is currently much less detailed. It for example misses the invariant "The allocated size in bytes must be no larger than isize::MAX". This is exactly the issue I'm trying to point out, the documentation basically has to say

Safety

  • Everything in Vec::<u8>::from_raw_parts's safety requirements
  • Everything in String::from_utf8_unchecked's safety requirements

at which point why not just make those two calls?

String::from_raw_parts is one method with this issue, does this justify adding two more?

scottmcm commented 2 months ago

I'll copy my comment from the ACP here, as something potentially still worth discussing at stabilization time:

I'm not convinced that one more function call is "a fairly boilerplate filled task". That's especially true for unsafe code where the hard part is demonstrating that the safety conditions have been met.

Because of that, I think two calls is actually better when the two safety conditions are very different, like they are here.

(This is the same sentiment as https://github.com/rust-lang/rust/issues/119206#issuecomment-2175826671 above.)