Open Sky9x opened 11 months ago
I'm not a fan of this. This saves a negligible amount of code but bunches up a whole lot of safety considerations. slice::from_raw_parts
has language-level memory safety invariants, and str::from_utf8_unchecked
has a single library-level safety invariant (that the slice contains valid UTF-8). In code, these should be justified separately. The example in https://github.com/rust-lang/libs-team/issues/167 was
unsafe {
let bytes = slice::from_raw_parts(ptr, len);
str::from_utf8_unchecked(bytes)
}
but it should have really been
// SAFETY: <explain why ptr and len make a valid slice>
let bytes = unsafe { slice::from_raw_parts(ptr, len) };
// SAFETY: <explain why this slice is UTF-8>
unsafe { str::from_utf8_unchecked(bytes) }
I don't see the value of combining these to
// SAFETY: <explain why ptr and len make a valid slice and why this slice is UTF-8>
unsafe { str::from_raw_parts(ptr, len) }
The proposed documentation also says
The pointed-to bytes must be valid UTF-8. If this might not be the case, use str::from_utf8(slice::from_raw_parts(ptr, len)), which will return an Err if the data isn’t valid UTF-8.
which kind of admits that there are two safety considerations that should be handled by two function calls. In the end, docs could just recommend str::from_utf8_unchecked(slice::from_raw_parts(ptr, len))
.
This API intends to be the borrowed equivalent the already-stable String::from_raw_parts
, and fill in the two missing methods in the table below, bringing the string API more in line with the slice API.
String | Slice | |
---|---|---|
& |
❌ str::from_raw_parts |
✅ slice::from_raw_parts |
&mut |
❌ str::from_raw_parts_mut |
✅ slice::from_raw_parts_mut |
Owned | ✅ String::from_raw_parts |
✅ Vec::from_raw_parts |
The documentation could be improved to properly express the safety concerns.
Also note that the ACP was accepted because of the existence of String::from_raw_parts
, which by the same argument could be written as String::from_utf8_unchecked(Vec::from_raw_parts(ptr, len, cap))
.
If you really don't want these functions to appear in your codebase, consider the clippy::disallowed_methods
lint:
# clippy.toml
disallowed-methods = [
"core::str::from_raw_parts",
"core::str::from_raw_parts_mut",
"alloc::string::String::from_raw_parts",
# ...
]
It looks like String::from_raw_parts
's safety documentation was at some point copy-pasted from Vec::from_raw_parts
's, but it has since diverged and is currently much less detailed. It for example misses the invariant "The allocated size in bytes must be no larger than isize::MAX
". This is exactly the issue I'm trying to point out, the documentation basically has to say
Safety
- Everything in
Vec::<u8>::from_raw_parts
's safety requirements- Everything in
String::from_utf8_unchecked
's safety requirements
at which point why not just make those two calls?
String::from_raw_parts
is one method with this issue, does this justify adding two more?
I'll copy my comment from the ACP here, as something potentially still worth discussing at stabilization time:
I'm not convinced that one more function call is "a fairly boilerplate filled task". That's especially true for unsafe code where the hard part is demonstrating that the safety conditions have been met.
Because of that, I think two calls is actually better when the two safety conditions are very different, like they are here.
(This is the same sentiment as https://github.com/rust-lang/rust/issues/119206#issuecomment-2175826671 above.)
Feature gate:
#![feature(str_from_raw_parts)]
This is a tracking issue for ACP#167, which adds
core::str::from_raw_parts[_mut]
.Public API
Steps / History
Unresolved Questions