Open RReverser opened 4 years ago
From the looks of it, it seems that str
is already guaranteed to have the same repr as [u8]
. Actually, it's more than that. I suspect that &str
has a guaranteed representation as &[u8]
even though references/pointers to slices don't have a defined representation. This is used all over the core::str
to transmute string slices into byte slices. Here is one example, core::str::from_utf8
is another
To be clear, the standard library is allowed to depend on implementation details that aren't necessarily stable. That core
does something internally is not a guarantee that it'll work in future (or past) versions of Rust. The std is distributed with the compiler so can change as and when needed.
This is what we should have at the top of every single file instead of license headers.
To be clear, the standard library is allowed to depend on implementation details that aren't necessarily stable. That
core
does something internally is not a guarantee that it'll work in future (or past) versions of Rust. The std is distributed with the compiler so can change as and when needed.
Ofc, I am aware. Poor choice of words on my part. I just thought it's worth pointing out. Though in this particular case I don't know how they would do it any other way.
To be clear, the standard library is allowed to depend on implementation details that aren't necessarily stable. That
core
does something internally is not a guarantee that it'll work in future (or past) versions of Rust. The std is distributed with the compiler so can change as and when needed.
Yeah, that's why this issue, like the referenced issue about Box, is more about documenting and settings things in stone rather than changing behaviour.
Would it be illegal for String to ever add more fields, on top of the current pointer + capacity + length? The docs say this:
A String is made up of three components: a pointer to some bytes, a length, and a capacity.
But I think it's a bit too vague to answer my question.
If it's allowed to add more fields, then the only way you're guaranteed to access the internal Vecas_mut_vec
. Whereas, right now, you can transmute, pointer cast, and so on, on top of as_mut_vec
.
Noting that you can also pointer cast from a larger String to a Vec
Whereas, right now, you can transmute, pointer cast, and so on, on top of as_mut_vec.
No, you cannot. The docs don't say how these three components are encoded, at which type, and in which order.
This is just like for Vec
, which similarly may only be "deconstructed" via into_raw_parts
(or the equivalent stable APIs).
No, you cannot.
You can. It's unsafe as hell, so it's not guaranteed to work, but the bare minimum requirement for it to compile successfully is met: a String
is the same size as a Vec<u8>
. If a String
were to be repr(transparent)
, then it's guaranteed to work. Or, that's my thought process. I'm open to discussing just how it works, or doesn't, elsewhere.
But back to my original point, would it be illegal for a String to change its internals, at all? The current wording seems a bit vague. And every bit of logic I'm seeing involving the String
API says it's guaranteed to be a Vec<u8>
so long as the sizes are the same and the API doesn't have a breaking change.
It's unsafe as hell, so it's not guaranteed to work, but the bare minimum requirement for it to compile successfully is met: a
String
is the same size as aVec<u8>
.
Even unsafe code has rules on what you can or cannot do. If you're just transforming between unrelated types, you're basically begging the compiler to introduce UB. That's not what unsafe
is for - it's for writing code that is still UB-free, just not memory safe in ways that compiler can prove.
Your code is not even guaranteed to work on all existing platforms, because w/o repr(transparent)
the ABI representation of some struct A(B);
can be different than representation of the nested field B
- speaking as someone who ran into this in the past, although I don't remember which platform it was. So String
is not guaranteed to have same ABI as Vec<u8>
even though one is a wrapper for another.
would it be illegal for a String to change its internals, at all
This is what this issue is about. It's currently perfectly legal for String to change internal layout and break any such code that relies on such transmute
.
It would be safe to use transmute
only if it was specified as repr(transparent)
.
You can.
Typically, when people ask "can you do X?", there is an implied requirement that you can do it without causing UB. If you want to use a different definition of "can", you should say so.
In this case, transmuting String
to Vec
causes library UB, since the layout of String
is not guaranteed. Therefore, you cannot. It may happen to compile and not even cause language UB right now, but that's a happy accident, nothing more.
One such example is constructing Rc
from a Rc<Vec > where a library might want to check UTF-8 validity before construction and then Rc::from_raw + Rc::into_raw one type into another without actually copying the data.
I ran in the same kind of issue today. Is there any cons about this change that are blocking it?
Similarly to https://github.com/rust-lang/rust/issues/52976, I'd suggest that
str
andString
should have a guaranteed transparent representation of[u8]
andVec<u8>
correspondingly.They are a newtype wrappers of these types under the hood anyway, and I don't see any reasons why they would need to behave differently in the future, and making them different probably would become a breaking change anyway as it would break things like
String::from_raw_parts
constructed from raw parts of theVec
.While
String::from_raw_parts
already covers most common cases, there are still other cases where it would be desired to have guaranteed representation.One such example is constructing
Rc<String>
from aRc<Vec<u8>>
where a library might want to check UTF-8 validity before construction and thenRc::from_raw
+Rc::into_raw
one type into another without actually copying the data.Currently this is not possible to do safely because it's not guaranteed that the
*const Vec<u8>
returned from the firstinto_raw
is compatible with*const String
accepted by the second.While this particular case could be alleviated by adding a special method, there are lots of other containers where such compatibility between these types might be desirable, and I don't see any reasons why it couldn't be guaranteed at a lower level.