Open nrc opened 9 years ago
I'm confused as to what this is about. isn't https://github.com/rust-lang/rust/issues/18469 the tracking issue?
Yes, this is a specific part of 18469. Seems useful to have it by itself since it stands alone (goes for the other issues too).
Is there anything that could be done to push this towards stabilization? I have some APIs in mind that would benefit significantly from the ability to bound on the Unsize
trait.
I was idly wondering about this the other date with a type such as:
struct Foo {
flag: bool,
data: str,
}
Today there's not a huge amount of support for that but I noticed one oddity. So in theory the only way to instantiate such a type is to start with a concrete type, right? Let's say:
struct FooConcrete {
flag: bool,
data: [u8; 10],
}
I would personally expect, then that if we had a function like:
fn foo(drop_me: Box<Foo>) {
drop(drop_me);
}
That function would simultaneously run the destructor for the underlying type and also free the memory. Currently, however, it only frees the memory and doesn't actually run any destructor. More interestingly this namely to me at least means that the Box<Foo>
type needs two pieces of external data, both a vtable for the destructor but also a usize
for the length of the dynamically sized data.
I'm not sure if this is just an unsupported use case perhaps? Or maybe support for this is planned elsewhere? Or maybe this isn't part of this issue at all?
In any case just wanted to bring it up!
What do you mean by the destructor not being called? I am quite sure it is:
use std::mem;
struct Foo {
flag: bool,
data: str,
}
struct FooConcrete {
flag: bool,
data: [u8; 10],
}
impl Drop for Foo {
fn drop(&mut self) {
println!("splat! {:?} {}", self.flag, &self.data)
}
}
fn main() {
let _foo: Box<Foo> = unsafe { mem::transmute(
(Box::new(FooConcrete {
flag: false,
data: [0; 10]
}), 10)
)};
}
Code above is buggy, better version:
use std::mem;
#[repr(C)]
struct Foo {
flag: bool,
data: str,
}
#[repr(C)]
struct FooConcrete {
flag: bool,
data: [u8; 10],
}
impl Drop for Foo {
fn drop(&mut self) {
println!("splat! {:?} {}", self.flag, &self.data)
}
}
fn main() {
let _foo: Box<Foo> = unsafe { mem::transmute(
(Box::new(FooConcrete {
flag: false,
data: [0; 10]
}), 10usize)
)};
}
Oh I mean when you implement the destructor for FooConcrete
instead of Foo
, that one isn't run. (your example gives a segfault locally for me as well...)
I'm specifically thinking of something like this:
pub struct Foo {
flag: bool,
data: str,
}
#[no_mangle]
pub extern fn drop(_b: Box<Foo>) {
}
where the IR is:
; ModuleID = 'foo.cgu-0.rs'
source_filename = "foo.cgu-0.rs"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%Foo = type { i8, i8 }
%"unwind::libunwind::_Unwind_Exception" = type { i64, void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [6 x i64] }
%"unwind::libunwind::_Unwind_Context" = type {}
; Function Attrs: nounwind uwtable
define void @drop(%Foo* noalias nonnull, i64) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh
_personality {
entry-block:
%2 = add i64 %1, 1
%3 = icmp eq i64 %2, 0
br i1 %3, label %_ZN4drop17hc3530aa457fb35acE.exit, label %cond.i
cond.i: ; preds = %entry-block
%4 = getelementptr inbounds %Foo, %Foo* %0, i64 0, i32 0
tail call void @__rust_deallocate(i8* %4, i64 %2, i64 1) #1
br label %_ZN4drop17hc3530aa457fb35acE.exit
_ZN4drop17hc3530aa457fb35acE.exit: ; preds = %entry-block, %cond.i
ret void
}
; Function Attrs: nounwind
declare void @__rust_deallocate(i8*, i64, i64) unnamed_addr #1
; Function Attrs: nounwind
declare i32 @rust_eh_personality(i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #1
Notably there's some concrete type behind Box<Foo>
, so presumably some dynamic call to a destructor needs to be made, but in this case it isn't :(
@alexcrichton transmuting Box::new((..., 10usize))
instead of Box::new((..., 10))
fixes the segfault (bug?).
Back on topic, Foo
isn't a trait so I don't see why Box<Foo>
should have a vtable; Box<Foo>
is a concrete type (just an unsized one). After transmuting, Box<Foo>
should (and does) forget it ever was a Box<FooConcrete>
.
@Stebalien ah yes that does indeed fix it!
For me at least I'd expect Box<Foo>
here to still run the destructor of the underlying type. It's not a trait object, yeah, but it's still erasing the underlying type. Otherwise this'd end up leaking resources, and would practically not be too useful I'd imagine?
(that being said I'm still not sure of the practicality of the Foo
type anyway)
For me at least I'd expect Box
here to still run the destructor of the underlying type. It's not a trait object, yeah, but it's still erasing the underlying type. Otherwise this'd end up leaking resources, and would practically not be too useful I'd imagine?
Why? They're totally unrelated types; FooConcrete
isn't an underlying type, it's a different type. As a matter of fact, as-is, the code above is UB: rust could decide to lay out FooConcrete
as ([u8; 10], bool)
instead of (bool, [u8; 10])
. Only Foo
is guaranteed to have the data
field come last (because DSTs must always come last).
IMO, the correct correct way to do this would be to allow box Foo { flag: true, data: *"my string" }
, but, for now, I don't see any way to safely construct a Foo
(without invoking UB).
I mean... I'm stating my opinion. Today the destructor isn't run. I'm trying to clarify if it should be run. I believe it should be run. If we don't run the destructor then to me this isn't a very useful feature in practice because that just means resources are leaked which tends to want to be avoided.
I just don't understand why it should be run. Would you expect the following to print "dropping bar" (it currently prints "dropping baz")?:
use std::mem;
struct Bar;
struct Baz;
impl Drop for Bar {
fn drop(&mut self) {
println!("dropping bar");
}
}
impl Drop for Baz {
fn drop(&mut self) {
println!("dropping baz");
}
}
fn main() {
unsafe {
let bar: Box<Bar> = Box::new(Bar);
let _baz: Box<Baz> = mem::transmute(bar);
}
}
The only difference is that Baz
is sized.
@alexcrichton
Of course, I should make sure to be portable to 64-bit architectures and add #[repr(C)]
everywhere to make it valid :-).
I think the semantics of destructors are quite clear by this point. I have seen the pattern struct Foo([u8]);
used as a newtype several times (because there is no header field, you can create it by directly transmuting a [u8]
).
Ok well if the answer here is "no" then it's no.
My intuition follows from that we always run the destructor of the type behind a trait object. I don't see how that's different here. There's an unsized type that represents some dynamic selection of a particular type at runtime. That unsized type, when dropped, is then responsible for cleaning up the runtime type.
@Stebalien I think your example is missing the point, of course that destructor does not run. For this example:
struct Foo {
a: bool,
b: str,
}
struct FooConcrete {
a: bool,
b: [u8; 10],
}
impl Drop for FooConcrete {
fn drop(&mut self) {
println!("drop");
}
}
fn coerce(a: Box<FooConcrete>) -> Box<Foo> {
// ..
}
fn main() {
let a = Box::new(FooConcrete {
a: false,
b: *b"helloworld",
});
let b = coerce(a);
drop(b);
}
I would expect that example to print drop
, but it does not today and fundamentally seems to not have support for doing so.
Again I'm talking about my own personal intuition for where I thought we were taking dynamically sized types. If this is not the direction that we're going in then it just means Box<Foo>
in this example isn't a very useful type and should be avoided.
@alexcrichton: My opinion is that there's confusion here around what "the type behind a trait object" means. Personally, I've seen two approaches taken:
Foo<Concrete> -> Foo<Dst>
- before your question, this was the only one I'd seenFooConcrete -> FooDst
The former seems to me like it has a clear relation via the Unsize<Dst>
trait impl of its parameter.
The latter seems like it needs an impl Unsize<FooDst> for FooConcrete
, which isn't a thing AFAIK.
EDIT: In addition, I'm used to DSTification being done with as
, not transmute
. Since transmute
is a reinterpretation, nothing more, I am wholly unsurprised by the behavior you observed. You reinterpreted a FooConcrete
as a FooDst
, and FooDst
has no destructor. The type system lost that FooConcrete
was involved at transmute
- it is not "behind" FooDst
, as that's what Unsize
is for..
Thanks @eternaleye; now I see where you're coming from @alexcrichton. FYI, here's the relevant part of the RFC:
There is an
Unsize
trait and lang item. This trait signals that a type can be converted using the compiler's coercion machinery from a sized to an unsized type. All implementations of this trait are implicit and compiler generated. It is an error to implement this trait.
Hello ! I have noticed a weird lack of symmetry is the types for which Unsize
is implemented. It seems to be reflexive for traits but not for slices; see:
#![feature(unsize)]
trait Trait {}
struct A; impl Trait for A {}
fn f<T: ?Sized, U: ?Sized>() where T: std::marker::Unsize<U> {}
fn main() {
// for traits:
f::<A, Trait>(); // works normally
f::<Trait, Trait>(); // works; Unsize is reflexive
// for slices:
f::<[usize; 4], [usize]>(); // works normally
f::<[usize], [usize]>(); // error; Unsize isn't reflexive ?
}
I was told that this could be an omission in design. Is that the case ? Thank you for your time and great language.
@carado Looks like a fair point. Maybe open a new issue about it?
Currently CoerceUnsized
is only implementable for structs that have exactly one field being coerced.
I would like to implement it for a type that that has zero coercible fields (i.e. only PhantomData
fields).
Is there any reason why type coercion couldn't work for a struct with only PhantomData
fields?
Perhaps I do not understand correctly, but isn't nothing preventing unsize
from being stabilized?
Stabilizing the current CoerceUnsized
trait will prevent us from ever adding custom DST's that don't built on top of slices and trait objects. For example CStr
except that the length is calculated when required rather than stored in the pointer metadata. Or CppObject
where you call into C++ to get the size of the value behind a reference.
Stabilizing the current
CoerceUnsized
trait will prevent us from ever adding custom DST's that don't built on top of slices and trait objects. For exampleCStr
except that the length is calculated when required rather than stored in the pointer metadata. OrCppObject
where you call into C++ to get the size of the value behind a reference.
From what I understand though,
#![feature(unsize)]
does not require #![feature(coerce_unsized)]
?
I don't think Unsize
is useful without CoerceUnsized
. Unsize
is used for implementing CoerceUnsized
, but the actual unsizing operation uses CoerceUnsized
. I also expect manually implementing Unsize
to either be forbidden or be able to cause compiler crashes or unsoundness.
I don't think
Unsize
is useful withoutCoerceUnsized
.Unsize
is used for implementingCoerceUnsized
, but the actual unsizing operation usesCoerceUnsized
. I also expect manually implementingUnsize
to either be forbidden or be able to cause compiler crashes or unsoundness.
This is a use case where Unsize
is useful @bjorn3
Useful snippets:
/// The command we are executing
#[derive(Discriminant)]
enum Cmd {
/// a zsh script to execute
Zsh,
Bash,
}
#[async_trait]
trait Command {
async fn execute(&self, exec: Ctx, input: &str) -> anyhow::Result<String>;
}
#[async_trait]
impl Command for Zsh {
async fn execute(&self, exec: Ctx, input: &str) -> anyhow::Result<String> {
// ...
}
}
#[async_trait]
impl Command for Bash {
async fn execute(&self, exec: Ctx, input: &str) -> anyhow::Result<String> {
// ...
}
}
fn this_requires_unsize() {
let cmd1: Box<dyn Command> = Cmd::Zsh.cast();
let cmd2: Box<dyn Command> = Cmd::Bash.cast();
}
where Discriminant
generates
impl From<Zsh> for Cmd {
fn from(value: Zsh) -> Self { Self::Zsh }
}
impl std::convert::TryFrom<Cmd> for Zsh {
type Error = Cmd;
fn try_from(value: Cmd) -> Result<Self, Self::Error> { if let Cmd::Zsh = value { Ok(Zsh) } else { Err(value) } }
}
#[doc = " a zsh script to execute"]
struct Zsh;
impl From<Bash> for Cmd {
fn from(value: Bash) -> Self { Self::Bash }
}
impl std::convert::TryFrom<Cmd> for Bash {
type Error = Cmd;
fn try_from(value: Cmd) -> Result<Self, Self::Error> { if let Cmd::Bash = value { Ok(Bash) } else { Err(value) } }
}
struct Bash;
impl Cmd {
fn cast<U: ?Sized>(self) -> Box<U> where Zsh: ::core::marker::Unsize<U>, Bash: ::core::marker::Unsize<U> {
let value = self;
let value = match Zsh::try_from(value) {
Ok(v) => {
let x = Box::new(v);
return x;
}
Err(v) => v,
};
let value = match Bash::try_from(value) {
Ok(v) => {
let x = Box::new(v);
return x;
}
Err(v) => v,
};
unreachable!();
}
}
I'm curious why CoerceUnsized would prevent changing how dsts work. It's a trait with no methods. Could it be limited to derive, that is make it impossible for anyone to write a manual impl, and then the behavior could be changed later? That is to say that while the restriction today would be struct of one field etc. because no one can manually implement it, the restriction could be lifted later once the behavior is more defined and/or someone decides what to do about the dst general case.
My use case not that long ago is that I'm writing audio code which needs to never deallocate on the audio thread, and I wanted to have custom smart pointers which would defer deallocation to another thread in the background. They were also going to allocate objects from pools for cache friendliness. This can't be built on top of Arc
and Weak
without stabilizing custom allocators (and maybe not even then?) but this would have neatly solved the problem as far as I can tell. Instead, I dropped trying for it in favor of stdlib Arc
and have moved to an architecture with custom 64-bit ids to do cross-thread addressing, wherein the user-side pieces are going to literally basically be Arc<u64>
. Perhaps I will go back to it eventually, only with very specific unsizing ops of my own to dedicated traits.
Is anyone actively working on custom dsts? I keep seeing people saying it's important but the only use case I think I've seen is async traits. I don't find the "get size from C++ to put thing on the stack" use case that interesting because C++ itself doesn't really allow for that in the first place. As far as I know no popular language is talking about trying to e.g. put virtual classes on the stack. You can do it in C with alloca or gcc extensions but those just cause more problems than they're worth in my experience: suddenly you're open to stack overflows because of a math bug or whatever. And a similar thing with CStr
that doesn't store length: now you've got possibly-implicit strlen all over.
I'm not saying "O dsts are useless" just that I see people talking about them as useful but never really having examples, and in
this case we have a general concrete usefulness for CoerceUnsized
and it's easy to understand what that is.
Also, I see a soundness hole in many of the examples of custom dsts that get brought up from time to time in any case. They can make Rust unsound: if someone figures out how to get a value into the program they can read any pointer relative to the stack, at least for most schemes which would involve bumping sp or similar (and presumably anything more complex would have enough overhead that you'd go to the system allocator). It seems to me that custom dsts are years away but CoerceUnsized
could probably be done in the very near future.
I don't find the "get size from C++ to put thing on the stack" use case that interesting because C++ itself doesn't really allow for that in the first place. As far as I know no popular language is talking about trying to e.g. put virtual classes on the stack.
Having Box<MyDst>
also requires the length to be known at runtime as it is passed to the deallocation function.
For your use case can you have a type that is equivalent to Arc<[T]>
rather than Arc<U>
? That is specializing it to only work with slices rather than any (dynamically sized) type? In that case I don't think you need CoerceUnsized
.
Hey @bjorn3! 😊
I wanted to get a better understanding of your perspective on the unsize
feature. Do you think it's ready to be on the fast track to stabilization, or are there any concerns that you believe need to be addressed first?
I noticed that you liked my previous response, but it seemed like you might have had some disagreement with it earlier. I'd really appreciate your thoughts on this matter because I'd love to be able to use unsize
as quickly as possible in stable.
My use case wouldn't have been a slice, it would have been Arc<SineGenerator>
to Arc<dyn Node>
on top of my own non-std smart pointers, where the actual raw stored pointer pointed into a slab. It's been a bit since I tried to do it, but as I recall I gave up on a general solution because writing helpers to do the downcasting was getting gnarly. I'll revisit more specific solutions eventually but only after benchmarks etc; I thought I could produce a generally useful library which could be published to crates.io but it turned out I couldn't because I was running into combinations of unergonomic APIs and type constraints. It's a shame I didn't think to come here around the time I was doing it, if I had I'd have been able to articulate this better.
Essentially, I wanted to do:
struct SmartPointer<T> {
page: *mut SlabPage,
data: *const T,
}
It would have had to be:
struct SmartPointer<T> {
data: *const T,
}
#[repr(C)]
struct SlabItem<T> {
header: *mut SlabPage,
data: T,
}
Which is a bit less ideal (the backpointers get in the way of a fully contiguous array) but workable with what we have here. It would of course be possible to go further and do pointer arithmetic and always align slab pages to OS pages, but that's kinda beside the point. these would have had Drop impls that pushed to a background thread if the slab page needed to go away, but that's also beside the point (tldr: you can write realtime-safe audio code if you assume reasonable limits for everything and glitch if someone makes the library have to allocate more on the audio thread).
Also I was conflating dsts on the stack with dsts in general which I realized a few minutes ago. I'm still not sure I see any particular benefit to extending the model or any reason why CoerceUnsized
is incompatible with a more advanced model.
What is a realistic case wherein I'd be dealing with something that isn't effectively (length, dst)
? If the length has to be passed to the deallocator, then it needs to always be known somehow or other. If the object came from C, then presumably it needs to be deallocated by C since Rust requires sizes and it's not guaranteed that Rust is using malloc.
Is the issue that we would want to coerce from struct MySizedType
to dst MyUnsizedType
where the compiler doesn't understand the coercion as it does slices/trait objects? E.g. the C trick of a header in the first field. I could sort of see that being a problem but it seems like a derive-only restriction for now lets it be extended later.
Fwiw, I've been experimenting with Unsize and CoerceUnsized recently, see https://github.com/ferrous-systems/unsize-experiments. I was planning on trying to write down my findings there as a pre-RFC next week.
Fwiw, I posted a pre-RFC on irlo: https://internals.rust-lang.org/t/pre-rfc-flexible-unsize-and-coerceunsize-traits/18789
The commit that implemented CoerceUnsized
for {Cell, RefCell, UnsafeCell}
seems to not specify ?Sized
on the type parameters.
Is this intentional? The documentation for CoerceUnsized
just says:
For wrapper types that directly embed
T
likeCell<T>
andRefCell<T>
, you can directly implementCoerceUnsized<Wrap<U>> for Wrap<T> where T: CoerceUnsized<U>
.
From which I assume should allow T: ?Sized
and U: ?Sized
.
This prevents me from writing the following:
struct Inner<T: ?Sized> {
meta: u32,
value: RefCell<T>,
}
// This errors because `RefCell<T>` only allows `T: Sized, U: Sized`.
impl<T: ?Sized, U: ?Sized> CoerceUnsized<Inner<U>> for Inner<T> where T: Unsize<U> {}
let rc: Rc<Inner<i32>> = /* ... */;
let rc = rc as Rc<Inner<dyn Any>>;
let rc = Rc::downcast::<Inner<i32>>(rc).expect("Unable to cast");
Which I think should be fine.
I think U
should at least be ?Sized
(but I'm not sure if it makes sense for T
to be ?Sized
, given the lack of "super-fat" pointers currently (wider than 2 usize
s))
I'd love to see this stabilized so that we may comfortably use DSTs in the bit-vec
crate, in particular for this feature: https://github.com/contain-rs/bit-vec/issues/46
Tracking issue for RFC rust-lang/rfcs#982.
Current state:
CoerceUnsized
trait is unstable and we wish to revisit its design before stabilizing, so for now only stdlib types can be "unsized"Things to consider prior to stabilization
Part of #18469