Open yaahc opened 2 years ago
Example PR using ThinBox as a substitute for Box<dyn Error + ...>
: https://github.com/yaahc/pgx/pull/1/files
One unfortunate thing about ThinBox is that it's invariant rather than covariant, like Box. For example, this playground fails to compile: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f6348153d6555716ce86926ac1928a4e.
This happens because of the <T as Pointee>::Metadata
projection that is used internally. Projections like these currently require invariance. I'm not sure if there's a way to fix it, but it's unfortunate to differ from box in this way IMO. Arguably, this is more of an issue with the Pointee trait than with ThinBox itself, though, since it's not like ThinBox can fix it -- it also is less likely to matter in most of ThinBox's proposed use cases (although it would be nice if the version from the stdlib could somehow do better here).
A use case that's slightly more realistic for ThinBox is https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=e2659853940505c62b73b21deb079195:
#![feature(thin_box)]
use std::boxed::ThinBox;
pub trait Foo {}
pub fn add_foo(foos: &mut Vec<ThinBox<dyn Foo + '_>>, foo: ThinBox<dyn Foo + 'static>) {
foos.push(foo);
}
This fails with a lifetime error, because '_
is not 'static
. This is actually safe usage, though, and with Box
it works fine: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=dc3e919e611f4e91ac0567db47ac4d92.
FWIW, I also noted the invariance of <T as Pointee>::Metadata
in https://github.com/rust-lang/rust/issues/81513#issuecomment-1019371954.
Send
and Sync
are also missing, compared to Box<T>
, because NonNull<_>
blocks them by default. I think it should be perfectly fine to justify manual impls for ThinBox<T>
though.
Oh, it should definitely be Send and Sync.
Is it not supposed to implement Eq? because Box does.
I was wondering, Because I get this error:
the trait Eq is not implemented for ThinBox<Vec<Item>>
Eq
is not implemented yet, but it seems reasonable to add, and eventually mirror most of Box
's API. For now, I think the initial implementation was intentionally kept light.
We should think at least about the fundamental traits before stabilization -- Fn
, FnMut
, FnOnce
-- because they can't be added later. (That's why Rc
and Arc
don't implement them -- see #71570.)
Maybe ThinBox
should also be a fundamental type like Box
.
Another auto-trait consideration is Unpin
-- Box<T>
is unconditional, but ThinBox<T>
is currently auto for T: Unpin
. There's a big note about this on Box
, so I'm not really sure whether ThinBox
should get that same exception.
I always thought that Unpin was automatically implemented on all types. Does unconditional simply mean you can't opt out of Unpin?
The unconditional impl on Box<T>
means it does not require T: Unpin
, while ThinBox<T>
does require that.
As a side question, is there any plan for a similar ThinArc
? This would enable concurrent libraries to use new patterns like arc_swap::ArcSwapAny<ThinArc<dyn Trait>>
due to atomicity of thin pointers.
Another auto-trait consideration is
Unpin
--Box<T>
is unconditional, butThinBox<T>
is currently auto forT: Unpin
. There's a big note about this onBox
, so I'm not really sure whetherThinBox
should get that same exception.
@cuviper The pointer to the T
in both Box<T>
and ThinBox<T>
only differ by a constant offset (specifically, std::mem::size_of::<<T as Pointee>::Metadata>()
). Since Box<T>
implements Unpin
and, for all constant n
, a + n
points to a stable address if and only if a
does, ThinBox<T>
could likewise safely implement it regardless of whether T
itself is Unpin
.
<T as Pointee>::Metadata
would still need to be Unpin
, but the Pointee
trait already separately requires Metadata
associated types to implement Unpin
, so that's not a concern.
@dead-claudia It can be done safely, but there's a trade-off. Referring to the code comment that I linked, we can either have unconditional Unpin
or pin projection, but not both.
* Another type with the same semantics as Box but only a conditional
* implementation of `Unpin` (where `T: Unpin`) would be valid/safe, and
* could have a method to project a Pin<T> from it.
So the question is whether ThinBox
should be "another type" like that, or should it mimic Box
exactly.
@cuviper Could you explain the difference between the two? I'd think that Pin::new(&mut my_box)
would work basically the same as fn project(some_box: Pin<MyBox<T>>) -> Pin<&mut T>
? Or am I missing something here, like it really being about whether it should be !Unpin
as a lint similar to pointers being !Send
and !Sync
?
I still get confused about pinning myself, but hopefully this section of the docs will help you: https://doc.rust-lang.org/std/pin/index.html#projections-and-structural-pinning
That said, I think we probably do want unconditional Unpin for ThinBox<T>
(and not projection) for this same justification about trait objects, since unsized types are pretty much the whole point of ThinBox
.
- It is in practice very useful to have Box<T> be unconditionally Unpin because of trait objects, for which the structural auto trait functionality does not apply (e.g., Box<dyn Foo> would otherwise not be Unpin).
It would be very nice if this structure could support the fallible allocation operation API's provided by Box in #32838. I'll try to whip up an example PR.
Example diff as promised: https://github.com/rust-lang/rust/pull/110483
So, I'm wondering if there would be some way to recontextualize this in terms of an unsized struct instead of a thin box as-is. In other words, whether it would be possible to make ThinBox<T>
simply Box<Thin<T>>
, where Thin<T>
is an unsized type that implements Deref<Target = T>
.
My justification for this is for use cases such as unsized_vec
, where you would want a nicer API to encapsulate all the logic of computing a combined layout for metadata + value, and perhaps the ability to coerce a pointer into a Thin<T>
and then use that to compute its size.
I was also thinking of this from the perspective of creating an unsized_vec type data structure, except for storing trees inside a single buffer.
If this seems even remotely doable, I'm willing to sketch out a potential API for it and maybe a PR for it, but I don't know 100% what the reasoning for specifically going for a ThinBox
here was.
My original motivation for working on this was use cases in error handling. I wanted to have the same thin pointer benefits that anyhow
and eyre
provide but generalized, and I was hoping that ThinBox
could be made into a preferred error type over Box<dyn Error>
and have it actually implement Error
for ThinBox<dyn Error>
rather than implementing From<E> where E: Error
while still having ?
work on all error types with some clever usage of FromResidual
instead of From
and a custom Result
type.
In the end I gave up on it because I didn't think the UX would be adequate to justify all the extra types people would have to juggle and think about, especially with the whole thing requiring a second Result
type to work. Feel free to cut it up however you want to find value in any of the work. I think the BiPtr abstraction internal to the implementation is particularly cool and might be close to the Thin
type you're imagining already. Back when I wrote this I wanted to go back and update the Rc
implementation to use the same stuff internally but never got around to it.
Makes sense! I have other stuff I want to get to first, so, I don't mind if someone ends up getting to this before me. Going to just dump my thoughts here on what I was thinking of for a design, although I haven't actually hashed it out much.
The main thing that came to mind for this is that we'd effectively end up having to resolve the DST issues that are preventing us from making CStr
not have any pointer metadata: the size would have to be knowable from the value itself, not the metadata, or potentially not knowable at all in some cases.
I was thinking that it might be useful to have effectively three versions of the thin wrapper: one that's equivalent to the current implementation, but another where the metadata is stored after the value, and one where the metadata is stored redundantly on both sides. The main reasoning for this is that the "unsized slice" case where you're storing multiple of these objects in one buffer might want the metadata at the end to allow backwards iteration.
One particular case I was looking into is the ability to have a "type-safe" version of a tree storable in a buffer, primarily for parsing prefix/postfix arithmetic expressions. For prefix expressions, you'd have the metadata at the beginning of operands, whereas for postfix expressions, it would be at the end. And a more general "unsized vec" might just have them on both sides to allow easy double-sided iteration.
For the case where metadata is only at the end, it's an example of a DST whose size isn't knowable from a pointer alone. If you know the size of it already, you can find the metadata and reconstruct a fat pointer, but you need the size or a pointer past the end of it in order to do that. If it's at the beginning, you can just read the metadata to determine the layout.
Just chiming in here since this came up as a potential solution to a problem but it seems there's no real way to create a ThinBox<str>
should the user want it. Likewise only being able to create a ThinBox<[T]>
from an array is pretty limiting compared to the ubiquity of Vec
.
Can we get a From<String> for ThinBox<str>
implementation and/or From<Vec<T>> for ThinBox<[T]>
by chance? Those conversions are available for Box
(among many others) and would be nice to have.
Presumably rust-lang/rfcs#3536 will help provide a solid path for this.
What is the difference between this and a normal Box
?
What is the difference between this and a normal
Box
?
A Box
will store unsized types (slices, str
, object traits, etc.) as a pointer to the data and some metadata related to the type (slices and str
s store their length, and object traits store a pointer to a table with the function pointers to the trait's methods).
A ThinBox
would always only store a single pointer on the stack, by writing the metadata that a regular Box
would have on the stack alongside the data it points to.
This would make it easier to pass pointers to unsized types through FFI boundries.
Example of Box<str>
v. ThinBox<str>
Box<str>
[ Metadata ][ Pointer to string data ]
ThinBox<str>
[ Pointer to data ] --> [ Metadata, string data ]
Is it guaranteed what the layout of Metadata will be? Would it be feasible to make a ThinBox FFI-safe?
If we can guarantee:
Metadata
is a usize
containing the length T
ThinBox<[T]>
is repr(transparent)
over a *const ()
ThinBox<[T]>
points to the first T
a ThinBox<[T]>
is safe to pass through FFI, and the FFI users can just offset backwards by a usize
to get a pointer to the length. however, they couldn't free the ThinBox<[T]>
, as they'd need to offset to the beginning of the entire buffer, where malloc
would store the size of the buffer to free later on, unless, of course, Metadata
has a guaranteed size and (semi-optionally) layout.
If we can guarantee:
- the final item in
Metadata
is ausize
containing the length
I was under the impression this is exactly the kind of thing we didn't want to have, since we want a path forward for types which are unsized but not length-prefixed, like CStr
, and we don't want to guarantee exactly where the size field is stored in stuff like vtables.
In that case, perhaps it could be made optional somehow - the problem there would be how we make the length optional. It's valid to have a slice with a length of 0, and a pointer that's not NULL. You can also make a Box<[T]>
from a Vec<T>
with a length of 0. If we allocate an additional byte for a "does this type have an immediate length?", we can reference that instead.
In that case, perhaps it could be made optional somehow - the problem there would be how we make the length optional. It's valid to have a slice with a length of 0, and a pointer that's not NULL. You can also make a
Box<[T]>
from aVec<T>
with a length of 0. If we allocate an additional byte for a "does this type have an immediate length?", we can reference that instead.
Worth noting that, for fat empty slices, the slice pointer is in practice still never null, to ensure that Some(&[])
and None
can be told apart without needing an extra discriminant byte. Instead, the empty slice pointer is usually NonNull::<T>::dangling()
, but not always, and can be just about any nom-zero value.
Not that you should ever depend on this for correctness, mind you. This is just the current implementation to my knowledge.
You can depend on it being non-zero because &T
is guaranteed to non-zero, but another reason why you can't depend on an empty slice being equivalent to dangling
is because you can easily create an empty slice from a non-empty slice, e.g. s[..0]
. This is also what you would get from calling as_slice
on an exhausted slice::Iter
; the empty slice's pointer would be somewhere between the start and past-end of the valid slice, depending on the order of iteration.
Empty boxes might be able to optimise differently since you generally don't want to reference a valid allocation if it's empty, although there are also other cases that depend on pointer-equality being different even for empty allocations.
I would just generally not rely on pointer checking to determine size. The main case of a no-metadata DST is CStr, which is never an empty allocation; an empty string is [0]
. Even if we statically allocate one "empty string" that we reuse, that would still be a valid pointer to a valid buffer, just the same one.
More ways of making ThinBox slices would likely be helpful - e.g.:
FromIterator
implementationThinBox::new_uninit_slice(n: usize)
for only allocating the sliceNow that I'm thinking about this more, I'm starting to like the basic idea in https://github.com/rust-lang/rust/issues/92791#issuecomment-1791740424 more and more. Here's my intuition:
I do feel it can't be fully done in userland, though, and it feels more like a dyn Trait
than a Pin<P>
in the way it modifies unsizing, rather than merely decorating it. Makes me feel that &thin T
, Box<thin T>
, and so on would provide a better, cleaner interface. For a thin trait object, you'd do &thin dyn Trait
, and for a thin slice reference, it'd be &thin [Foo]
. The compiler could just reserve that storage on the stack transparently if a &thin T
is requested. And for allocators, std::mem::size_of_val(&thin value)
returns the metadata size plus std::mem::size_of_val(&value)
.
Feature gate:
#![feature(thin_box)]
This is a tracking issue for
ThinBox
for making heap allocated DSTs which only occupy 1 pointer on the stack.Public API
Note: This API is expected to grow to match
Box
as necessary over time. The initial API is the minimum needed for basic usage for error handling.Steps / History
Send
andSync
: #98595Unresolved Questions
const_allocate
fleshed out and tested enough to justify it being used in thinbox's impl? Does T-lang agree we'll always have some way to procedurally generate allocations in const eval? (in contrast to declaratively via constants, statics or anonymous promoteds)