Closed Gankra closed 6 years ago
:+1: to any of these, even if just as an experiment to see if it would be worth it in practice.
rust issue: rust-lang/rust#5977
Be wary of optimizations that are incompatible with pointers to the payload.
For example
let mut x: Option<Option<u32>> = Some(None);
let y: &mut Option<u32> = x.as_mut().unwrap();
y is a reference to just an Option<u32>, so we can't fudge its discriminant from the outer layer.
Option<u32> is isomorphic to (bool, u32), which should be null-pointer optimizable (well - 2nd-pointer optimizable).
(bool, u32) is a nice way to think about it; it should work.
A general-purpose "ForbiddenValues" wrapper as you described in rust-lang/rust#27573 would be nice, but how exactly would you tell the compiler which values are forbidden? Even just for primitive types it's difficult, considering that as yet the type system has no support for variadics or constant parameters.
Supporting compound types is even trickier. To be fully general, you have to be able to describe any possible subset of any possible compound type, and do so in a concise way which the compiler can easily reason about, and do it all without relying on any particular layout choice. Short of using a compiler plugin, I have no idea how you do that.
Basically: maybe it's worth stabilizing simple cases like NonZero now - it may well be that in future it will simply be a type alias to a ForbiddenValues type, but that should be a backwards-compatible change.
(&u8, &u8) can support 3 variants.
It seems like one could also null the first field and use the second as a tag, allowing 2^64 (or 2^32) variants.
@rkjnsn Oh yeah, great point!
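For context: the single-niche half of this already works on a reasonably recent rustc. On a 64-bit target, Option<(&u8, &u8)> uses the null value of the first reference as None and stays two words; it's the further levels of nesting that need the extra trick described above. A quick check (sizes assume a 64-bit target and a modern compiler):

```rust
use std::mem::size_of;

fn main() {
    // Two references are two machine words on a 64-bit target.
    assert_eq!(size_of::<(&u8, &u8)>(), 2 * size_of::<usize>());
    // The first level of Option is free: None is encoded as a
    // null first pointer, so no tag byte is added.
    assert_eq!(size_of::<Option<(&u8, &u8)>>(), size_of::<(&u8, &u8)>());
}
```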
Another idea: Make use of padding bytes in nested enums:
enum A {
    Aa(u32, B),
    Ab(u32, B),
}

enum B {
    Ba(u32),
    Bb(u32),
}
Today, this leads to this kind of memory layout:
A [u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8]
B [u8, u8, u8, u8, u8, u8, u8, u8]
[ Apadding ] Atag [ u32 ] [ Bpadding ] Btag [ u32 ]
Now, we can't just combine those two tag ranges and padding areas into one because we need to support interior references. But what we can do is store the A tag in the B padding area:
A [u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8]
B [u8, u8, u8, u8, u8, u8, u8, u8]
[ u32 ] [ ABpad ] Atag Btag [ u32 ]
The idea here being that a B value never reads or writes its padding area, so it's in principle free to be used by an outer enum's tag.
This would require the compiler to ensure that padding in an enum doesn't get overwritten with junk bytes, though, which might be tricky or inefficient in combination with mutable references to such an enum (you couldn't just overwrite all bytes on reassignment).
You could also use bits 0/1 in pointers, which are always zero under some alignment conditions: if bit 0 is 1 then it's not a pointer and bits 7-1 are the actual tag. This is a bit more efficient than just having a null pointer as an additional marker (e.g. in the (&u8, &u8) case).
@naicode That may be more space efficient, but it's less time efficient because of the bit shifting and masking required. Rust probably shouldn't make that kind of change automatically.
It's a neat idea, though. Reminds me of Ruby's internal VALUE type: if the LSB is 0 then it's a pointer to an object, and if the LSB is 1 then the rest of the bits encode an immediate value like true, false, nil, a symbol, or an integer (there is a further way to differentiate those). So Ruby values are like a super-compact Rust enum in a way.
It's true that such bit-fiddling slows down the program at runtime and therefore should never be applied implicitly.
If more exotic layout optimisations are implemented, it might still be possible to opt in to them on a per-struct basis, e.g. a #[repr(compact_rust)] attribute which uses the Rust representation plus space optimisations that have a runtime cost (e.g. for usage of Rust in embedded environments with small amounts of RAM, or for enums allocated in massive quantities on the heap).
cc rust-lang/rust#24041
Another idea: compress nested enum variant ids (thanks Mutabah over IRC!).
enum A { A1, A2 }
enum B { BA(A), B1, B2 }
This currently requires space in B for the variant id of A, but one could assign BA+A1 = 0, BA+A2 = 1, B1 = 2, B2 = 3, and both save space and maintain the ability to take a reference to the stored A (because its payload may presumably be stored immediately following its variant id [which is also B's] as usual).
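A hand-written sketch of the proposed flattening (FlatB and pack are hypothetical names for illustration; the compiler does not generate these):

```rust
#[allow(dead_code)]
enum A { A1, A2 }

// Hand-flattened `B`: the nested `A` discriminant and `B`'s own
// discriminant share the single range 0..=3, as suggested above.
#[allow(dead_code)]
#[repr(u8)]
enum FlatB { BaA1 = 0, BaA2 = 1, B1 = 2, B2 = 3 }

// Packing an `A` into the flattened form is a simple mapping.
fn pack(a: A) -> FlatB {
    match a {
        A::A1 => FlatB::BaA1,
        A::A2 => FlatB::BaA2,
    }
}

fn main() {
    // One byte holds all four combined variants.
    assert_eq!(std::mem::size_of::<FlatB>(), 1);
    assert!(matches!(pack(A::A2), FlatB::BaA2));
}
```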
@soltanmm Those kinds of optimizations are not generally possible, because 'A' must have the same layout regardless of whether it is used within another enum (otherwise references to the 'A' within 'BA' would have a different layout from normal references to 'A').
Furthermore, you can't adjust the layout of all 'A's in advance because 'B' might be in a different crate.
What you can do is treat the discriminant in 'A' as a restricted value (either 0 or 1 for A1/A2), in which case 'B' is free to use the disallowed values as IDs for the other variants (B1, B2), but that kind of optimization has already been covered in this issue.
@Diggsey note that @soltanmm's suggestion doesn't adjust the layout of A (in both standalone and optimised form A1 is 0 and A2 is 1, although the particular values don't matter). AFAICT, your last paragraph is exactly what @soltanmm is saying.
@huonw EDIT: I just realized I might have interpreted that incorrectly.
Almost exactly, I think, but not quite (specifically the bit about this being already covered).
@Diggsey I believe the most similar optimizations discussed were using forbidden values in payloads or placing discriminants in padding, as opposed to outright summing the discriminants in the same location. Having the discriminant be a value for which forbidden values might be used wasn't mentioned, and this has slightly different space-time trade-offs to @Kimundi's example with the discriminants being split into different sequences of bit positions.
Arguably, though, everything in this discussion is some variant of, "Use forbidden values to encode variants," so as long as everyone's kicking the horse I guess I concede that to @Diggsey. :-)
Would it be possible to enable NonZero for the common Rust enum? Like
enum Status {
    Alive, // discriminant would start at 1 instead of 0
    Suspect,
    Dead,
}
size_of::<Status>() == size_of::<Option<Status>>()
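For what it's worth, this equality does hold under the niche layout a later rustc gained, and without moving the discriminants off 0: the valid values are 0..=2, and None is encoded as the first value above that range. A check, assuming a reasonably recent compiler:

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Status {
    Alive,
    Suspect,
    Dead,
}

fn main() {
    // Valid discriminants are 0..=2, so None fits in the unused value 3.
    assert_eq!(size_of::<Status>(), 1);
    assert_eq!(size_of::<Option<Status>>(), size_of::<Status>());
}
```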
@arthurprs No, that's not a good choice: first off, you can't depend on knowing whether Option<Status> is used in the program, and secondly, what if I have Option<Result<Status, ()>>?
The general niche optimization would use values above those of the original enum, which scales.
The general optimization scales better, but @arthurprs's suggestion seems like it would technically work fine. You don't need to depend on knowing about Option<Status>; you can just always start from 1, unless we have existing reasons to prefer starting from 0.
Rust relies so much on these patterns that any gain would be huge.
Option<Result<Status, ()>>: can't it be the usual 2 discriminants + Status?
If you manually wrote enum Status { Alive=1, ... }, could it decide this is NonZero? But generally speaking, I would agree that broader forbidden-value analysis is more promising.
I'm surprised enums like
enum E {
    A = 1,
    B = 2,
    C = 3,
}
don't use non-zero optimization right now.
If you need to save space you have to add an artificial None-like variant instead of using idiomatic Option<E>.
From the reference:
If a discriminant isn't specified, they start at zero, and add one for each variant, in order.
So it'd have to not be visible to the user, which means doing a bunch of operations to hide it.
I'm surprised enums like ... don't use non-zero optimization right now.
@cuviper @petrochenkov That one's an easy fix, compared to the others, likely just a couple extra lines.
The problem we have is there aren't enough people to work on improvements like these, once they become possible. There's much bigger fish that have more serious implications and we have to pick. Worse, every time we compromise, cleaning it up later becomes harder, which slows us down further.
@camlorn has recently cleaned up the duplication in rustc_trans and he might be interested in taking a look at niche optimizations, or I can also mentor someone else. (Find me in #rustc on IRC.)
From the reference:
If a discriminant isn't specified, they start at zero, and add one for each variant, in order.
:cry:
But this only affects C-style enums. The others (the vast majority) still have potential for improvement.
I might take some of these after I finish my current work to optimize field order for anything represented as Univariant, probably from most impact to least.
Making enums which fall into the CEnum case of Layout without being explicitly C enums start at 1 should be an incredibly tiny change.
But this only affects C-style enums. The others (the vast majority) still have potential for improvement.
Is there any reason None couldn't be defined to be 3? That seems less hacky. When you have generic enums wrapping generic enums, etc., rustc can just reserve discriminant ranges as needed. For example, None: Option<Option<E>> could be 4:
E::A: E == 0;
E::B: E == 1;
E::C: E == 2;
Some(E::A): Option<E> == 0;
Some(E::B): Option<E> == 1;
Some(E::C): Option<E> == 2;
None: Option<E> = 3;
Some(Some(E::A)): Option<Option<E>> == 0;
Some(Some(E::B)): Option<Option<E>> == 1;
Some(Some(E::C)): Option<Option<E>> == 2;
Some(None): Option<Option<E>> == 3;
None: Option<Option<E>> == 4;
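This is essentially the scheme rustc later implemented: each level of wrapping claims the next unused discriminant value. With a reasonably recent compiler the whole nesting stays one byte:

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum E { A, B, C }

fn main() {
    // E uses 0..=2; Option<E> encodes None as 3;
    // Option<Option<E>> encodes the outer None as 4.
    assert_eq!(size_of::<E>(), 1);
    assert_eq!(size_of::<Option<E>>(), 1);
    assert_eq!(size_of::<Option<Option<E>>>(), 1);
}
```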
@Stebalien That is the optimization @eddyb was referring to, which places values above the original enum's values. It's what I think we should ultimately go with, as well.
@solson Ah. Sorry, I missed that when skimming his reply.
I spoke to @eddyb on IRC and we discussed algorithms for these cases. The conclusion is that it's not actually hard now that repr is deduplicated. We think nothing outside layout has to change. I'm still on the fence, but am leaning towards trying unless someone else wants to take it.
I tend to think of things at like 2 in the morning. The thing I thought of at like 2 in the morning last night is how much optimizing a tree of types to use one discriminant could benefit futures-rs.
CC rust-lang/rust#36237 (a rebase of rust-lang/rust#31215): an attempt to optimize two special cases of Option.
Indeed, one of @eddyb's comments on that issue echoes discussion here.
@mrhota See also this more recent comment - while I realized later that it's not exactly on-topic in that thread, I'm stating the situation as I see it - rust-lang/rust#39999 followed by some refactors is necessary.
Found myself wanting this for creating a small vector of min size 2, e.g.
enum SmallVec<'a> {
    A2(&'a u64, &'a u64),
    V(Vec<&'a u64>),
}
(since the first member of a Vec is the Unique pointer, i.e. non-null). Though this example seems like a clear win, it's less clear what Rust should do if A0 and A1 variants were added, since doing the optimisation makes it harder to figure out which variant the enum is.
Null pointer optimization in the two-variant case could be used even if both variants contain values. For example, Result<(&i32, usize), usize> can be stored like (Option<&i32>, usize).
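A hand-rolled version of that packing might look like the sketch below (PackedResult and its methods are made-up names; sizes assume a 64-bit target):

```rust
use std::mem::size_of;

// Manual encoding of Result<(&i32, usize), usize>: a `None`
// in the first field tags the Err variant, so no separate
// discriminant is needed.
struct PackedResult<'a>(Option<&'a i32>, usize);

impl<'a> PackedResult<'a> {
    fn ok(r: &'a i32, n: usize) -> Self { PackedResult(Some(r), n) }
    fn err(n: usize) -> Self { PackedResult(None, n) }

    fn unpack(&self) -> Result<(&'a i32, usize), usize> {
        match self.0 {
            Some(r) => Ok((r, self.1)),
            None => Err(self.1),
        }
    }
}

fn main() {
    // Two words total: the null niche of the reference is the tag.
    assert_eq!(size_of::<PackedResult>(), 2 * size_of::<usize>());
    let x = 7;
    assert_eq!(PackedResult::ok(&x, 42).unpack(), Ok((&x, 42)));
    assert_eq!(PackedResult::err(9).unpack(), Err(9));
}
```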
@xfix I was going to say that I don't think this can be done easily, then I started thinking of ways we might be able to do so. So maybe.
But I do think it wouldn't be trivial.
Also, that's really only applicable in a small number of cases. Let (a, n, b) denote a type with a segment of fields (a) before a non-null pointer (n) followed by a segment of fields (b). We can only do the optimization if the other variant has either sizeof(other) <= sizeof(b) or sizeof(other) <= sizeof(a). In any other case, the enum itself becomes larger, which we don't want.
It may be possible to tweak the layout algorithms to make this work reliably by always putting references at the beginning of structs--we sort of already do.
Today I am writing unsafe code to represent as (*const u8, usize, PhantomData<&'a str>) (in two words of memory) a type that is effectively:
pub enum BorrowedOrSharedString<'a> {
Borrowed(&'a str),
Shared(Rc<String>),
}
The compiler could do that optimization: represent Borrowed as &'a str's (&'a u8, usize) directly (with a non-null pointer), and Shared as (null, Rc<String>).
@SimonSapin Nice! Did you publish it as a separate crate? It seems very similar to Cow and I think it'd be good to enable code reuse. Similarly with Arc.
It’s not done yet, and I’m not sure I’ll bother with a separate crate. It is somewhat similar to Cow in the dual representation, but I don’t care much about the copy-on-write aspect.
As to code reuse, I feel that every time I want something like this I want something slightly different.
but I don’t care much about the copy-on-write aspect.
What exactly do you mean by that?
Could you at least throw raw code in the wild, so someone could take it and create the crate?
I mean that I haven’t implemented DerefMut, make_mut, to_mut, or anything that switches from one variant to the other. The code is in a work-in-progress branch; I’ll link it here when I open a PR.
I haven't read the whole thread here, but I did attempt to grep through to see if this has been mentioned so far - can we store way more tags in a pointer because of its alignment? We know for sure that a pointer of 2-byte alignment can never be odd, so we can store one tag in P=0 and then P=1, P=3, P=5 etc. I think that for n-byte alignment, we have 1 + (n - 1) * 2.pow(size_of::<usize>() - (n - 1)) possible tags, which should mean that we can have zero tag overhead for almost any pointer-containing enum (since how often do you have a pointer to a single-byte-alignment datatype?).
I should say that I didn't come up with this myself; I stole it from a really clever trick in the bytes crate.
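The trick can be sketched by hand (TaggedPtr is a made-up name; this is an illustration of the idea, not anything the compiler does automatically):

```rust
// Store a small tag in the low bits of a pointer to a well-aligned value.
// A Box<u64> is 8-byte aligned, so its address always has three zero low bits.
struct TaggedPtr {
    raw: usize,
}

impl TaggedPtr {
    fn new(b: Box<u64>, tag: usize) -> Self {
        let addr = Box::into_raw(b) as usize;
        assert_eq!(addr & 0b111, 0); // alignment guarantees these bits are free
        assert!(tag < 8); // three bits give eight possible tags
        TaggedPtr { raw: addr | tag }
    }

    fn tag(&self) -> usize {
        self.raw & 0b111
    }

    fn into_box(self) -> Box<u64> {
        // Mask the tag back out to recover the original pointer.
        unsafe { Box::from_raw((self.raw & !0b111) as *mut u64) }
    }
}

fn main() {
    let t = TaggedPtr::new(Box::new(123), 5);
    assert_eq!(t.tag(), 5);
    assert_eq!(*t.into_box(), 123);
}
```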
@Jef while this reduces memory consumption, it increases CPU consumption, as accessing the discriminant requires bit-masking operations. It's also heavily dependent on the alignment a pointer has for a given type on a given platform.
I think optimizations trading CPU cycles for memory should not necessarily be done without any way for the programmer to control them.
Maybe something like #[optimize(<what>)], with <what> being e.g. size or cpu/access_time, might be interesting. Note that optimizing for size is still different from packed/compact, which might create unaligned fields (as it does not use any padding at all), but it might be used to determine which of multiple fields is made unaligned when one of them has to be.
@dathinab Accessing the discriminant would never require bit masking, although accessing the pointer may require bit masking unless you're smart about it. In cases where bit masking would be necessary I would say that yes, that should be explicit opt-in, but it's by no means always the case. For example:
enum ContainsPointer<'a> {
    A(&'a Foo), // assume `Foo` has 2-byte alignment
    B,
    C,
}
match contains_pointer {
    A(thing) => { ... },
    B => { ... },
    C => { ... },
}
Could be converted to:
struct ContainsPointer(*const Foo);

if contains_pointer.0 as usize == 0 {
    // B's path
} else if contains_pointer.0 as usize == 1 {
    // C's path
} else {
    // A's path
    // We know that any value that is not 0 or 1 is a valid pointer, since we
    // control what gets written to a value of type `ContainsPointer`.
}
Could you at least throw raw code in the wild, so someone could take it and create the crate?
Here it is.
use std::marker::PhantomData;
use std::mem;
use std::ops::Deref;
use std::rc::Rc;
use std::slice;
use std::str;

/// A string that is either shared (heap-allocated and reference-counted) or borrowed.
///
/// Equivalent to `enum { Borrowed(&'a str), Shared(Rc<String>) }`, but stored more compactly.
///
/// FIXME(https://github.com/rust-lang/rfcs/issues/1230): use an actual enum if/when
/// the compiler can do this layout optimization.
pub struct CowRcStr<'a> {
    /// FIXME: https://github.com/rust-lang/rust/issues/27730 use NonZero or Shared.
    /// In the meantime we abuse `&'static _` to get the effect of `NonZero<*const _>`.
    /// `ptr` doesn’t really have the 'static lifetime!
    ptr: &'static (),

    /// * If `borrowed_len_or_max == usize::MAX`, then `ptr` represents `NonZero<*const String>`
    ///   from `Rc::into_raw`.
    ///   The lifetime parameter `'a` is irrelevant in this case.
    ///
    /// * Otherwise, `ptr` represents the `NonZero<*const u8>` data component of `&'a str`,
    ///   and `borrowed_len_or_max` its length.
    borrowed_len_or_max: usize,

    phantom: PhantomData<Result<&'a str, Rc<String>>>,
}
fn _static_assert_same_size<'a>() {
    // "Instantiate" the generic function without calling it.
    let _ = mem::transmute::<CowRcStr<'a>, Option<CowRcStr<'a>>>;
}
impl<'a> From<&'a str> for CowRcStr<'a> {
    #[inline]
    fn from(s: &'a str) -> Self {
        let len = s.len();
        assert!(len < usize::MAX);
        CowRcStr {
            ptr: unsafe { &*(s.as_ptr() as *const ()) },
            borrowed_len_or_max: len,
            phantom: PhantomData,
        }
    }
}
impl<'a> From<String> for CowRcStr<'a> {
    #[inline]
    fn from(s: String) -> Self {
        CowRcStr::from_rc(Rc::new(s))
    }
}
impl<'a> CowRcStr<'a> {
    #[inline]
    fn from_rc(s: Rc<String>) -> Self {
        let ptr = unsafe { &*(Rc::into_raw(s) as *const ()) };
        CowRcStr {
            ptr: ptr,
            borrowed_len_or_max: usize::MAX,
            phantom: PhantomData,
        }
    }

    #[inline]
    fn unpack(&self) -> Result<&'a str, *const String> {
        if self.borrowed_len_or_max == usize::MAX {
            Err(self.ptr as *const () as *const String)
        } else {
            unsafe {
                Ok(str::from_utf8_unchecked(slice::from_raw_parts(
                    self.ptr as *const () as *const u8,
                    self.borrowed_len_or_max,
                )))
            }
        }
    }

    /// Convert into `String`, re-using an existing memory allocation if possible.
    #[inline]
    pub fn into_owned(self) -> String {
        let unpacked = self.unpack();
        // Inhibit destructor: we’re taking ownership of this strong reference (if any)
        mem::forget(self);
        match unpacked {
            Ok(s) => s.to_owned(),
            Err(ptr) => {
                let rc = unsafe { Rc::from_raw(ptr) };
                Rc::try_unwrap(rc).unwrap_or_else(|rc| (*rc).clone())
            }
        }
    }
}
impl<'a> Clone for CowRcStr<'a> {
    #[inline]
    fn clone(&self) -> Self {
        match self.unpack() {
            Err(ptr) => {
                let rc = unsafe { Rc::from_raw(ptr) };
                let new_rc = rc.clone();
                mem::forget(rc); // Don’t actually take ownership of this strong reference
                CowRcStr::from_rc(new_rc)
            }
            Ok(_) => CowRcStr { ..*self },
        }
    }
}
impl<'a> Drop for CowRcStr<'a> {
    #[inline]
    fn drop(&mut self) {
        if let Err(ptr) = self.unpack() {
            mem::drop(unsafe { Rc::from_raw(ptr) })
        }
    }
}
impl<'a> Deref for CowRcStr<'a> {
    type Target = str;

    #[inline]
    fn deref(&self) -> &str {
        self.unpack().unwrap_or_else(|ptr| unsafe { &**ptr })
    }
}
// Omitted: a bunch of trivial impls of standard library traits, deferring to str impls
Nice, thanks! I just feel like playing a bit, so I'm gonna try to clean it up a bit.
@SimonSapin I think doing this automatically may be difficult because you are effectively changing the type by removing a level of pointer indirection only in some highly specific cases.
It feels like this would be on the order of struct field reordering in terms of the kinds of weird bugs it'd end up with and the effort required, with not nearly as much impact.
A more general optimization here would be to realize that the refcount of Rc is never zero and that the first variant never extends to cover it, but this relies on Rc not being reordered, which isn't guaranteed anymore.
(FWIW the refcount is not stored in Rc<T> directly; it’s in the heap-allocated RcBox<T> that Rc<T> points to.)
There are several things we currently don't do that we could. It's not clear that these would have practical effects for any real program (or wouldn't negatively affect real programs by making llvm super confused), so all I can say is that these would be super cool.
Use Undefined Values in Other Primitives

Use Multiple Invalid-Value Fields
(&u8, &u8) can be the same size as Option<Option<(&u8, &u8)>>.

Support More than a Single Bit
Use all other fields as free bits when there exists another forbidden field: (&u8, u64) supports 2^64 variants via the encoding (0, x).

Use the Fact that Void is Statically Unreachable
enum Void { } cannot be instantiated, so any variant that contains Void is statically unreachable, and therefore can be removed. So Option<Void> can only be None, and therefore can be zero-sized.

Support Enums that have More than One Non-C-Like Variant
Can be represented without any tag as (&u8, *const u8) via the following packing:

Treat any True Discriminant Tags as a Type with Undefined Values
Option<Option<u32>> can be "only" 8 bytes if the second Option stores its discriminant in the first Option's u32 tag.
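Several of these have since landed in rustc. A quick check of the Void and nested-Option cases, assuming a reasonably recent compiler:

```rust
use std::mem::size_of;

enum Void {}

fn main() {
    // A variant containing Void is unreachable, so Option<Void>
    // can only be None and needs no storage.
    assert_eq!(size_of::<Option<Void>>(), 0);
    // Option<u32> is a tag plus the u32; the outer Option reuses
    // spare values of that tag, so the extra nesting adds nothing.
    assert_eq!(size_of::<Option<u32>>(), 8);
    assert_eq!(size_of::<Option<Option<u32>>>(), 8);
}
```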