rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.9k stars 1.56k forks source link

More Exotic Enum Layout Optimizations #1230

Closed Gankra closed 6 years ago

Gankra commented 9 years ago

There are several things we currently don't do that we could. It's not clear that these would have practical effects for any real program (or wouldn't negatively affect real programs by making llvm super confused), so all I can say is that these would be super cool.

Use Undefined Values in Other Primitives

(&u8, &u8) can be the same size as Option<Option<(&u8, &u8)>>

Support More than a Single Bit

(&u8, u64) supports 2^64 variants via the encoding (0, x)

Use the Fact that Void is Statically Unreachable

enum Void { } cannot be instantiated, so any variant the contains Void is statically unreachable, and therefore can be removed. So Option<Void> can only be None, and therefore can be zero-sized.

Support Enums that have More than One Non-C-Like Variant

enum Foo {
  A(&u8, &u8),
  B(&u8),
}

Can be represented without any tag as (&u8, *const u8) via the following packing:

Option<Option<u32>> can be "only" 8 bytes if the second Option stores its discriminant in the first Option's u32 tag.

Kimundi commented 9 years ago

:+1: to any of these, even if just as an experiment to see if it would be worth it in practice.

bluss commented 9 years ago

rust issue: rust-lang/rust#5977

bluss commented 9 years ago

Be wary of optimizations that are incompatible with pointers to the payload.

For example

let mut x: Option<Option<u32>> = Some(None);
let y: &mut Option<u32> = x.as_mut().unwrap();

y is a reference to just an Option<u32>, so we can't fudge its discriminant from the outer layer.

Gankra commented 9 years ago

Option<u32> is isomorphic to (bool, u32) which should be null-pointer optimizable (well - 2'd pointer optimizable).

bluss commented 9 years ago

(bool, u32) is a nice way to think about it, it should work.

Diggsey commented 9 years ago

A general-purpose "ForbiddenValues" wrapper as you described in rust-lang/rust#27573 would be nice, but how exactly would you tell the compiler which values are forbidden? Even just for primitive types it's difficult, considering that as yet the type system has no support for variadics or constant parameters.

Supporting compound types is even trickier. To be fully general, you have to be able to describe any possible subset of any possible compound type, and do so in a concise way which the compiler can easily reason about, and do it all without relying on any particular layout choice. Short of using a compiler plugin, I have no idea how you do that.

Basically: maybe it's worth stabilizing simple cases like NonZero now - it may well be that in future it will simply be a type alias to a ForbiddenValues type, but that should be a backwards compatible change.

rkjnsn commented 9 years ago

(&u8, &u8) can support 3 variants

It seems like one could also null the first field and use the second as a tag, allowing 2^64 (or 2^32) variants.

Gankra commented 9 years ago

@rkjnsn Oh yeah, great point!

Kimundi commented 9 years ago

Another idea: Make use of padding bytes in nested enums:

enum A {
    Aa(u32, B),
    Ab(u32, B),
}
enum B {
    Ba(u32),
    Bb(u32)
}

Today, this leads to this kind of memory layout:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                                         [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [  Apadding  ] Atag [       u32       ] [  Bpadding  ] Btag [       u32       ]

Now, we can't just combine those two tag ranges and padding areas into one because we need to support interior references. But what we can do is store the A tag in the B padding area:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                     [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [       u32       ] [ ABpad ] Atag Btag [       u32       ]

The idea here being that a B value never reads or writes its padding area, so its in principle free to be used by outer enums tags.

This would require that the compiler ensures that padding in a enum doesn't get overwritten with junk bytes though, which might be tricky or inefficient in combination with mutable references to such an enum (You couldn't just overwrite all bytes on reassignment).

rustonaut commented 9 years ago

You could also use the bit0/1 in pointers witch are always zero under some aligment conditions like if bit0 is 1 then it's not a pointer and bit7-1 are the actual tag. This is a bit more efficient then just having a null pointer as additional marker. (e.g. in the (&u8, &u8) case)

solson commented 9 years ago

@naicode That may be more space efficient, but it's less time efficient because of the bit shifting and masking required. Rust probably shouldn't make that kind of change automatically.

It's a neat idea, though. Reminds me of Ruby's internal VALUE type which if the LSB is 0 then it's a pointer to an object and if the LSB is 1 then the rest of the bits encode an immediate value like true, false, nil, a symbol, or an integer (there is a further way to differentiate those). So Ruby values are like a super-compact Rust enum in a way.

rustonaut commented 9 years ago

It's true that such bit-fiddling slows down the program at runtime and therefor should never be applied implicitly.

If more exotic layout optimisations are implemented it might still be possible to opt-in to such optimisations on per-struct-basis e.g. a #[repr(compact_rust)] Attribute witch uses rust representation + space optimisations witch have a runtime cost.

(e.g. for usage of rust in embedded environment with small sized RAM, or a enum's allocated in massive amounts on the heap).

huonw commented 9 years ago

cc https://github.com/rust-lang/rust/issues/14540

mbrubeck commented 8 years ago

cc rust-lang/rust#24041

soltanmm commented 8 years ago

Another idea: compress nested enum variant ids (thanks Mutabah over IRC!).

enum A { A1, A2 }
enum B { BA(A), B1, B2 }

This currently requires space in B for the variant id of A, but one could assign BA+A1 = 0, BA+A2 = 1, B1 = 2, B2 = 3, and both save space and maintain the ability to take a reference to the stored A (because its payload may presumably be stored immediately following its variant id [which is also B's] as usual).

Diggsey commented 8 years ago

@soltanmm Those kind of optimizations are not generally possible, because 'A' must have the same layout regardless of whether it is used within another enum (otherwise references to the 'A' within 'BA' would have a different layout from normal references to 'A'.

Furthermore, you can't adjust the layout of all 'A's in advance because 'B' might be in a different crate.

What you can do is treat the determinant in 'A' as a restricted value (either 0 or 1 for A1/A2) in which case the 'B' is free to use the disallowed values as IDs for the other variants (B1, B2) but that kind of optimization has already been covered in this issue.

huonw commented 8 years ago

@Diggsey note that @soltanmm's suggestion doesn't adjust the layout of A (in both standalone and optimised form A1 is 0, and A2 is 1, although the particular values don't matter). AFAICT, your last paragraph is exactly what @soltanmm is saying.

soltanmm commented 8 years ago

@huonw EDIT: I just realized I might have interpreted that incorrectly. Almost exactly, I think, but not quite (specifically the bit about this being already covered).

@Diggsey I believe the optimizations discussed most similar were using forbidden values in payloads or placing determinants in padding, as opposed to outright summing the determinants in the same location. Having the determinant be a value for which forbidden values might be used wasn't mentioned, and this has slightly different space-time trade-offs to @Kimundi's example with the determinants being split into different sequences of bit positions.

Arguably, though, everything in this discussion is some variant of, "Use forbidden values to encode variants," so as long as everyone's kicking the horse I guess I concede that to @Diggsey. :-)

arthurprs commented 7 years ago

Would it be possible to enable NonZero for the common Rust enum? Like

enum Status {
    Alive, // discriminant would start at 1 instead of 0
    Suspect,
    Dead,
}

size_of::<Status>() == size_of::<Option<Status>>()
eddyb commented 7 years ago

@arthurprs No, that's not a good choice, first off you can't depend on knowing whether Option<Status> is used in the program, and secondly what if I have Option<Result<Status, ()>>? The general niche optimization would use values above those of the original enum, which scales.

solson commented 7 years ago

The general optimization scales better, but @arthurprs's suggestion seems like it would technically work fine. You don't need to depend on knowing about Option<Status>, you can just always start from 1, unless we have existing reasons to prefer starting from 0.

arthurprs commented 7 years ago

Rust rely so much on these patterns that any gain would be huge.

Option<Result<Status, ()>>

Can't it be the usual 2 discriminants + Status?

cuviper commented 7 years ago

If you manually wrote enum Status { Alive=1, ... }, could it decide this is NonZero? But generally speaking, I would agree that broader Forbidden Value analysis is more promising.

petrochenkov commented 7 years ago

I'm surprised enums like

enum E {
    A = 1,
    B = 2,
    C = 3,
}

don't use non-zero optimization right now. If you need to save space you have to add an artificial None-like variant instead of using idiomatic Option<E>.

eddyb commented 7 years ago

From the reference:

If a discriminant isn't specified, they start at zero, and add one for each variant, in order.

So it'd have to not be visible to the user, which means doing a bunch of operations to hide it.

I'm surprised enums like ... don't use non-zero optimization right now.

@cuviper @petrochenkov That one's an easy fix, compared to the others, likely just a couple extra lines.

The problem we have is there aren't enough people to work on improvements like these, once they become possible. There's much bigger fish that have more serious implications and we have to pick. Worse, every time we compromise, cleaning it up later becomes harder, which slows us down further.

@camlorn has recently cleaned up the duplication in rustc_trans and he might be interested in taking a look at niche optimizations, or I can also mentor someone else too. (Find me in #rustc on IRC)

arthurprs commented 7 years ago

From the reference:

If a discriminant isn't specified, they start at zero, and add one for each variant, in order.

:cry:

But this only affects C-Style enums. The others (vast majority) still has potential for improvement.

ahicks92 commented 7 years ago

I might take some of these after I finish my current work to optimize field order for anything represented as Univariant, probably from most impact to least.

Making enums which fall into the CEnum case of Layout without being explicitly C enums start at 1 should be an incredibly tiny change.

Stebalien commented 7 years ago

But this only affects C-Style enums. The others (vast majority) still has potential for improvement.

Is there any reason None couldn't be defined to be 3? That seems less hacky. When you have generic enums wrapping generic enums, etc. rustc can just reserve discriminant ranges as needed. For example, None: Option<Option<E>> could be 4:

E::A: E == 0;
E::B: E == 1;
E::C: E == 2;
Some(E::A): Option<E> == 0;
Some(E::B): Option<E> == 1;
Some(E::C): Option<E> == 2;
None: Option<E> = 3;
Some(Some(E::A)): Option<Option<E>> == 0;
Some(Some(E::B)): Option<Option<E>> == 1;
Some(Some(E::C)): Option<Option<E>> == 2;
Some(None): Option<Option<E>> == 3;
None: Option<Option<E>> == 4;
solson commented 7 years ago

@Stebalien That is the optimization @eddyb was referring to, which places values above the original enum's values. It's what I think we should ultimately go with, as well.

Stebalien commented 7 years ago

@solson Ah. Sorry, I missed that when skimming his reply.

ahicks92 commented 7 years ago

I spoke to @eddyb on IRC and we discussed algorithms for these cases. The conclusion is that it's not actually hard now that repr is deduplicated. We think nothing outside layout has to change. I'm still on the fence, but am leaning towards trying unless someone else wants to take it.

I tend to think of things at like 2 in the morning. The thing I thought of at like 2 in the morning last night is how much optimizing a tree of types to use one discriminant could benefit futures-rs.

mrhota commented 7 years ago

CC rust-lang/rust#36237 (a rebase of rust-lang/rust#31215): an attempt to optimize two special cases of Option.

Indeed, one of @eddyb's comments on that issue echos discussion here.

eddyb commented 7 years ago

@mrhota See also this more recent comment - while I realized later that it's not exactly on-topic in that thread, I'm stating the situation as I see it - rust-lang/rust#39999 followed by some refactors is necessary.

aidanhs commented 7 years ago

Found myself wanting this for creating a small vector of min size 2, e.g.

enum SmallVec<'a> {
    A2(&'a u64, &'a u64),
    V(Vec<&'a u64>),
}

(since the first member of a Vec is the Unique pointer, i.e. non-null). Though this example seems like a clear win, it's less clear what rust should do if A0 and A1 variants were added, since doing the optimisations make it harder to figure out what variant the enum is.

KamilaBorowska commented 7 years ago

Null pointer optimization in two variants case could be used even if both variants contain values. For example Result<(&i32, usize), usize> can be stored like (Option<&i32>, usize)

ahicks92 commented 7 years ago

@xfix I was going to say that I don't think this can be done easily, then I started thinking of ways we might be able to do so. So maybe.

But I do think it wouldn't be trivial.

ahicks92 commented 7 years ago

Also. That's really only applicable in a small number of cases. Let (a, n, b) denote a type with a segment of fields (a) before a non-null pointer (n) followed by a segment of fields (b).

We can only do the optimization if the other variant has either sizeof(other) <= sizeof(b) or sizeof(other) <= sizeof(a). in any other case, the enum itself becomes larger, which we don't want.

It may be possible to tweak the layout algorithms to make this work reliably by always putting references at the beginning of structs--we sort of already do.

SimonSapin commented 7 years ago

Today I am writing unsafe code to represent as (*const u8, usize, PhantomData<&'a str>) (on two words of memory) a type that is effectively:

pub enum BorrowedOrSharedString<'a> {
    Borrowed(&'a str)
    Shared(Rc<String>),
}

The compiler could do that optimization: represent Borrowed as &'a str’s (&'a u8, usize) directly (with a non-null pointer), and Shared as (null, Rc<String>)

Kixunil commented 7 years ago

@SimonSapin Nice! Did you publish it as separate crate? Seems very similar to Cow and I think it'd be good to enable code reuse. Similarly with Arc.

SimonSapin commented 7 years ago

It’s not done yet, and I’m not sure I’ll bother with a separate crate. It is somewhat similar to Cow in the dual representation, but care much about the copy-on-write aspect.

SimonSapin commented 7 years ago

As to code reuse, I feel that every time I want something like this I want something slightly different.

Kixunil commented 7 years ago

but care much about the copy-on-write aspect.

What exactly do you mean by that?

Could you at least throw raw code in the wild, so someone could take it and create the crate?

SimonSapin commented 7 years ago

I mean that I haven’t implemented DerefMut, make_mut, to_mut, or anything that switches from one variant to the other. The code is in a work-in-progress branch, I’ll link it here when I open a PR.

eira-fransham commented 7 years ago

I haven't read the whole thread here but I did attempt to grep through to see if this has been mentioned so far - can we store way more tags in a pointer because of its alignment? We know for sure that a pointer of 2-byte alignment can never be odd, so we can store one tag in P=0 and then P=1, P=3, P=5 etc. I think that for n-byte alignment, we have 1 + (n - 1) * 2.pow(size_of::<usize>() - (n - 1)) possible tags, which should mean that we can have zero tag overhead for almost any pointer-containing enum (since how often do you have a pointer to a single-byte-alignment datatype).

I should say that I didn't come up with this myself, I stole it from a really clever trick in the bytes crate.

rustonaut commented 7 years ago

@Jef while this reduces memory consummation it increases CPU consummation as accessing the discernment requires bit making operations. It's also havily dependant on the alignment a point has for a given type on a given platform.

I think optimizations treating CPU cyclers for memory should not necessarily be done without any way for the programmer to control them. Maybe something like #[optimize(<what>)] with what being e.g. size or cpu/access_time ... might be interesting. Note that optimize size is still different form packed/compact which might create unaligned fields (as it does not use any padding at all) but might be used to determine which of multiple fields from which one has to be unaligned is made unaligned.

On Tue, Jul 4, 2017, 12:35 Jef notifications@github.com wrote:

I haven't read the whole thread here but I did attempt to grep through to see if this has been mentioned so far - can we store way more tags in a pointer because of its alignment? We know for sure that a pointer of 2-byte alignment can never be odd, so we can store one tag in P=0 and then P=1, P=3, P=5 etc. I think that for n-byte alignment, we have 1 + (n - 1) * 2.pow(size_of::() - (n - 1)) possible tags, which should mean that we can have zero tag overhead for almost any pointer-containing enum (since how often do you have a pointer to a single-byte-alignment datatype).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rfcs/issues/1230#issuecomment-312845071, or mute the thread https://github.com/notifications/unsubscribe-auth/AHR0kW4avRn6xb0SCAP3-VNEsNYOUU-Oks5sKhWCgaJpZM4FjSb8 .

eira-fransham commented 7 years ago

@dathinab Accessing the discriminant would never require bit masking, although accessing the pointer may require bit masking unless you're smart about it. In cases where bit masking would be necessary I would say that yes, that should be explicit opt-in, but it's by no means always the case. For example:

enum ContainsPointer {
    A(&Foo), // Assume `Foo` has 2-byte alignment
    B,
    C,
}

match contains_pointer {
    A(thing) => { ... },
    B => { ... },
    C => { ... },
}

Could be converted to:

struct ContainsPointer(*const Foo);

if contains_pointer.0 == 0 {
    // B's path
} else if contains_pointer.0 == 1 {
    // C's path
} else {
    // A's path
    // We know that any value that is not 0 or 1 is valid since we control what gets written
    // to a value of type `ContainsPointer`
}
SimonSapin commented 7 years ago

Could you at least throw raw code in the wild, so someone could take it and create the crate?

Here it is.

/// A string that is either shared (heap-allocated and reference-counted) or borrowed.
///
/// Equivalent to `enum { Borrowed(&'a str), Shared(Rc<String>) }`, but stored more compactly.
///
/// FIXME(https://github.com/rust-lang/rfcs/issues/1230): use an actual enum if/when
/// the compiler can do this layout optimization.
pub struct CowRcStr<'a> {
    /// FIXME: https://github.com/rust-lang/rust/issues/27730 use NonZero or Shared.
    /// In the meantime we abuse `&'static _` to get the effect of `NonZero<*const _>`.
    /// `ptr` doesn’t really have the 'static lifetime!
    ptr: &'static (),

    /// * If `borrowed_len_or_max == usize::MAX`, then `ptr` represents `NonZero<*const String>`
    ///   from `Rc::into_raw`.
    ///   The lifetime parameter `'a` is irrelevant in this case.
    ///
    /// * Otherwise, `ptr` represents the `NonZero<*const u8>` data component of `&'a str`,
    ///   and `borrowed_len_or_max` its length.
    borrowed_len_or_max: usize,

    phantom: PhantomData<Result<&'a str, Rc<String>>>,
}

fn _static_assert_same_size<'a>() {
    // "Instantiate" the generic function without calling it.
    let _ = mem::transmute::<CowRcStr<'a>, Option<CowRcStr<'a>>>;
}

impl<'a> From<&'a str> for CowRcStr<'a> {
    #[inline]
    fn from(s: &'a str) -> Self {
        let len = s.len();
        assert!(len < usize::MAX);
        CowRcStr {
            ptr: unsafe { &*(s.as_ptr() as *const ()) },
            borrowed_len_or_max: len,
            phantom: PhantomData,
        }
    }
}

impl<'a> From<String> for CowRcStr<'a> {
    #[inline]
    fn from(s: String) -> Self {
        CowRcStr::from_rc(Rc::new(s))
    }
}

impl<'a> CowRcStr<'a> {
    #[inline]
    fn from_rc(s: Rc<String>) -> Self {
        let ptr = unsafe { &*(Rc::into_raw(s) as *const ()) };
        CowRcStr {
            ptr: ptr,
            borrowed_len_or_max: usize::MAX,
            phantom: PhantomData,
        }
    }

    #[inline]
    fn unpack(&self) -> Result<&'a str, *const String> {
        if self.borrowed_len_or_max == usize::MAX {
            Err(self.ptr as *const () as *const String)
        } else {
            unsafe {
                Ok(str::from_utf8_unchecked(slice::from_raw_parts(
                    self.ptr as *const () as *const u8,
                    self.borrowed_len_or_max,
                )))
            }
        }
    }

    /// Convert into `String`, re-using an existing memory allocation if possible.
    #[inline]
    pub fn into_owned(self) -> String {
        let unpacked = self.unpack();

        // Inhibit destructor: we’re taking ownership of this strong reference (if any)
        mem::forget(self);

        match unpacked {
            Ok(s) => s.to_owned(),
            Err(ptr) => {
                let rc = unsafe {
                    Rc::from_raw(ptr)
                };
                Rc::try_unwrap(rc).unwrap_or_else(|rc| (*rc).clone())
            }
        }
    }
}

impl<'a> Clone for CowRcStr<'a> {
    #[inline]
    fn clone(&self) -> Self {
        match self.unpack() {
            Err(ptr) => {
                let rc = unsafe {
                    Rc::from_raw(ptr)
                };
                let new_rc = rc.clone();
                mem::forget(rc);  // Don’t actually take ownership of this strong reference
                CowRcStr::from_rc(new_rc)
            }
            Ok(_) => {
                CowRcStr { ..*self }
            }
        }
    }
}

impl<'a> Drop for CowRcStr<'a> {
    #[inline]
    fn drop(&mut self) {
        if let Err(ptr) = self.unpack() {
            mem::drop(unsafe {
                Rc::from_raw(ptr)
            })
        }
    }
}

impl<'a> Deref for CowRcStr<'a> {
    type Target = str;

    #[inline]
    fn deref(&self) -> &str {
        self.unpack().unwrap_or_else(|ptr| unsafe {
            &**ptr
        })
    }
}

// Omitted: a bunch of trivial impls of standard library traits, deferring to str impls
Kixunil commented 7 years ago

Nice, thanks! I just feel like playing a bit, so I'm gonna try to clean it up a bit.

ahicks92 commented 7 years ago

@SimonSapin I think doing this automatically may be difficult because you are effectively changing the type by removing a level of pointer indirection only in some highly specific cases.

It feels like this would be on the order of struct field reordering in terms of the kinds of weird bugs it'd end up with and the effort required, with not nearly as much impact.

A more general optimization here would be to realize that the refcount of Rc is never zero and that the first variant never extends to cover it, but this relies on Rc not being reordered which isn't guaranteed anymore.

SimonSapin commented 7 years ago

(FWIW the refcount is not stored in Rc<T> directly, it’s in the heap-allocated RcBox<T> that Rc<T> points to.)