rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Anonymous sum types #294

Open rust-highfive opened 10 years ago

rust-highfive commented 10 years ago

Issue by glaebhoerl Saturday Aug 03, 2013 at 23:58 GMT

For earlier discussion, see https://github.com/rust-lang/rust/issues/8277

This issue was labelled with: B-RFC in the Rust repository


Rust has an anonymous form of product types (structs), namely tuples, but not sum types (enums). One reason is that it's not obvious what syntax they could use, especially their variants. The first variant of an anonymous sum type with three variants needs to be syntactically distinct not just from the second and third variant of the same type, but also from the first variant of all other anonymous sum types with different numbers of variants.

Here's an idea I think is decent:

A type would look like this: (~str|int|int). In other words, very similar to a tuple, but with pipes instead of commas (signifying or instead of and).

A value would have the same shape (also like tuples), with a value of appropriate type in one of the "slots" and nothing in the rest:

let foo: (~str|int|int) = (!|!|666);
match foo {
    (s|!|!) => println(fmt!("string in first: %?", s)),
    (!|n|!) => println(fmt!("int in second: %?", n)),
    (!|!|m) => println(fmt!("int in third: %?", m))
} 

(Nothing is a bikeshed, other possible colors for it include whitespace, ., and -. _ means something is there we're just not giving it a name, so it's not suitable for "nothing is there". ! has nothing-connotations from the negation operator and the return type of functions that don't.)
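For comparison, here is a minimal sketch of what the same example looks like in today's Rust with a hand-written generic enum. The names `Sum3`, `First`, `Second`, `Third` and `describe` are made up for illustration and are not part of the proposal; `String` and `i64` stand in for the old `~str` and `int`.

```rust
// Hypothetical stand-in for the proposed `(~str|int|int)`.
// Variants are identified by position, like the slots in the proposal.
#[derive(Debug, PartialEq)]
pub enum Sum3<A, B, C> {
    First(A),
    Second(B),
    Third(C),
}

// Mirrors the `match foo { ... }` from the proposal, returning the
// message instead of printing it.
pub fn describe(foo: Sum3<String, i64, i64>) -> String {
    match foo {
        Sum3::First(s) => format!("string in first: {:?}", s),
        Sum3::Second(n) => format!("int in second: {:?}", n),
        Sum3::Third(m) => format!("int in third: {:?}", m),
    }
}
```

Calling `describe(Sum3::Third(666))` corresponds to the `(!|!|666)` value above. The second and third variants have the same payload type but stay distinct, which is the property the proposed syntax is after.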

I'm not sure whether this conflicts syntax-wise with closures and/or negation.

Another necessary condition for this should be demand for it. This ticket is to keep a record of the idea, in case someone else has demand but not syntax. (If the Bikesheds section of the wiki is a better place, I'm happy to move it.)

SEE ALSO

huonw commented 9 years ago

cc #402, #514, #1154

ticki commented 8 years ago

What's the state of this?

Rufflewind commented 7 years ago

Compared to tuples, anonymous enums would become increasingly tedious to use since a match statement would have N^2 pipe (|) characters. At the expense of type inference, it may be better to go with a syntax like:

let foo: enum(String, int, int) = enum 2(666);
match foo {
    enum 0(s) => println!("string in first: {:?}", s),
    enum 1(n) => println!("int in second: {:?}", n),
    enum 2(m) => println!("int in third: {:?}", m),
}

The syntax would be compatible with a future extension that allows enums to be declared with named choices:

let foo: enum { Red(String), Green(int), Blue(int) } = enum Blue(666);
match foo {
    enum Red(s) => println!("string in first: {:?}", s),
    enum Green(n) => println!("int in second: {:?}", n),
    enum Blue(m) => println!("int in third: {:?}", m),
}

eddyb commented 7 years ago

I think the feature would be more useful without allowing matching, just doing trait dispatch. I guess it's a different feature, where T|T has T's representation, as opposed to one bit more.

plietar commented 7 years ago

@eddyb I've been putting some thought into a feature like that, and I've posted about it on irlo: https://internals.rust-lang.org/t/pre-rfc-anonymous-enum-which-automatically-implement-forwarding-traits/4806

burdges commented 7 years ago

I'd think an Alloca<Trait> analog of Box<Trait> would provide the same functionality as this return enum expr extension of -> impl Trait idea, except there is dynamic dispatch in Alloca<Trait> so optimization suffers.

OvermindDL1 commented 7 years ago

Passing by, but if you are curious about syntaxes, OCaml has anonymous sum types called Polymorphic Variants. Basically they are just a name, like `Blah, which can have optional values. An example of the syntax:

# let three = `Int 3;;
val three : [> `Int of int ] = `Int 3
# let four = `Float 4.;;
val four : [> `Float of float ] = `Float 4.
# let nan = `Not_a_number;;
val nan : [> `Not_a_number ] = `Not_a_number
# let list = [three; four; nan];;
val list  : [> `Float of float | `Int of int | `Not_a_number ] list

The val lines are the types of the let assignments, left in to see how the typing works.

In the back end, at assembly time, each name is given a globally unique integer. The current implementation does this via hashing, so a collision is possible, but the chance is extremely low and warnings can be put in place to catch one. I've also seen talk of a global registry so that names are simply assigned incrementing numbers on first access.

A plain Polymorphic Variant with no data is represented internally as an integer:

`Blah

Becomes the integer 737303889 (yes I checked), and comparing those is trivial. Polymorphic variants that can hold data (either a single element or a tuple of elements) such as:

`Blah (42, 6.28)

Gets encoded internally as an array of two fields in assembly, the first is the above number as before, the second is the pointer to the data of the tuple (although in most cases these all get inlined into the same memory in OCaml due to inlining and optimization passes). In the typing system the above would be [> `Blah of int * float ] (in OCaml the types of a tuple are separated by *).

However, the thing about Polymorphic Variants is that they can be open or closed. Any system can pass any of them that it wants, including passing them through. For example, a simple way to handle something like a generic event in OCaml would be:

let f next x = match x with
  | `Blah x -> do_something_with x
  | `Foobar -> do_something_else ()
  | unhandled -> next unhandled

Which is entirely type safe, dependent on what each function handles down the chain and all.

The big thing on the typing system is that things can be open or close typed, I.E. they either accept any amount of Polymorphic Variants or a closed set of Polymorphic Variants. If something like anonymous sum type here were to be accepted then that concept would be exceedingly useful while being very easy and very fast to statically type.

burdges commented 7 years ago

Anonymous sum types might interact with -> impl Trait : At present, this code snippet cannot compile because the iterators have different types :

match x {
    A(a) => once(a).chain(foo),
    B(b) => once(bar).chain(foo).chain(b),
}

You could make this make sense with an anonymous sum type of the form impl Iterator | impl Iterator, that itself becomes an Iterator, but inferring any type like that sounds like chaos.

One could do it in std with enums like :

enum TwoIterators<A,B> {
    IterA(A),
    IterB(B),
}

impl Iterator for TwoIterators where .. { .. }

so the above code becomes

match x {
    A(a) => TwoIterators::IterA( once(a).chain(foo) ),
    B(b) => TwoIterators::IterB( once(bar).chain(foo).chain(b) ),
}

I could imagine some enum Trait sugar that did basically this too. You cannot delegate associated types or constants to an enum at runtime like this, so an enum Trait must enforce that they all agree across all the variants.
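To make the elided `impl Iterator for TwoIterators where .. { .. }` concrete, here is one way it could be filled in, as a sketch only. The bound `B: Iterator<Item = A::Item>` is the "all variants must agree on the associated type" requirement mentioned above. `Input`, `numbers` and the contents of `foo` are invented for the example.

```rust
use std::iter::once;

pub enum TwoIterators<A, B> {
    IterA(A),
    IterB(B),
}

// Both variants must yield the same Item type; the enum just forwards.
impl<A, B> Iterator for TwoIterators<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            TwoIterators::IterA(a) => a.next(),
            TwoIterators::IterB(b) => b.next(),
        }
    }
}

// Hypothetical input type standing in for the `A(a)` / `B(b)` patterns above.
pub enum Input {
    A(u32),
    B(Vec<u32>),
}

// Each arm builds a different concrete iterator type, yet the function
// has a single return type thanks to the wrapper enum.
pub fn numbers(x: Input) -> impl Iterator<Item = u32> {
    let foo = vec![10, 20];
    match x {
        Input::A(a) => TwoIterators::IterA(once(a).chain(foo)),
        Input::B(b) => TwoIterators::IterB(once(0u32).chain(foo).chain(b)),
    }
}
```

The dispatch is a plain `match` on each `next` call, so no boxing or dynamic dispatch is involved; the cost is writing one wrapper enum (and one impl) per trait and arity.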

dobkeratops commented 7 years ago

this might sound like a weird hack , but how about just making A|B sugar for 'Either', i suppose it might get even weirder to start composing A|B|C as Either<A,Either<B,C>> or have that mapping to something . What if there was some sort of general purpose 'operator overloading' in the 'type syntax' , allowing people code to experiment with various possibilities - see what gains traction (i had yet another suggestion about allowing general purpose substitutions, e.g. type Either<A,Either<B,C>> = Any3<A,B,C> .. etc https://www.reddit.com/r/rust/comments/6n53oa/type_substitutions_specialization_idea/ now imagine recovering ~T === Box ~[T] ... type Box<RawSlice> = Vec .. through a completely general purpose means )

strega-nil commented 7 years ago

@dobkeratops I'd rather just have a variant style type, i.e., with variadics.

Sgeo commented 7 years ago

I wrote some code that could potentially fit into a library now that type macros are stable: https://gist.github.com/Sgeo/ecee21895815fb2066e3

Would people be interested in this as a crate?

Ekleog commented 6 years ago

I've just come upon this issue, while looking for a way to avoid having some gross code that simply doesn't want to go away (actually it's slowly increasing, started at 8 variants and passed by 9 before reaching 12):

use tokio::prelude::*;

pub enum FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
    F1: Future<Item = T, Error = E>, // ...
{
    Fut1(F1), // ...
}

impl<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12> Future
    for FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
    F1: Future<Item = T, Error = E>, // ...
{
    type Item = T;
    type Error = E;

    fn poll(&mut self) -> Result<Async<Self::Item>, Self::Error> {
        use FutIn12::*;
        match *self {
            Fut1(ref mut f) => f.poll(), // ...
        }
    }
}

I was thus thinking that it'd be great to have anonymous sum types that automatically derived the traits shared by all their variants, so that I could get rid of this code and just have my -> impl Future<Item = (), Error = ()> function return the futures in its various branches (with some syntax, that ideally wouldn't expose the total number of variants but let rustc infer it from the returning points, so that adding a branch doesn't require changing all the other return points), and have the anonymous sum type match the -> impl Future return type.

glaebhoerl commented 6 years ago

As I wrote here I think this use case would be better addressed by something modelled after how closures work.

alexreg commented 6 years ago

I don’t think it would be wise to make anonymous sum types nominally typed, as you seem to suggest. Structural typing, as with tuples, is far more useful and less surprising to the programmer.

Pauan commented 6 years ago

@alexreg What they're saying is that the specific use-case of wanting to return impl Trait with different types in each branch is better handled by a secret nominal type, similar to how closures are implemented.

Therefore, anonymous sum types are separate (and mostly unrelated) from that use case.

alexreg commented 6 years ago

@Pauan Oh, well I agree with that. As long as we consider these things two separate features, fine. Thanks for clarifying.

Ekleog commented 6 years ago

Oh indeed good point, thanks! Just opened #2414 to track this separate feature, as I wasn't able to find any other open issue|RFC for it :)

eaglgenes101 commented 6 years ago

I'm planning to get out a pull request for this proposed RFC. Most of you following this thread probably know that a number of proposals like this were rejected for being too complex, so its focus is minimalism and implementation simplicity rather than ergonomics and features. Any words before I get it out? (I've asked this question in multiple other areas to try to collect as much feedback before getting the proposed RFC out, fyi)

https://internals.rust-lang.org/t/pre-rfc-anonymous-variant-types/8707/76

vadixidav commented 5 years ago

I am not sure where the appropriate place is at this point to suggest solutions to this problem, but one thing that was mentioned was interaction with impl Trait. Perhaps an anonymous enum could be created of all returned things so long as they implement some trait. For instance (the ... are left to your imagination):

fn foo() -> Result<(), impl Error> {
..
return Err(fmt::Error...);
...
return Err(io::Error...);
...
return Ok(());
}

This would make an implicit anonymous enum/sum type that implements Error. This would greatly help the current situation with Rust error handling.
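For reference, this is roughly the boilerplate one writes by hand today to get the same effect, and which the implicit enum would presumably generate. It is a sketch: `FooError`, its variant names, and the `fail_io` parameter are invented for the example.

```rust
use std::error::Error;
use std::fmt::{self, Write as _};
use std::io;

// Hand-written sum of the two error types returned by `foo`.
#[derive(Debug)]
pub enum FooError {
    Fmt(fmt::Error),
    Io(io::Error),
}

impl fmt::Display for FooError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FooError::Fmt(e) => write!(f, "formatting failed: {e}"),
            FooError::Io(e) => write!(f, "I/O failed: {e}"),
        }
    }
}

impl Error for FooError {}

// The `From` impls are what let `?` and `.into()` pick the right variant.
impl From<fmt::Error> for FooError {
    fn from(e: fmt::Error) -> Self {
        FooError::Fmt(e)
    }
}

impl From<io::Error> for FooError {
    fn from(e: io::Error) -> Self {
        FooError::Io(e)
    }
}

pub fn foo(fail_io: bool) -> Result<String, FooError> {
    let mut out = String::new();
    // A `fmt::Error` here would be converted to `FooError::Fmt` by `?`.
    write!(out, "{}", 42)?;
    if fail_io {
        return Err(io::Error::new(io::ErrorKind::Other, "disk on fire").into());
    }
    Ok(out)
}
```

Unlike the proposed `impl Error` return type, callers of this version can still `match` on `FooError` to recover the concrete error.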

Edit: I can also write up a pre-rfc with this if it seems workable.

Ixrec commented 5 years ago

@vadixidav Ideas like that have also been floating around for years under names like enum impl Trait. For example:

It's generally considered a separate feature proposal, since an enum impl Trait would be something you cannot match on, so there would be no need for any special syntax for the concrete types or their values, but it would only apply to function returns. An "anonymous sum type" is usually taken to mean something that can be created and used anywhere, would be something you explicitly match on, and thus requires adding some special syntax for the concrete types and values.

vadixidav commented 5 years ago

@alexreg Got it. I will direct my focus to places where that feature is being proposed instead. Thank you for the pointer.

Neo-Ciber94 commented 3 years ago

I like this feature, this is like TypeScript union types https://www.typescriptlang.org/docs/handbook/unions-and-intersections.html

It will be interesting to see auto-generated enums in Rust. I already like the TypeScript syntax type1 | type2 | ... or enum(type1, type2, ...)

fn add_one(mut value: String | i64) -> String | i64 {
   match value {
       x : String => { 
           x.push_str("1"); 
           x 
       }
       y : i64 => { y + 1 }
   }
}

johannbuscail commented 3 years ago

Any update on this ?

Luca-spopo commented 3 years ago

Would this also be useful for coalescing errors in Result chains?

trySomething() //Result<A, E1>
.and_then(trySomethingElse) //Result<B, E1|E2>
.and_then(tryYetAnotherThing) //Result<C, E1|E2|E3>
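
Without anonymous sums, the widening from `E1` to `E1|E2` has to be spelled out at each step. A minimal sketch of that, using a locally defined `Either` (similar in spirit to the either crate, but defined here so the example is self-contained); `E1`, `E2` and both `try_*` functions are placeholders:

```rust
#[derive(Debug, PartialEq)]
pub struct E1;

#[derive(Debug, PartialEq)]
pub struct E2;

// Local two-variant sum used as a stand-in for the anonymous `E1|E2`.
#[derive(Debug, PartialEq)]
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

pub fn try_something(input: i32) -> Result<i32, E1> {
    if input >= 0 { Ok(input) } else { Err(E1) }
}

pub fn try_something_else(n: i32) -> Result<String, E2> {
    if n % 2 == 0 { Ok(format!("even {n}")) } else { Err(E2) }
}

// Each step has to inject its error into the wider type explicitly.
pub fn chain(input: i32) -> Result<String, Either<E1, E2>> {
    try_something(input)
        .map_err(Either::Left)
        .and_then(|n| try_something_else(n).map_err(Either::Right))
}
```

Adding a third step would require nesting (`Either<Either<E1, E2>, E3>`) or a new three-variant enum, which is the tedium the question above is pointing at.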
yoshuawuyts commented 2 years ago

Hey all, I wrote a post about this topic today: https://blog.yoshuawuyts.com/more-enum-types/. In particular I think it's interesting that if we compare structs and enums, it seems enums often take more work to define. Here's the summary table from the post:

             Structs              Enums                 Enums Fallback
Named        struct Foo(.., ..)   enum Foo { .., .. }   -
Anonymous    (.., ..)             (none)                either crate
Type-Erased  impl Trait           (none)                auto_enums crate

petar-dambovaliev commented 2 years ago

auto_enums

I am working on a library to more or less do what you want, i think. It looks something like this

#[derive(Debug)]
struct Bar;

#[ano_enum]
fn foo() -> ano!(Bar | u8 | u64) {
    Bar
}

#[ano_enum]
fn bar1(foo: ano!(i8 | u8 | Bar)) {
    match ano!(foo) {
        foo::u8(n) => {println!("{}", n + 1)},
        foo::i8(n) => {println!("{}", n)},
        foo::Bar(n) => {println!("{:#?}", n)},
    }
}

Keavon commented 2 years ago

I like this feature, this is like TypeScript union types https://www.typescriptlang.org/docs/handbook/unions-and-intersections.html

It will be interesting to see auto-generated enums in Rust. I already like the TypeScript syntax type1 | type2 | ... or enum(type1, type2, ...)

fn add_one(mut value: String | i64) -> String | i64 {
   match value {
       x : String => { 
           x.push_str("1"); 
           x 
       }
       y : i64 => { y + 1 }
   }
}

I really like this syntax since it works much like TypeScript. Rust and TS are my main two languages, and union types are something I greatly miss in Rust. This is probably the #1 feature, in my book, which Rust lacks but needs. I hope this makes it into the language sooner rather than later.

Rudxain commented 2 years ago

About the comparison with TypeScript union types:

YES! I'm tired of having to guess traits, reading docs, or relying on an IDE, just to say that a fn works correctly for many input-arg types. I wish I could do something like:

const fn gcd(mut a: Int, mut b: Int) -> Int {
    while b != 0 {
        (a, b) = (b, a % b)
    }
    a.abs()
}

Where Int is a named union type comprising all fixed-size integers (signed, unsigned, usize, and isize)

scottmcm commented 2 years ago

I suspect most people wouldn't want the enum for that, since they don't want the enum for the return type, but rather they want it to return the type they put in (or maybe the unsigned variant thereof).

Perhaps you're looking for a generic method instead, something like https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=61aa7ed143dbd681b725bc24fbcd7516

use num_traits::*; // 0.2.15
fn gcd<Int: Signed + Copy>(mut a: Int, mut b: Int) -> Int {
    while b != Int::zero() {
        (a, b) = (b, a % b)
    }
    a.abs()
}

Rudxain commented 2 years ago

I suspect most people wouldn't want the enum for that, since they don't want the enum for the return type, but rather they want it to return the type they put in (or maybe the unsigned variant thereof).

True. But what I suggest isn't to return an enum per-se, but to return the primitive value directly, regardless of the type (as long as it is constrained).

Perhaps you're looking for a generic method instead, something like https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=61aa7ed143dbd681b725bc24fbcd7516

Thank you a lot! But I wish it was possible to specify types in fn signatures, without any kind of trait-constraints at all, using the types themselves as constraints, like so:

//custom keyword
typedef uint {
    u8, u16, u32, u64, u128, usize
}

const fn gcd(mut a: uint, mut b: uint) -> uint {
    while b != 0 {
        (a, b) = (b, a % b)
    }
    a
}

This way, we could define custom union types that "contain" (I couldn't think of a better term) arbitrary types, as long as the compiler proves that they are "compatible"
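Something close to this closed set of types can be approximated in current Rust with a sealed trait, at the cost of still going through a trait. The sketch below is only an approximation of the idea: `UInt`, `ZERO`, `sealed` and `impl_uint!` are invented names, and the function is not `const` because trait methods cannot be called in `const fn` on stable.

```rust
use std::ops::Rem;

// Private module: code outside this file cannot implement `Sealed`,
// so the set of `UInt` types is closed.
mod sealed {
    pub trait Sealed {}
}

pub trait UInt: sealed::Sealed + Copy + PartialEq + Rem<Output = Self> {
    const ZERO: Self;
}

// Lists the member types once, much like the `typedef uint { ... }` above.
macro_rules! impl_uint {
    ($($t:ty),*) => {$(
        impl sealed::Sealed for $t {}
        impl UInt for $t {
            const ZERO: Self = 0;
        }
    )*};
}

impl_uint!(u8, u16, u32, u64, u128, usize);

// Euclid's algorithm, returning the same type that was passed in.
pub fn gcd<T: UInt>(mut a: T, mut b: T) -> T {
    while b != T::ZERO {
        (a, b) = (b, a % b);
    }
    a
}
```

`gcd(12u8, 18u8)` and `gcd(48u64, 18u64)` both type-check, while `gcd(1i32, 2i32)` is rejected because `i32` is not in the list.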

Jules-Bertholet commented 2 years ago

I have heard this concept referred to before as "static trait"; there was a brief discussion here. Very different concept from an enum, however

Rudxain commented 2 years ago

I have heard this concept referred to before as "static trait"; there was a brief discussion here.

Thank you for the link! I'll read it now

Very different concept from an enum

Definitely

DanteMarshal commented 2 years ago

What's the state of this ?

jhpratt commented 2 years ago

@DanteMarshal There is no open RFC for this feature, so there is no status to report.

sivizius commented 1 year ago

To add some points, that might have been mentioned in the last 10 years of discussion, but there are too many places this was discussed, so excuse me, that I do not have the full overview about everything discussed so far:

With the already stable core::any::TypeId::of and the recently stabilised enum Foo { A = 1, B(B) = 2, … }, we could simply desugar A | B | C to enum { A(A) = TypeId::of::<A>(), B(B) = TypeId::of::<B>(), C(C) = TypeId::of::<C>() }, as soon as TypeId::of is stabilised as const. A function returning such a type:

fn foo<T, U>(a: T, b: U, c: usize) -> usize | T | U | () | ! {
    match c {
        0 => panic!("Sometimes gonna give you up, sometimes gonna let you down."),
        1 => (),
        2 => a,
        3 => b,
        x => x,
    }
}

could now be desugared to, if we allow impl on fn and let:

fn foo<T, U>(a: T, b: U, c: usize) -> foo<T, U> {
    match c {
        0 => panic!("Sometimes gonna give you up, sometimes gonna let you down."),
        1 => (),
        2 => a,
        3 => b,
        x => x,
    }
}

impl<T,U> foo<T, U> {
    //  `T` and `U`, `T` and `usize`, `U` and `usize` or all of them might be equal.
    //  IMHO, `A | A` should be equal to `A`.
    //  Merging two variants with the same name, type and discriminant should be safe?
    //  IMHO, `A | B` should be equal to `B | A`, so by sorting them by discriminant,
    //    the information about the order is removed. 
    //  If the final definition of this enum only has one variant,
    //    this would already be optimised out iirc.
    type result<T, U> = enum {
        //  A normal type with a normal name.
        usize(usize) = TypeId::of::<usize>(),

        //  {T} and {U} means: The actual name of this type.
        {T}(T) = TypeId::of::<T>(),
        {U}(U) = TypeId::of::<U>(),

        //  This is not a real enum-declaration, but an internal one, so in addition to the unit type,
        //    tuples and array should be valid variants as well.
        ()() = TypeId::of::<()>(),

        //  The bottom type `!` and any other empty `enum` has no values,
        //    so we can basically just ignore it here.
    };

    //  However, IMHO it is beneficial to annotate functions, that might panic, with `… | !` in the return type.
    //  E.g. we might want to annotate a function with `#[panic_free]` which ensures,
    //    that inside this function, no function that might panic is called.
    //  By explicitly labeling functions with either `#[panic_free]` or `… | !`,
    //    we can prevent false positives/false negatives.
    const MIGHT_PANIC: bool = true;
}

An issue arises with literal values, e.g. fn bar(a: bool) -> u8 | i8 { if a { 1 } else { 2 } }. If a literal value could be coerced to more than one variant, the compilation must fail with a suggestion to type the literals (1u8, 2i8). if a { 128 } else { -2 } on the other hand would be valid, because the range of u8 is [0,255] and of i8 is [-128,127]. Obviously, the compilation must fail if a literal value cannot be coerced to any variant, even in unreachable code paths.

If one does not want A | A to be equal to A, one could declare struct A1 { inner: A } and struct A2 { inner: A }, even though this would add quite some overhead.
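The type-identity behaviour described here can be observed today with `std::any`. The following is a runtime sketch, not the proposed desugaring: it uses `&dyn Any` and `downcast_ref` where the proposal would use an enum discriminant, and `same_type` and `describe` are illustrative helpers.

```rust
use std::any::{Any, TypeId};

// Two wrappers around the same inner type, as suggested above.
pub struct A1 {
    pub inner: u32,
}

pub struct A2 {
    pub inner: u32,
}

// True when `T` and `U` are the same type, i.e. when `T | U` would
// collapse to a single variant under the TypeId-based scheme.
pub fn same_type<T: 'static, U: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<U>()
}

// Type-indexed "match": picks the branch by the value's concrete type.
pub fn describe(value: &dyn Any) -> String {
    if let Some(a) = value.downcast_ref::<A1>() {
        format!("A1({})", a.inner)
    } else if let Some(a) = value.downcast_ref::<A2>() {
        format!("A2({})", a.inner)
    } else {
        "unknown".to_string()
    }
}
```

Here `same_type::<u32, u32>()` is true while `same_type::<A1, A2>()` is false, so the wrappers keep the two `u32` cases apart. Note that `TypeId` requires `'static` types, which is relevant to the lifetime questions raised later in this comment.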

AFAIK, match already accepts all relevant patterns, but the compiler has to interpret <primitive>(x) as foo::result::<primitive>(x). Somehow, the compiler already does so for e.g. Option and Result like in:

match try_foo() {
    Some(x) => …,
    None => …,
}

instead of:

match try_foo() {
    Option::Some(x) => …,
    Option::None => …,
}

There is already an ongoing discussion to infer the base-enum-type (421, 2830, 3167, …), this would become quite useful here. Especially with non-alphanumeric names like unit, tuples, arrays, references and pointers:

match foo("Hello", [ 23u8, 42 ], random_integer) {
    &'a str(msg) => println!("Got string {:?}", msg),
    [ p, q ] => println!("Got array [ {:?}, {:?} ]", p, q),
    usize(x) => println!("Got integer {:?}", x),
}

The last pattern usize(x) is somewhat unhandy; perhaps we could allow ranges like this:

fn select(n: usize, a: u8, b: i8, c: bool, d: &'static [ char ], e: Option<bool>, f: Option<u8>) -> u8 | i8 | bool | &'static [ char ] | Option<bool> | Option<u8> | () {
    match n {
        0 => a,
        1 => b,
        2 => c,
        3 => d,
        4 => e,
        5 => f,
        _ => (),
    }
}

match select(random_integer, 200, -23, true, &[ 'H', 'e', 'l', 'l', 'o' ]) {
    x @ -128..=255 => println!("Got integer {:?}", x),
    y @ (true | false) => println!("Got bool {:?}", y),
    [ head, tail @ .. ] => println!("Got slice starting with {:?}", head),
    None => println!("Got nothing"),
    Some(inner) => println!("Got something {:?}", inner),
    other => println!("What does this match? {:?}", other),
}

However, what type has x? This line has to be duplicated with the cases i8(x @ -128..=127) and u8(x @ 0..=255). And what about inner? Should the variants of named sum types be merged into the anonymous one? With just one Option-type, this seems to be reasonable, because None and Some are unambiguous, but not with multiple. Some(_) and Some(inner @ (true | false)) are unambiguous as well: The former just does not care and the latter enforces bool. There might also be two enums with the same variant-names, so merging named sum types with anonymous ones is dangerous.

Merging anonymous enums on the other hand is desirable, e.g.:

fn foo(…) -> bool | usize { … }
fn bar(…) -> usize | () { … }

//  Type of a is `foo::result | bar::result ` which is equal to `bool | usize | ()`
//    which would have the internal type name `enum a::result { … }`
let a = if random_bool { foo() } else { bar() };

But what about nested ones like:

fn foo(…) -> Option<bool> { … }
fn bar(…) -> Option<usize> { … }

//  The type of a is obviously `Option<bool> | Option<usize>`,
//    but is this equivalent to `Option<bool|usize>`? 
let a = if random_bool { foo() } else { bar() };

Merging nested types could solve the problem mentioned above: The type of inner is just bool | u8, and one has to match them, however, other enums with the same variant names are still an issue. I think this is similar to the x @ -128..=255-case: Just store it as different variants Option::<bool>(Option<bool>) and Option::<usize>(Option<usize>) and duplicate match-arms for all possible variants (e.g. Result<P|Q,T|U> to Ok::<P>(x), Ok::<Q>(x), Err::<T>(x), Err::<U>(x)) unless the resulting arms are actual duplicates (e.g. because the inner value was not used due to _, two Nones, …). However, that might result in quite some hidden code that feels like bloat.

References introduce another issue: lifetimes. The simplest solution would be to just allow 'static for now, so &'a str | &'b str is only allowed if 'a == 'b == 'static. Maybe this is not an issue at all, if we allow only one reference per type:

fn msg<'a, 'b>(x: &'a str, y: &'b str, z: usize) -> &'b str | usize where 'a: 'b {
    match z {
        //  "Zero" is of type `&'static str`, and because `'static: '_` is always true,
        //    we do not have to add the bounds `'static: 'a` and `'static: 'b`.
        0 => "Zero",
        1 => x,
        2 => y,
        z => z,
    }
}

Unlike named enums, we might want to allow the compiler to implement a function for each configuration of argument-types, if called with values, that are not of any anonymous sum type, e.g.:

fn add(a: f32 | isize, b: f32 | isize) -> f32 | isize {
    match (a, b) {
        (isize(a), isize(b)) => a + b,
        (a, b) => f32::from(a) + f32::from(b),
    }
}

clearly returns isize if and only if both arguments are of type isize, so by duplicating this function, we would get add::<isize,isize>(a: isize, b: isize) -> isize, add::<isize,f32>(a: isize, b: f32) -> f32, … resulting in 7 functions. Optimising all of this could be quite an issue, but should be possible and should not impact the performance of the product. Maybe some hints/assertions like where a: isize && b: isize <-> $: isize might be useful, but this is something for an entirely different RFC.
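For comparison, the same `add` written against a named enum in current Rust might look like the sketch below. `Num` and `as_f32` are invented names. The conversion uses `as` because, as far as I know, the standard library has no `From<isize> for f32` (the conversion can lose precision), so the `f32::from(a)` in the example above is itself hypothetical.

```rust
// Named stand-in for the anonymous `f32 | isize`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Num {
    F32(f32),
    Isize(isize),
}

impl Num {
    // Lossy widening to f32, used for every mixed or float case.
    fn as_f32(self) -> f32 {
        match self {
            Num::F32(x) => x,
            Num::Isize(i) => i as f32,
        }
    }
}

// Returns `Isize` if and only if both arguments are `Isize`.
pub fn add(a: Num, b: Num) -> Num {
    match (a, b) {
        (Num::Isize(a), Num::Isize(b)) => Num::Isize(a + b),
        (a, b) => Num::F32(a.as_f32() + b.as_f32()),
    }
}
```

With a named enum the relationship "both inputs `Isize` implies output `Isize`" lives only in the function body; the per-configuration specialisation described above would be what exposes it in the signature.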

I also like the previously mentioned x : T => …; this might actually be less complicated to implement, even though I am not sure if T should also allow variants, if it is an enum.

I'd like to apologise for this long wall of text by adding this additional text.

Yokinman commented 7 months ago

My two cents; I think the motivation for anonymous enums is broadly:

For these use cases, I think it's important that anonymous enums are easy to refactor. In the case of anonymous structs, the transformation is pretty painless. You just introduce a new tuple struct, and prefix each tuple with the struct's name:

fn split(text: &'static str, at: usize) -> (&'static str, &'static str) {
    (&text[..at], &text[at..])
}

assert_eq!(split("testing", 4), ("test", "ing"));

+ #[derive(Debug, PartialEq)]
+ struct Split(&'static str, &'static str);

! fn split(text: &'static str, at: usize) -> Split {
!     Split(&text[..at], &text[at..])
  }

! assert_eq!(split("testing", 4), Split("test", "ing"));

On the other hand, I don't think structurally anonymous enums would be as helpful for prototyping since there isn't an equivalent in explicit enums. The transformation would likely involve tedious renaming and rearranging for every pattern:

fn add_one(value: String | i64) -> String | i64 {
   match value {
       mut x: String => { 
           x.push_str("1"); 
           x 
       }
       y: i64 => {
           y + 1
       }
   }
}

fn something(value: String | i64) {
    match value {
        x: String => println!("String: {x}"),
        y: i64 => println!("i64: {y}"),
    }
}

+ #[derive(Debug, PartialEq)]
+ enum AddOne {
+     String(String),
+     i64(i64),
+ }

! fn add_one(value: AddOne) -> AddOne {
      match value {
!         AddOne::String(mut x) => { 
              x.push_str("1"); 
!             AddOne::String(x) 
          }
!         AddOne::i64(y) => {
!             AddOne::i64(y + 1)
          }
      }
  }

! fn something(value: AddOne) {
      match value {
!         AddOne::String(x) => println!("String: {x}"),
!         AddOne::i64(y) => println!("i64: {y}"),
      }
  }

Not to mention that the names would likely be changed from String and i64 (either to add camel case, to be more descriptive, or to name syntactical types like: (T,), [T; N], fn(T) -> U).

I also think that anonymous enums should be able to represent more common stateful enum types similar to std::option::Option, std::cmp::Ordering, and std::ops::Bound. Without this, I think most would end up doing a similarly awkward transformation from an anonymous struct instead:

use std::cmp::Ordering;

fn max(a: i64, b: i64) -> (i64, Ordering) {
   match a.cmp(&b) {
       Ordering::Less => (b, Ordering::Less),
       Ordering::Equal => (a, Ordering::Equal),
       Ordering::Greater => (a, Ordering::Greater),
   }
}

assert_eq!(max(4, 7), (7, Ordering::Less));

  use std::cmp::Ordering;

+ enum SomeOrdering {
+     Less(i64),
+     Equal(i64),
+     Greater(i64),
+ }

! fn max(a: i64, b: i64) -> SomeOrdering {
     match a.cmp(&b) {
!        Ordering::Less => SomeOrdering::Less(b),
!        Ordering::Equal => SomeOrdering::Equal(a),
!        Ordering::Greater => SomeOrdering::Greater(a),
      }
  }

! assert!(matches!(max(4, 7), SomeOrdering::Less(7)));

I would prefer a more general syntax where the variants are explicitly named and referenced using something like return type notation. If the anonymous enum is converted into an explicit one in the future, any code referencing its variants could still function with zero refactoring required.

use std::cmp::Ordering;

fn max(a: i64, b: i64) -> enum {
    Less(i64),
    Equal(i64),
    Greater(i64),
} {
    match a.cmp(&b) {
        Ordering::Less => max()::Less(b),
        Ordering::Equal => max()::Equal(a),
        Ordering::Greater => max()::Greater(a),
    }
}

assert!(matches!(max(4, 7), max()::Less(7)));

  use std::cmp::Ordering;

+ enum SomeOrdering {
+     Less(i64),
+     Equal(i64),
+     Greater(i64),
+ }

! fn max(a: i64, b: i64) -> SomeOrdering {
      match a.cmp(&b) {
          Ordering::Less => max()::Less(b),
          Ordering::Equal => max()::Equal(a),
          Ordering::Greater => max()::Greater(a),
      }
  }

  assert!(matches!(max(4, 7), max()::Less(7)));

Alternatively, maybe a general syntax for explicit enums with anonymous variants could be defined, although it seems niche and awkward to me.

Your friend, Yokin

ModProg commented 4 months ago

I found this issue while trying to write a struct that should contain either syn::Token![#] or syn::Token![$], which a union type would have matched. However, I actually thought of making the variants and their order relevant (as it is for tuples), to make it fully compatible with enums, where two variants can contain the same value.

The syntax I imagined was


let mut value: <u8, u8, u8> = ::0(1);
value = ::1(2);
value = ::2(3);

match value {
    ::0(a) => println!("first: {a}"),
    ::1(b) | ::2(b) => println!("second or third: {b}"),
}

Using number indexes, inspired by how tuples work.

nasso commented 4 months ago

i dont like the order being relevant... addition is commutative, so different permutations of the same sum type should be equivalent (like reordering fields in a struct). i dont want to have to convert the output of a function to pass it to another if they disagree on the order

i think what most of us want is a sum type like <a, b, c> that behaves like an enum for which the variants are named after the types, not their position in the "union". writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>

for a tuple, (a, b, c) is not the same as (c, b, a) because a tuple is a struct with each field named after the position of each type/value. it allows for (a, a, a)

though your suggestion is interesting because it's a nice parallel with how tuples work, i don't think this is what we're looking for

programmerjake commented 4 months ago

I like having order matter since it fixes a hard problem: for fn f<T, U>() -> <T, U> what happens when T = U? It's also analogous to tuples in that both have their fields/variants be position-based numbers instead of being names. There's also prior art: C++'s std::variant behaves like this, where you can use std::get<N>(my_variant) to get the Nth alternative and throw an error if my_variant's current alternative isn't the Nth alternative. (you can also access by type but only if the type is unique)
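A small sketch of the position-based behaviour in today's Rust, using a named two-variant enum as a stand-in for the proposed `<T, U>`. `Sum2`, `V0`, `V1`, `index` and `pick` are invented for illustration; `index` plays the role of asking a C++ `std::variant` which alternative is active.

```rust
// Positional stand-in for `<T, U>`: variants are numbered, not named
// after their payload types.
#[derive(Debug, PartialEq)]
pub enum Sum2<T, U> {
    V0(T),
    V1(U),
}

impl<T, U> Sum2<T, U> {
    // Position of the active variant.
    pub fn index(&self) -> usize {
        match self {
            Sum2::V0(_) => 0,
            Sum2::V1(_) => 1,
        }
    }
}

// Generic code never needs to know whether `T` and `U` coincide.
pub fn pick<T, U>(first: bool, t: T, u: U) -> Sum2<T, U> {
    if first { Sum2::V0(t) } else { Sum2::V1(u) }
}
```

Instantiating `pick` with `T = U = u8` is unproblematic: `Sum2::<u8, u8>::V0(1)` and `Sum2::<u8, u8>::V1(1)` remain different values, so nothing about the generic code changes meaning when the two types happen to be equal.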

teohhanhui commented 4 months ago

I like having order matter since it fixes a hard problem: for fn f<T, U>() -> <T, U> what happens when T = U?

writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>

Doesn't that address the issue?

programmerjake commented 4 months ago

I like having order matter since it fixes a hard problem: for fn f<T, U>() -> <T, U> what happens when T = U?

writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>

Doesn't that address the issue?

no, because you should be able to write <T, U> for generic T and U (since otherwise it is severely limited), and if you try to define <T, U> to somehow behave differently when you use that generic code with types where T = U then you run into unsoundness caused by the same reasons as why general specialization is unsound: it doesn't properly account for lifetimes, which allows you to write code that makes the compiler e.g. transmute a &'a u8 to &'static u8 so you can read after the memory has been freed, which is UB.

programmerjake commented 4 months ago

if you try to define <T, U> to somehow behave differently when you use that generic code with types where T = U then you run into unsoundness caused by the same reasons as why general specialization is unsound: it doesn't properly account for lifetimes, which allows you to write code that makes the compiler e.g. transmute a &'a u8 to &'static u8 so you can read after the memory has been freed, which is UB.

this is because the compiler erases all lifetime information before it generates the de-genericified code where all generics are substituted by actual types, which means that it can't tell the difference between &'a u8 and &'static u8 at that point since they both end up as &'erased u8
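A small illustration of that erasure, offered as a hint rather than a proof: `std::any::type_name` reports the same name for a reference regardless of its lifetime. The helper function names are made up for this sketch, and the exact strings returned by `type_name` are not guaranteed by the standard library, so the sketch only compares them.

```rust
use std::any::type_name;

// Name of `&'a u8` for some non-'static lifetime 'a.
fn name_with_lifetime<'a>(_borrow: &'a u8) -> &'static str {
    type_name::<&'a u8>()
}

// Name of `&'static u8`.
fn name_static() -> &'static str {
    type_name::<&'static u8>()
}
```

Both calls are expected to report the same name, since the lifetime is not part of what the compiler sees at that stage. This is also why `TypeId::of` requires `T: 'static`.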

nasso commented 4 months ago

both are valid use cases but they really are different

you're describing C++'s std::variant (which is nice), but some of us want TypeScript's unions (but better)

i think it wouldn't be too difficult to implement std::variant when Rust eventually gets variadic generics. it would probably work like C++'s (similar to Either<L, R> but with an arbitrary number of generics)

but TypeScript-like unions, for which the order isn't relevant, would probably require compiler-level support (so that <a, b, c> is equivalent to <b, a, c> and all other permutations). i don't think it's possible in Rust today to write a generic type (with >1 parameter) for which changing the order of the type parameters doesn't change the identity of the type
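To illustrate that last point with today's Rust, here is a small sketch using a hypothetical `Either` enum and a helper `same_type` (both names are assumptions for this example): swapping the type parameters yields a distinct type.

```rust
use std::any::TypeId;

// Hypothetical two-variant generic enum.
#[allow(dead_code)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

// True if A and B are the same type, judged by their TypeId.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```

Here `same_type::<Either<u8, u16>, Either<u16, u8>>()` is `false`, so a library-level type cannot make the two orderings interchangeable on its own.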

i wanna be able to write a function that returns some Result<T, <E1, E2, ..., En>> and not have to worry about the order in which the errors are laid out in my error type. it's just an unordered set (bag) of errors. and ? just adds to that bag, wherever it feels like. and i can have some handle_error(e: <E1, E2, ..., En>) function. or maybe it's handle_error(e: <E4, E2, E6, E8, ..., En>) etc... it just matches over the type of error, not the position in the sum type
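For context, this is roughly what has to be written by hand today to get that behavior for two error types. `ParseError`, `parse_pair` and `handle_error` are hypothetical names for this sketch; the anonymous version would presumably make the enum and the `From` impls unnecessary.

```rust
use std::num::{ParseFloatError, ParseIntError};

// Hand-written "bag" of the errors this function can produce.
#[derive(Debug)]
enum ParseError {
    Int(ParseIntError),
    Float(ParseFloatError),
}

// These conversions are what let `?` add each error to the bag.
impl From<ParseIntError> for ParseError {
    fn from(e: ParseIntError) -> Self {
        ParseError::Int(e)
    }
}

impl From<ParseFloatError> for ParseError {
    fn from(e: ParseFloatError) -> Self {
        ParseError::Float(e)
    }
}

fn parse_pair(a: &str, b: &str) -> Result<(i32, f64), ParseError> {
    let x: i32 = a.parse()?; // ParseIntError -> ParseError::Int
    let y: f64 = b.parse()?; // ParseFloatError -> ParseError::Float
    Ok((x, y))
}

// Matches on which error occurred, via the named variants.
fn handle_error(e: &ParseError) -> &'static str {
    match e {
        ParseError::Int(_) => "bad integer",
        ParseError::Float(_) => "bad float",
    }
}
```

Every function with a different set of errors needs its own enum and its own `From` impls, which is the boilerplate being complained about.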

glaebhoerl commented 4 months ago

For the record, this particular issue is explicitly about anonymous sum types, not union types. I.e. with position/tag/discriminant-based matching (like Rust enums), not type-based matching (like TS). I don't know if there's already an issue for the latter, but it might be worth opening one if people are interested in it.

teohhanhui commented 4 months ago

@glaebhoerl I don't know if Wikipedia is wrong here, but:

In type theory, a union has a sum type; this corresponds to disjoint union in mathematics.

Maybe that's why the confusion.

I don't think anyone is asking for an untagged union: https://news.ycombinator.com/item?id=32018886

So I think we're actually asking for the same thing, i.e. tagged union, a.k.a. sum type.

tesaguri commented 4 months ago

I don't think anyone is asking for an untagged union: https://news.ycombinator.com/item?id=32018886

The thread you quoted seems to be about Haskell-like languages, and I guess the untagged union in that context differs from what you imagine (maybe the union keyword of C and, well, Rust, which, unlike TypeScript, doesn't really have runtime type tags at all and is thus inherently unsafe?).

A union type is like a set union. It differs from a sum type (which corresponds to Rust's enum E { A(T), B(U) }) in that the union of a type $T$ with itself equals $T$ (similar to $S \cup S = S$), while the sum of $T$ with itself neither equals $T$ nor is a supertype of $T$.

Some people in this issue have proposed the union type in this sense in addition to the sum type.

Keavon commented 4 months ago

Yeah, that Haskell-related discussion doesn't really make sense here. We already have tagged unions in Rust, they're called enums. Each tag is the variant's name. Our goal here is having untagged (or anonymous) ones so you can define T | U | V (like String | u64 | bool) and match based on each type without needing to previously declare (and in some cases import into the current scope) that tagged wrapper type (the enum type).

Diggsey commented 4 months ago

@Keavon you seem to be confusing tagged/untagged with named/anonymous, they are very different things. Rust already has untagged and tagged named unions, what it's missing are anonymous tagged unions.

"tag" refers to the enum discriminant. Untagged unions do not have a discriminant, and so are unsafe to access. See https://en.wikipedia.org/wiki/Tagged_union

The names of enum variants are not called tags - they have associated tags, for example:

enum Foo {
    A, // Might have tag 0
    B, // Might have tag 1
    C // Might have tag 2
}
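As a runnable illustration: for a fieldless enum like this one, casting a variant to an integer exposes its discriminant. Implicit discriminants start at 0 and count up, although the in-memory representation is otherwise unspecified without a `#[repr(...)]` attribute. The helper `tag` is a name made up for this sketch.

```rust
#[derive(Clone, Copy)]
enum Foo {
    A, // discriminant 0
    B, // discriminant 1
    C, // discriminant 2
}

// Casting a fieldless enum to an integer yields its discriminant.
fn tag(foo: Foo) -> u8 {
    foo as u8
}
```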

Tags also exist for anonymous enums, since the compiler still needs to differentiate which variant is selected, for example:

type Foo = A /* might get tag 0 */ | B /* might get tag 1 */ | C /* might get tag 2 */

Keavon commented 4 months ago

I see, thanks for pointing out my terminology error. If I'm reading what you explained correctly, I think you're responding to this part of my second sentence above:

Our goal here is having untagged (or anonymous) ones...

(is that right?)

Rephrasing what I wrote above, then, I think that I was describing a goal of having the compiler's type system figure out the tags behind the scenes, allowing you to write code with anonymous variant names as well as anonymous enum types. So behind the scenes, it would be tagged (using your illustrations of // Might have tag 0, etc.) but those variant names shouldn't be given by the user, and the type as a whole shouldn't need to be named by the user either, unless the user decides to alias it as in your final code block example:

type Foo = A /* might get tag 0 */ | B /* might get tag 1 */ | C /* might get tag 2 */

The result should be equivalent to TypeScript's approach, but with the compiler able to discriminate between the types at compile time, so the code doesn't have to match on something like a kind: string field, which TS requires in order to work at runtime once it becomes JS. Is that roughly accurate now?

Rufflewind commented 4 months ago

The key distinction lies in this scenario:

type Foo<A, B> = A | B;

type Bar = Foo<i32, i32>; // !!

fn bar_consumer(bar: Bar) {
  match bar {
  // ??
  }
}

Option 1: Union type (like TypeScript)

In this case, Foo<i32, i32> flattens to a single-member union containing just i32 (or alternatively the compiler could treat it as indistinguishable from i32).

type Foo<A, B> = A | B;

type Bar = Foo<i32, i32>;

fn bar_consumer(bar: Bar) {
  match bar {
    i: i32 => ...,
    // No other options are possible.
  }
}

Option 2: Sum type (like standard Rust enums)

With a sum type, each choice in the anonymous enum must be assigned a locally unique name. Here, I chose to use the type parameter itself as the name. There are other alternatives of course, like giving them integer names analogous to tuples (.0, .1, .2, etc) but I imagine the ergonomics would be poor.

type Foo<A, B> = A | B;

type Bar = Foo<i32, i32>;  // Remains a two-member union

fn bar_consumer(bar: Bar) {
  match bar {
    A(i: i32) => ...,
    B(j: i32) => ...,
  }
}
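Option 2 can already be spelled in today's Rust with a named generic enum, which may help make the semantics concrete. This is only a sketch: the enum has to be declared up front, and `bar_consumer` is given a `String` return type here so that the example has something observable.

```rust
// Named equivalent of the anonymous `A | B` under sum-type semantics.
// The variants are named after the type parameters, as in Option 2.
enum Foo<A, B> {
    A(A),
    B(B),
}

type Bar = Foo<i32, i32>; // still two distinct variants

fn bar_consumer(bar: Bar) -> String {
    match bar {
        Foo::A(i) => format!("A: {i}"),
        Foo::B(j) => format!("B: {j}"),
    }
}
```

`Foo::A(1)` and `Foo::B(1)` stay distinguishable even though both carry an `i32`, which is exactly what Option 1 would collapse.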